Subspace Partitioning through Data Decomposition for Outlier Detection

Gouranga Duari
Prof. Rajeev Kumar

Abstract

Decomposing a problem to reduce its complexity has long been a challenging task. This paper presents a data decomposition approach as a pre-processor for outlier detection. The data are decomposed by subspace partitioning into homogeneous sub-groups, which reduces the complexity of the data patterns by isolating possible outliers within sub-groups of monolithic character. The approach creates sub-groups of homogeneous data points based on fitness for purpose and optimizes the outlier patterns in the sub-groups so that outlier detectors can subsequently be mapped onto them. This decomposition strategy is found to reduce the complexity of learning for the detectors without deteriorating the overall detection rate. We evaluated the approach with different benchmark detectors on eight benchmark data sets. The results indicate that data decomposition is better at identifying localized patterns in the partitions and offers better generalization.
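
The partition-then-detect idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the partitioning method or the detectors, so everything here is an assumption made for illustration: plain k-means with farthest-point seeding stands in for the subspace partitioning, and a one-sided z-score on distance to the group centroid stands in for a benchmark detector. The function names, the threshold of 3.0, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans_labels(points, k, iters=50):
    """Plain Lloyd's k-means (a stand-in partitioner); returns one label per point."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the group is empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(mean(axis) for axis in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def flag_far_points(points, indices, threshold=3.0):
    """Stand-in detector: flag indices whose centroid distance exceeds mean + threshold * std."""
    group = [points[i] for i in indices]
    centroid = tuple(mean(axis) for axis in zip(*group))
    dists = [math.dist(p, centroid) for p in group]
    mu, sigma = mean(dists), pstdev(dists)
    if sigma == 0:
        return []
    return [i for i, d in zip(indices, dists) if (d - mu) / sigma > threshold]


def detect_by_partition(points, k, threshold=3.0):
    """Partition the data first, then run the detector inside each sub-group."""
    labels = kmeans_labels(points, k)
    flagged = []
    for c in range(k):
        indices = [i for i, lab in enumerate(labels) if lab == c]
        if indices:
            flagged.extend(flag_far_points(points, indices, threshold))
    return sorted(flagged)


# Toy data: two tight 4x4 grids plus one point that is unusual only relative to the first grid.
grid_a = [(x, y) for x in range(4) for y in range(4)]
grid_b = [(x + 100, y + 100) for x in range(4) for y in range(4)]
data = grid_a + grid_b + [(12, 12)]

global_flags = flag_far_points(data, list(range(len(data))))  # detector on the undivided data
local_flags = detect_by_partition(data, k=2)                  # detector on each sub-group
print("global:", global_flags, "partitioned:", local_flags)
```

On this toy data the detector applied to the undivided set is dominated by the gap between the two grids, whereas inside its own sub-group the point (12, 12) stands out. This is only meant to show why homogeneous sub-groups can make localized patterns easier to detect; it makes no claim about the paper's actual partitioning criterion or detectors.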

Article Details

How to Cite
Duari, G., & Kumar, R. (2024). Subspace Partitioning through Data Decomposition for Outlier Detection. INFOCOMP Journal of Computer Science, 23(1). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/3206
Section
Machine Learning and Computational Intelligence
Author Biography

Prof. Rajeev Kumar, Jawaharlal Nehru University

Rajeev Kumar is a professor of computer science at Jawaharlal Nehru University, New Delhi. He holds a PhD from the University of Sheffield and a Master's degree from IIT Roorkee. Earlier, he was a professor at IIT Kharagpur, IIT Kanpur, and BITS Pilani. Prior to his academic tenure, he worked as a scientist in the Department of Science & Technology (DST) and the Defence Research and Development Organisation (DRDO) in India. He has four decades of experience in research and teaching. His research interests include machine learning, scientometrics, multimedia and software systems, and evolutionary optimization. He has published over 200 peer-reviewed research articles in international journals and conferences, and has authored several public policies for higher education in India.
