Unification of Numerical and Ordinal Survey Data for Clustering-based Inferencing

Bhupendera Kumar
Rajeev Kumar

Abstract

Surveys now cover almost every issue that governs our lives, and they collect many parameters and a variety of data types. A researcher therefore needs to unify these data before extracting inferences from a survey. Data from quantitative surveys are clustered to reveal respondents' divergent and dominant tendencies; the aim is to investigate the general trends among the categories of respondents. Because of the unique characteristics of survey data, popular clustering techniques based on value similarity are inadequate.
In this paper, we attempt to unify the numerical data of a survey with its ordinal data. We model the data with a Gaussian distribution and therefore first convert the numerical data to ordinal data following that distribution; the converted attributes may be the governing attributes for deciding the clusters. We then apply $K$-means clustering with varying numbers of clusters. We implement the proposed methodology on real survey data and compare the clustering efficiency before and after unification for each number of clusters. More crucially, the method makes appropriate use of both the order information of the ordinal attributes and the statistical information of the numerical attributes for clustering. Extensive testing demonstrates that the suggested unification works better on real data sets than its contemporaries.
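The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of ordinal levels, the rule for choosing cut points, or the $K$-means variant. The sketch assumes five levels, cut points at equal-probability quantiles of a Gaussian fitted to each numerical attribute, and a plain Lloyd-style $K$-means. The function names `numeric_to_ordinal` and `kmeans` and the toy data are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def numeric_to_ordinal(values, levels=5):
    """Map numeric values to ordinal levels 1..levels.

    Assumption: a Gaussian is fitted to the values and the cut points
    are its equal-probability quantiles (not taken from the paper).
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All values identical: assign the middle level to every value.
        return [(levels + 1) // 2 for _ in values]
    dist = NormalDist(mu, sigma)
    cuts = [dist.inv_cdf(i / levels) for i in range(1, levels)]
    # The level is 1 plus the number of cut points the value exceeds.
    return [1 + sum(v > c for c in cuts) for v in values]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: mean of each cluster (keep old center if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([mean(col) for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy survey: one numerical attribute (age) and one
# ordinal attribute (a 1-5 satisfaction rating).
ages = [18, 19, 21, 22, 35, 36, 38, 40, 60, 62, 65, 70]
satisfaction = [1, 2, 1, 2, 3, 3, 4, 3, 5, 4, 5, 5]

age_levels = numeric_to_ordinal(ages, levels=5)
rows = list(zip(age_levels, satisfaction))  # unified, all-ordinal data

results = {}
for k in (2, 3, 4):  # vary the number of clusters
    labels, centers = kmeans(rows, k)
    results[k] = labels
    print(k, labels)
```

After conversion both attributes share the same 1 to 5 scale, so no single attribute dominates the Euclidean distance merely because of its units. Other cut rules, such as bands of one standard deviation around the mean, would fit the same description equally well.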

Article Details

How to Cite
Kumar, B., & Kumar, R. (2023). Unification of Numerical and Ordinal Survey Data for Clustering-based Inferencing. INFOCOMP Journal of Computer Science, 22(1). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/2492
Section
Machine Learning and Computational Intelligence
