Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM

Abdelmalek Amine
Zakaria Elberrichi
Michel Simonet
Mimoun Malki

Abstract

With the large and rapidly growing number of documents available in digital form (Internet, libraries, CD-ROM…), the automatic classification of texts has become a significant research field and a fundamental task in document processing. This paper deals with the unsupervised classification of textual documents, also called text clustering, using Kohonen's Self-Organizing Maps (SOM) in two new settings: a conceptual representation of texts and a representation based on n-grams, each used in place of a representation based on words. The effects of these combinations are examined in several experiments using four similarity measures. The Reuters-21578 corpus is used for evaluation, and clustering quality is assessed with the F-measure and entropy.
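
To make the n-gram representation mentioned in the abstract more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes character-level n-grams with n = 3 and uses cosine similarity as one possible similarity measure; the abstract does not state which n, which normalization, or which four measures the paper uses. The function names char_ngrams and cosine_similarity are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to a single space before the n-grams are extracted.
    """
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles.

    Assumption: cosine is used here only as an example of a
    similarity measure between document vectors.
    """
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = char_ngrams("Oil prices rise")
    doc2 = char_ngrams("Oil price rises")
    print(cosine_similarity(doc1, doc2))
```

Under these assumptions, each document becomes a vector of n-gram counts that a clustering method such as a SOM could take as input, with no tokenization or stemming step required.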

Article Details

How to Cite
Amine, A., Elberrichi, Z., Simonet, M., & Malki, M. (2008). Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM. INFOCOMP Journal of Computer Science, 7(1), 27–35. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/203