Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM
Abstract
With the large and rapidly growing number of documents available in digital form (on the Internet, in libraries, on CD-ROM, etc.), automatic text classification has become an important research field and a fundamental task in document processing. This paper addresses the unsupervised classification of textual documents, also called text clustering, using Kohonen's Self-Organizing Maps (SOM) in two new settings: a conceptual representation of texts and a representation based on n-grams, rather than a word-based representation. The effects of these combinations are examined in several experiments using four similarity measures. The Reuters-21578 corpus is used for evaluation, and clustering quality is measured with the F-measure and entropy.
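The pipeline summarized in the abstract (n-gram document vectors, SOM clustering, evaluation by F-measure and entropy) can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The character trigram size (`n=3`), the 2×2 map, the linearly decaying learning rate and Gaussian neighborhood, and all helper names (`vectorize`, `train_som`, `assign`, `f_measure`, `entropy`) are hypothetical choices. The conceptual representation is not shown. Only two distances (Euclidean and cosine) stand in for the four similarity measures the paper compares. The F-measure and entropy follow a commonly used clustering-evaluation definition that is assumed, not confirmed, to match the paper's.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: n-gram vectors + Kohonen SOM + F-measure/entropy evaluation.

def char_ngrams(text, n=3):
    """Count overlapping character n-grams of lowercased text padded with one space per side."""
    s = f" {text.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def vectorize(docs, n=3):
    """Map each document to an L2-normalized n-gram frequency vector over a shared vocabulary."""
    profiles = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*profiles))
    vectors = []
    for p in profiles:
        v = [float(p[g]) for g in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def bmu(units, x, distance=euclidean):
    """Index of the best-matching unit, i.e. the map unit closest to x."""
    return min(range(len(units)), key=lambda k: distance(units[k], x))

def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0, distance=euclidean):
    """Online Kohonen training with a Gaussian neighborhood and linearly decaying rates (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        decay = 1.0 - t / epochs  # falls from 1 to 1/epochs
        lr, sigma = lr0 * decay, max(sigma0 * decay, 1e-3)
        for x in rng.sample(vectors, len(vectors)):  # one shuffled pass over the documents
            br, bc = divmod(bmu(units, x, distance), cols)
            for k, w in enumerate(units):
                r, c = divmod(k, cols)
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull unit toward x, weighted by grid distance to the BMU
    return units

def assign(units, vectors, distance=euclidean):
    """Cluster label of each document = index of its best-matching unit."""
    return [bmu(units, x, distance) for x in vectors]

def f_measure(labels, clusters):
    """Overall F-measure: class-size-weighted best F score of each class over all clusters."""
    n = len(labels)
    total = 0.0
    for c in set(labels):
        n_c = labels.count(c)
        best = 0.0
        for k in set(clusters):
            n_k = clusters.count(k)
            n_ck = sum(1 for l, g in zip(labels, clusters) if l == c and g == k)
            if n_ck:
                p, r = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * p * r / (p + r))
        total += n_c / n * best
    return total

def entropy(labels, clusters):
    """Cluster-size-weighted entropy of the class distribution inside each cluster (0 = pure clusters)."""
    n = len(labels)
    total = 0.0
    for k in set(clusters):
        members = [l for l, g in zip(labels, clusters) if g == k]
        m = len(members)
        total += m / n * -sum(v / m * math.log(v / m) for v in Counter(members).values())
    return total

if __name__ == "__main__":
    docs = ["oil prices rise", "crude oil prices fall", "wheat harvest grows", "wheat crop harvest"]
    labels = ["crude", "crude", "grain", "grain"]  # illustrative reference categories
    vecs = vectorize(docs)
    clusters = assign(train_som(vecs), vecs)
    print(clusters, f_measure(labels, clusters), entropy(labels, clusters))
```

Passing `distance=cosine_distance` to both `train_som` and `assign` is one way to compare similarity measures on the same data (the weight update itself stays the standard Kohonen rule). A higher F-measure and a lower entropy indicate closer agreement between the clusters and the reference categories.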
Article Details
How to Cite
Amine, A., Elberrichi, Z., Simonet, M., & Malki, M. (2008). Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM. INFOCOMP Journal of Computer Science, 7(1), 27–35. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/203
Section
Articles
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must sign and return the copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.