PuPoCl: Development of Punjabi Poetry Classifier Using Linguistic Features and Weighting

Published: Dec 4, 2017

Abstract

Analysis of poetic text is very challenging from a computational linguistics perspective. For a library recommendation framework, poems can be classified along different dimensions, such as writer, time period, sentiment, emotion and topic. In this paper, a subject-based Punjabi poetry classifier was developed using the Weka toolset. Four categories were manually populated with 2034 poems (the NAFE, LIPA, RORE and PHSP categories consist of 505, 399, 529 and 601 poems, respectively). After tokenization of the 2034 poems, 45667 features were extracted and passed to the noise removal sub-phase. A total of 31938 features remained after noise removal and were weighted using term frequency (tf); the entire process was then repeated with the tf-idf weighting scheme. Two types of linguistic features, namely lexical features and syntactic features of the poems, were explored to develop the classifier using machine learning algorithms. Naive Bayes, Support Vector Machine (SVM), HyperPipes and K-nearest neighbour (KNN) algorithms were evaluated with 31938 lexical features and 30396 syntactic features. Results show that SVM outperformed all other classifiers under both the tf and tf-idf weighting schemes, whereas KNN was the worst performer. When POS tags were added to the words, the accuracy of SVM increased by 1%. Results also revealed that, with a testing time of 0.19 sec, SVM is the most efficient machine learning algorithm for Punjabi poetry classification using the tf-idf scheme.
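
The two weighting schemes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation (the paper used Weka); it assumes the common definition idf = ln(N / df), which may differ from the variant Weka applies, and the function names and toy tokens below are hypothetical placeholders rather than data from the paper.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Raw term counts for one tokenized document (tf weighting)."""
    return dict(Counter(tokens))


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes idf = ln(N / df); other idf variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weighted = []
    for tokens in docs:
        tf = Counter(tokens)
        weighted.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weighted


# Hypothetical toy corpus; placeholder tokens, not taken from the paper's poems.
toy_docs = [
    ["river", "moon", "river"],
    ["moon", "faith"],
    ["faith", "river", "nation"],
]

tf_weights = [term_frequency(doc) for doc in toy_docs]
tfidf_weights = tf_idf(toy_docs)
```

Under tf weighting a term's weight depends only on how often it occurs in a poem, while tf-idf additionally down-weights terms that occur in many poems, so a term confined to one document receives the highest idf factor.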

How to Cite

PuPoCl: Development of Punjabi Poetry Classifier Using Linguistic Features and Weighting. (2017). INFOCOMP Journal of Computer Science, 16(1-2), 1–7. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/546

Issue

Vol. 16 No. 1-2 (2017): June-December 2017

Section

Machine Learning and Computational Intelligence

Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive licence to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be as widely disseminated as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.
