Bhaavana: A Novel and Comprehensive Hindi Poetry Classifier Based on Emotions
Abstract
Emotions are the essence of humanity and they lead to various sensations in human beings. In traditional
Indian literature, these complex emotions are represented through the notion of ‘Rasa’ (‘रस’, meaning emotion). For
the current research, five such ‘Rasa’ namely ‘Hasya’ (‘हास्य’, comic), ‘Karuna’ (‘करुणा’, compassion), ‘Shanta’
(‘शांत’, calmness), ‘Shringar’ (‘श्रृंगार’, romance) and ‘Veera’ (‘साहस’, courage) have been used to design a classifier
called ‘Bhaavana’ (‘भावना’, emotion) for Hindi poetry. Technically, this is a Natural Language Processing (NLP)
quinary (i.e. five-category) classification task and we make use of various sub-tasks including Pre-processing,
Tokenization, Stemming, Bag-of-Words (BOW), Feature Extraction, and Part-of-Speech (POS) tagging. Three types
of linguistic features, namely Lexical Features (LEX), Syntactic Features comprising Part-of-Speech tags (i.e.,
LEX+POS), and Emotion-Specific Features (ESF), have been used to design an automatic
Hindi poetry classifier. A corpus of more than 800 poems covering these five emotions and comprising more than 100,000
words has been processed to obtain a lexical feature set comprising more than 73,000 unique unigrams.
Additionally, Highest Rank Features (HRF) have been identified and combined with LEX, LEX+POS, and ESF. The
Machine Learning (ML) algorithms used are Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes
(MNB), Neural Network (NN), and Support Vector Machine (SVM), and experimental results with LEX,
LEX+HRF, LEX+POS, LEX+POS+HRF, and ESF+HRF are presented for each ML algorithm. These results are
further strengthened by the use of Frequency Distribution (FD), Term Frequency (TF), and Term Frequency-Inverse
Document Frequency (TF-IDF) as weighting methods during experimentation. It is concluded that LEX+HRF is the
best feature set, FD is the best weighting method, and MNB is the best algorithm. LEX+HRF is followed by ESF+HRF and
LEX+POS+HRF, in that order. The best performance, averaged over the k-fold cross-validation results, is 71.09%. The k-fold
cross-validation experiments also show that ESF+HRF is the more stable feature set, giving consistent results across
folds.
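The three weighting methods and the evaluation loop named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the transliterated toy tokens, the `idf = log(N / df)` variant, the add-one smoothing, and the interleaved fold assignment are all assumptions made for illustration, and the abstract does not state the value of k that was used.

```python
import math
from collections import Counter

def frequency_distribution(tokens):
    """FD: raw count of each unigram in one poem."""
    return Counter(tokens)

def term_frequency(tokens):
    """TF: unigram counts divided by the poem's length."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}

def tf_idf(docs):
    """TF-IDF with idf = log(N / df); one common variant (assumed here)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in term_frequency(doc).items()}
            for doc in docs]

def train_mnb(docs, labels):
    """Multinomial Naive Bayes trained on FD counts."""
    vocab = {t for doc in docs for t in doc}
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    return vocab, class_docs, word_counts

def predict_mnb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    vocab, class_docs, word_counts = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_docs):
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / n_docs)
        for t in tokens:
            if t in vocab:  # unigrams unseen in training are ignored
                score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def k_fold_accuracy(docs, labels, k):
    """Average accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))
        train = [(d, l) for j, (d, l) in enumerate(zip(docs, labels))
                 if j not in test_idx]
        model = train_mnb([d for d, _ in train], [l for _, l in train])
        hits = sum(predict_mnb(model, docs[j]) == labels[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k

# Invented toy corpus: two 'hasya' and two 'veera' poems, already tokenized.
toy_docs = [
    ["hansi", "majak", "hansi"],
    ["majak", "hansi", "khushi"],
    ["veer", "yuddh", "veer"],
    ["yuddh", "veer", "sahas"],
]
toy_labels = ["hasya", "hasya", "veera", "veera"]

print(frequency_distribution(toy_docs[0]))
print(term_frequency(toy_docs[0]))
print(tf_idf(toy_docs)[0])
print(k_fold_accuracy(toy_docs, toy_labels, k=2))
```

On a real corpus the tokens would come from the pre-processing, tokenization, and stemming steps, and the interleaved split would normally be replaced by a shuffled or stratified one so that every training fold contains all five classes.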
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.