Bhaavana: A Novel and Comprehensive Hindi Poetry Classifier Based on Emotions

Kaushika Pal
Jatinderkumar R. Saini

Abstract

Emotions are the essence of humanity and they lead to various sensations in human beings. In traditional Indian literature, these complex emotions are represented through the notion of ‘Rasa’ (‘रस’, meaning emotion). For the current research, five such ‘Rasa’, namely ‘Hasya’ (‘हास्य’, comic), ‘Karuna’ (‘करुणा’, compassion), ‘Shanta’ (‘शांत’, calmness), ‘Shringar’ (‘श्रृंगार’, romance) and ‘Veera’ (‘साहस’, courage), have been used to design a classifier called ‘Bhaavana’ (‘भावना’, emotion) for Hindi poetry. Technically, this is a Natural Language Processing (NLP) quinary (i.e., five-category) classification task, and we make use of various sub-tasks including Pre-processing, Tokenization, Stemming, Bag-of-Words (BOW), Feature Extraction, and Part-Of-Speech (POS) tagging. Three types of linguistic features, namely Lexical features (LEX), Syntactic features comprising Part-of-Speech (POS) (i.e., LEX+POS), and Emotion-Specific Features (ESF), have been deployed towards the aim of designing an automatic Hindi Poetry Classifier. A corpus of more than 800 poems covering these 5 emotions and comprising more than 1,000,00 words has been processed to obtain a lexical feature set comprising more than 73,000 unique unigrams. Additionally, Highest Rank Features (HRF) have been identified and used in experiments with LEX, LEX+POS, and ESF. The Machine Learning (ML) algorithms used are Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes (MNB), Neural Network (NN), and Support Vector Machine (SVM), and experimental results with LEX, LEX+HRF, LEX+POS, LEX+POS+HRF, and ESF+HRF for each ML algorithm are presented. These results are further reinforced by the use of Frequency Distribution (FD), Term Frequency (TF), and Term Frequency-Inverse Document Frequency (TF-IDF) weighting during the experimentation. It is concluded that LEX+HRF is the best feature set, FD is the best weighting method and MNB is the best algorithm. In terms of feature sets, LEX+HRF is followed by ESF+HRF and LEX+POS+HRF, respectively. The average of the k-fold cross-validation results gives the best performance as 71.09%. K-fold cross-validation experiments also show that ESF+HRF is a more stable feature set, giving consistent results across the various folds.
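
To make the three weighting schemes and the best-performing algorithm named in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the idf variant (log of N over document frequency), the Laplace smoothing constant and the transliterated toy tokens are all assumptions chosen for illustration, and the toy data is made up rather than drawn from the paper's corpus.

```python
import math
from collections import Counter


def frequency_distribution(tokens):
    """FD: raw count of each unigram in one document."""
    return Counter(tokens)


def term_frequency(tokens):
    """TF: unigram counts normalised by document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def tf_idf(docs):
    """TF-IDF using idf = log(N / df); one common variant (assumed here)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in term_frequency(doc).items()}
        for doc in docs
    ]


def train_mnb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes over FD counts with Laplace smoothing (alpha assumed)."""
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        term_counts[label].update(frequency_distribution(doc))
    model = {}
    for c, n_c in class_counts.items():
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_prior = math.log(n_c / len(docs))
        log_like = {t: math.log((term_counts[c][t] + alpha) / total) for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def predict_mnb(model, tokens):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Hypothetical, made-up token lists (transliterated placeholders only).
toy_docs = [
    ["hansi", "mazak", "hansi"],
    ["hansi", "khushi"],
    ["yuddh", "shaurya", "yuddh"],
    ["shaurya", "veer"],
]
toy_labels = ["Hasya", "Hasya", "Veera", "Veera"]
model = train_mnb(toy_docs, toy_labels)
```

Calling `predict_mnb(model, ["hansi", "mazak"])` on this toy model selects the class whose training counts best match the tokens. A real experiment along the lines described in the abstract would additionally need tokenization and stemming for Hindi text, feature ranking, and k-fold cross-validation, none of which are shown here.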

Article Details

How to Cite
Pal, K., & Saini, J. R. (2024). Bhaavana: A Novel and Comprehensive Hindi Poetry Classifier Based on Emotions. INFOCOMP Journal of Computer Science, 23(1). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/3057
Section
Machine Learning and Computational Intelligence
