Bhaavana: A Novel and Comprehensive Hindi Poetry Classifier Based on Emotions

Kaushika Pal
Jatinderkumar R. Saini


Emotions are the essence of humanity and they lead to various sensations in human beings. In traditional
Indian literature, these complex emotions are represented through the notion of ‘Rasa’ (‘रस’, meaning emotion). For
the current research, five such ‘Rasa’, namely ‘Hasya’ (‘हास्य’, comic), ‘Karuna’ (‘करुणा’, compassion), ‘Shanta’
(‘शांत’, calmness), ‘Shringar’ (‘श्रृंगार’, romance) and ‘Veera’ (‘साहस’, courage), have been used to design a classifier
called ‘Bhaavana’ (‘भावना’, emotion) for Hindi poetry. Technically, this is a Natural Language Processing (NLP)
quinary (i.e., five-category) classification task, and we make use of various sub-tasks including Pre-processing,
Tokenization, Stemming, Bag-of-Words (BOW), Feature Extraction, and Part-of-Speech (POS) tagging. Three types
of linguistic features, namely Lexical features (LEX), Syntactic features comprising Part-of-Speech tags (i.e.,
LEX+POS), and Emotion-Specific Features (ESF), have been deployed towards the aim of designing an automatic
Hindi Poetry Classifier. A corpus of more than 800 poems covering these 5 emotions and comprising more than 100,000
words has been processed to obtain a lexical feature set comprising more than 73,000 unique unigrams.
Additionally, Highest Rank Features (HRF) have been identified and combined with LEX, LEX+POS, and ESF in the
experiments. The Machine Learning (ML) algorithms used are Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes
(MNB), Neural Network (NN), and Support Vector Machine (SVM), and experimental results with LEX,
LEX+HRF, LEX+POS, LEX+POS+HRF, and ESF+HRF are presented for each ML algorithm. These results are
further reinforced by the use of Frequency Distribution (FD), Term Frequency (TF), and Term Frequency-Inverse
Document Frequency (TF-IDF) weighting during the experimentation. It is concluded that LEX+HRF is the best feature set, FD is
the best weighting method and MNB is the best algorithm. Among the feature sets, LEX+HRF is followed by ESF+HRF and then
LEX+POS+HRF. Averaged over k-fold cross-validation, the best performance is 71.09%. The k-fold
cross-validation experiments also show that ESF+HRF is a more stable feature set, giving consistent results across various
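
To make the three weighting schemes named above concrete, the following is a minimal sketch of FD, TF, and TF-IDF over unigrams. It is not the authors' implementation: the toy tokenized poems, the function names, length-normalized TF, and the unsmoothed IDF form log(N/df) are all assumptions chosen for illustration, since the abstract does not state the exact formulas used.

```python
import math
from collections import Counter

# Hypothetical, pre-tokenized toy "poems" (not taken from the paper's corpus).
docs = [
    ["वीर", "रण", "वीर"],
    ["प्रेम", "चाँद", "रात"],
    ["रण", "विजय"],
]

def frequency_distribution(tokens):
    """FD: raw count of each unigram in one document."""
    return Counter(tokens)

def term_frequency(tokens):
    """TF: counts divided by document length (one common convention, assumed here)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}

def inverse_document_frequency(corpus):
    """IDF: log(N / df), with df the number of documents containing the term."""
    n_docs = len(corpus)
    df = Counter(term for tokens in corpus for term in set(tokens))
    return {term: math.log(n_docs / d) for term, d in df.items()}

def tf_idf(tokens, idf):
    """TF-IDF: product of the document-level TF and the corpus-level IDF."""
    return {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}

idf = inverse_document_frequency(docs)
fd0 = frequency_distribution(docs[0])
tf0 = term_frequency(docs[0])
tfidf0 = tf_idf(docs[0], idf)
```

In this sketch, a unigram that occurs in fewer poems receives a larger IDF, so TF-IDF down-weights words shared across poems, whereas FD keeps the raw counts unchanged.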


Pal, K., & Saini, J. R. (2024). Bhaavana: A Novel and Comprehensive Hindi Poetry Classifier Based on Emotions. INFOCOMP Journal of Computer Science, 23(1). Retrieved from
Machine Learning and Computational Intelligence


