Heterogenous Ensemble Learning Framework for Sentiment Analysis on COVID-19 Tweets

Rubul Kumar Bania

Published: Dec 1, 2021

Rubul Kumar Bania

North-Eastern Hill University

Abstract

During a catastrophe, detecting tweets associated with the target incident is an exigent task. Sentiment analysis is the study of sentiments shared by diverse users on social networking sites such as Twitter and Facebook about various social phenomena. In this article, sentiment analysis is carried out on thousands of tweets about the ongoing COVID-19 pandemic, collected from July to August 2020 and from May to June 2021. Adopting the idea of majority voting, a novel ensemble learning model is proposed to classify the tweets into \textit{negative}, \textit{neutral}, and \textit{positive} groups. Data preprocessing, polarity analysis, and various other analysis techniques are applied to the COVID-19 related tweets. Text features are extracted by applying TF-IDF with unigram and bigram techniques, and five machine learning models, namely Na\"ive Bayes (NB), logistic regression (LR), $K$ nearest neighbour ($K$NN), decision tree (DT), and random forest (RF), are judiciously combined to build the ensemble model. Experimental results suggest that with both unigram and bigram feature extraction, the proposed model performs better than the other compared models. With a 70\%--30\% train-test split, the proposed model achieves an accuracy of 94.67\% in classifying the tweets into the three classes.
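The majority voting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the per-model predictions below are made-up illustrative labels, the helper names `majority_vote` and `ensemble_predict` are hypothetical, and the tie-breaking rule (the earliest label in `LABELS` wins) is an assumption of this sketch.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def majority_vote(predictions):
    """Return the label chosen by the largest number of base models.

    Ties go to the label that appears first in LABELS
    (an assumption of this sketch, not taken from the article).
    """
    counts = Counter(predictions)
    # max() returns the first maximal item, so LABELS order breaks ties.
    return max(LABELS, key=lambda label: counts[label])


def ensemble_predict(per_model_predictions):
    """Combine per-model label lists (one label per tweet) by hard voting."""
    # zip(*...) regroups the labels so each tuple holds all votes for one tweet.
    votes_per_tweet = zip(*per_model_predictions.values())
    return [majority_vote(votes) for votes in votes_per_tweet]


# Hypothetical outputs of the five base models for three tweets.
base_predictions = {
    "NB":  ["positive", "negative", "neutral"],
    "LR":  ["positive", "negative", "positive"],
    "KNN": ["neutral",  "negative", "neutral"],
    "DT":  ["positive", "neutral",  "neutral"],
    "RF":  ["negative", "negative", "positive"],
}

final_labels = ensemble_predict(base_predictions)
print(final_labels)
```

With an odd number of base models and three classes, ties are still possible (for example a 2-2-1 split), which is why the sketch fixes an explicit tie-breaking rule. In scikit-learn, `VotingClassifier` with `voting="hard"` provides a comparable majority-vote combination over fitted estimators.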

How to Cite

Bania, R. K. (2021). Heterogenous Ensemble Learning Framework for Sentiment Analysis on COVID-19 Tweets. INFOCOMP Journal of Computer Science, 20(2). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1763

Issue

Vol. 20 No. 2 (2021): December 2021

Section

Articles

Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. After assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.
