COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models

Main Article Content

Rubul Kumar Bania

Abstract

The advent of handheld devices offers direct access to the internet and social networking sites. Sentiment analysis, or opinion mining, is the study of the sentiments and opinions that users share on social networking sites such as Twitter, Facebook, Reddit, and Instagram about diverse social phenomena. In this article, sentiment analysis is performed on tweets about the ongoing coronavirus disease (COVID-19) pandemic, which the World Health Organization (WHO) declared a pandemic in mid-March 2020. Statistical and machine learning based analyses are carried out on 40,000 tweets collected from Twitter in two mutually exclusive time frames, from 3/07/2020 to 11/07/2020 and from 01/08/2020 to 06/08/2020, using the Tweepy Python library. Various Python libraries are used for data acquisition, data pre-processing, and data analysis. In the pre-processing phase, the sentences are first cleaned. Then, by calculating polarity and subjectivity measures, the tweets are categorized into three groups (negative, neutral, and positive). In the next phase, the term frequency-inverse document frequency (TF-IDF) feature extraction scheme is applied with uni-grams, bi-grams, and tri-grams to extract features and prepare the datasets for the prediction models. Of the data, 70% is used to train Gaussian Naïve Bayes (G-NB), Bernoulli Naïve Bayes (B-NB), random forest (RF), and support vector machine (SVM) classifiers, and the remaining 30% is used to test the resulting models. Experimental results suggest that the RF and B-NB models performed better than the other two classifiers, and that the computational cost of running the SVM is very high.
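
The labelling and feature extraction steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the article's implementation: the function names, the zero polarity threshold, whitespace tokenization, and the plain ln(N/df) form of the inverse document frequency are all assumptions made here for illustration.

```python
import math
from collections import Counter


def label_from_polarity(polarity, eps=0.0):
    """Map a polarity score in [-1, 1] to one of three sentiment groups.

    The threshold eps is an assumption; the abstract does not state one.
    """
    if polarity > eps:
        return "positive"
    if polarity < -eps:
        return "negative"
    return "neutral"


def ngrams(tokens, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=3):
    """Compute one {term: weight} dictionary per document.

    tf  = term count / number of n-gram terms in the document
    idf = ln(N / df), where df is the number of documents containing the term
    (an unsmoothed textbook variant, chosen here as an assumption).
    """
    # Naive tokenization: lowercase and split on whitespace.
    term_lists = [ngrams(d.lower().split(), n_min, n_max) for d in docs]
    # Document frequency: count each term once per document.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # guard against empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy example with made-up tweets (not taken from the article's dataset).
toy_docs = ["stay home stay safe", "stay safe"]
toy_weights = tfidf(toy_docs)
```

With this unsmoothed variant, a term that occurs in every document receives weight zero, so only terms that distinguish one tweet from another contribute to the feature vector. Library implementations such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants by default, so their values would differ from this sketch.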

Article Details

How to Cite
Bania, R. K. (2020). COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models. INFOCOMP Journal of Computer Science, 19(2), 23–41. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/985
Section
Machine Learning and Computational Intelligence