COVID-19 Public Tweets Sentiment Analysis using TF-IDF and Inductive Learning Models
Abstract
The spread of handheld devices offers direct access to the internet and to social networking sites. Sentiment analysis and opinion mining is the study of the sentiments or opinions that users share on social networking sites such as Twitter, Facebook, Reddit, and Instagram about diverse social phenomena. In this article, sentiment analysis is performed on tweets about the ongoing COVID-19 (coronavirus disease) pandemic, which the World Health Organization (WHO) declared a pandemic in March 2020. Statistical and machine-learning-based analyses are carried out on 40,000 tweets. The tweets were collected from Twitter using the Tweepy Python library in two mutually exclusive time frames: 3 to 11 July 2020 and 1 to 6 August 2020. Various Python-based libraries are used for data acquisition, data pre-processing, and data analysis. In the pre-processing phase, the tweets are first cleaned. Then, by calculating polarity and subjectivity measures, the tweets are categorized into three groups: negative, neutral, and positive. Next, the term frequency-inverse document frequency (TF-IDF) feature extraction scheme is applied with uni-gram, bi-gram, and tri-gram features to prepare the datasets for the prediction models. Gaussian Naïve Bayes (G-NB), Bernoulli Naïve Bayes (B-NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers are trained on 70% of the data, and the resulting models are tested on the remaining 30%. Experimental results suggest that the RF and B-NB models performed better than the other two classifiers, while the computational cost of SVM was very high.
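The abstract describes the pre-processing pipeline only in words. What follows is a minimal, standard-library Python sketch of those stages under stated assumptions; it is not the authors' code. All of the following are illustrative assumptions, not taken from the article:

- the cleaning rules in `clean_tweet`;
- the zero thresholds in `label_from_polarity`;
- the helper names and the sample tweets and scores.

The article does not say which tool produced the polarity and subjectivity scores. One library that returns such a score is TextBlob (`TextBlob(text).sentiment.polarity`), so here the scores are simply passed in. TF-IDF uses the textbook form: tf is the n-gram count divided by the total number of n-grams in the tweet, and idf is ln(N / df).

```python
import math
import random
import re
from collections import Counter

# Hypothetical cleaning rules; the article does not list its exact steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"\brt\b")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lower-case a tweet and strip URLs, @mentions, the RT marker and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def label_from_polarity(polarity):
    """Map a polarity score in [-1, 1] to a class (assumed thresholds at 0)."""
    if polarity < 0:
        return "negative"
    if polarity > 0:
        return "positive"
    return "neutral"


def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=3):
    """Textbook TF-IDF over uni-, bi- and tri-grams: (count / total) * ln(N / df)."""
    counts_per_doc = [Counter(ngrams(d.split(), n_min, n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter()  # number of documents containing each n-gram
    for counts in counts_per_doc:
        df.update(counts.keys())
    vectors = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        vectors.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def holdout_split(items, labels, test_fraction=0.3, seed=42):
    """Shuffle indices with a fixed seed and hold out test_fraction for testing."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([items[i] for i in train_idx], [items[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


if __name__ == "__main__":
    raw = [
        "RT @who: Stay home, stay safe! https://t.co/x",
        "Lockdown again... this is terrible #COVID19",
        "New case numbers released today",
    ]
    polarities = [0.5, -1.0, 0.0]  # hypothetical scores from some sentiment tool
    docs = [clean_tweet(t) for t in raw]
    labels = [label_from_polarity(p) for p in polarities]
    train_x, test_x, train_y, test_y = holdout_split(tfidf(docs), labels)
    print(docs)
    print(labels)
    print(len(train_x), len(test_x))
```

The TF-IDF values from this sketch will not match scikit-learn's output. `TfidfVectorizer(ngram_range=(1, 3))` uses a smoothed idf of ln((1 + N) / (1 + df)) + 1 and L2-normalizes each row by default. The classifiers named in the abstract are available there as:

- `GaussianNB`, which needs dense input;
- `BernoulliNB`, which binarizes features at 0.0 by default;
- `RandomForestClassifier`;
- `SVC`.

A 70/30 split is available as `train_test_split(..., test_size=0.3)`.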
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.