Twitter-Sentiment-Analysis

Classifies tweets as negative or positive using various ML models. The project walks through common NLP preprocessing steps such as stopword removal, stemming, and lemmatization.

About this Dataset

Link: https://www.kaggle.com/datasets/kazanova/sentiment140

Context

This is the sentiment140 dataset. It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment.

Content

It contains the following 6 fields:

- target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
- ids: the id of the tweet (2087)
- date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
- flag: the query (lyx). If there is no query, then this value is NO_QUERY.
- user: the user that tweeted (robotickilldozr)
- text: the text of the tweet (Lyx is cool)
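To make the field layout concrete, here is a minimal sketch that reads rows in this six-column order using only the standard library. The function name `read_tweets`, the `sentiment` key, and the sample row are illustrative assumptions: the row is assembled from the example values in the field list, and its target value of 4 is made up for the demonstration.

```python
import csv
import io

# Column order as described in the field list; the CSV is assumed to have no header row.
FIELDS = ["target", "ids", "date", "flag", "user", "text"]
LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def read_tweets(handle):
    """Yield one dict per CSV row, with the target as int and a readable label added."""
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        row["target"] = int(row["target"])
        row["sentiment"] = LABELS[row["target"]]
        yield row


# Hypothetical sample row (target value assumed) standing in for the real file.
sample = io.StringIO(
    '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'
)
rows = list(read_tweets(sample))
print(rows[0]["sentiment"], "-", rows[0]["text"])  # prints: positive - Lyx is cool
```

For the downloaded file, the same function should work with a handle from `open(path, newline="", encoding="latin-1")`, where `path` is wherever you saved the CSV. The `latin-1` encoding is an assumption and may need adjusting for your copy.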

According to the creators of the dataset:

"Our approach was unique because our training data was automatically created, as opposed to having humans manual annotate tweets. In our approach, we assume that any tweet with positive emoticons, like :), were positive, and tweets with negative emoticons, like :(, were negative. We used the Twitter Search API to collect these tweets by using keyword search"

citation: Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.
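The emoticon-based labeling described in the quote can be sketched roughly as follows. This is not the authors' code: the emoticon lists are a small illustrative subset, the function names are hypothetical, and both skipping tweets with mixed emoticons and stripping emoticons afterwards are assumptions of this sketch.

```python
# Illustrative subsets only; the real emoticon lists may differ.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def label_by_emoticon(text):
    """Return 4 (positive), 0 (negative), or None when the emoticons give no clear signal."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:  # neither kind present, or both at once
        return None
    return 4 if has_pos else 0


def strip_emoticons(text):
    """Remove the labeling emoticons so a model cannot simply memorize them."""
    for e in POSITIVE + NEGATIVE:
        text = text.replace(e, "")
    return " ".join(text.split())


print(label_by_emoticon("Lyx is cool :)"))  # prints: 4
print(strip_emoticons("Lyx is cool :)"))    # prints: Lyx is cool
```

Labels produced this way are noisy, since an emoticon does not always match the sentiment of the text, which is worth keeping in mind when reading model accuracy on this dataset.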

Download from nltk

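import nltk

nltk.download('stopwords')  # stopword lists

nltk.download('wordnet')  # WordNet data used by the lemmatizer

nltk.download('omw-1.4')  # Open Multilingual Wordnet data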
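A minimal sketch of how the resources downloaded above might feed a preprocessing step. The helper names `preprocess` and `nltk_components` are hypothetical, and the cleaning rules (dropping URLs and @mentions, keeping only alphabetic tokens) are assumptions, not necessarily what this project does.

```python
import re


def preprocess(text, stop_words, normalize=lambda tok: tok):
    """Lowercase, drop URLs and @mentions, tokenize, remove stopwords, normalize tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)                   # strip @mentions
    tokens = re.findall(r"[a-z]+", text)                # keep alphabetic tokens only
    return [normalize(t) for t in tokens if t not in stop_words]


def nltk_components():
    """Build a stopword set, a stemmer, and a lemmatizer from the NLTK downloads above."""
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stop_words = set(stopwords.words("english"))
    return stop_words, PorterStemmer().stem, WordNetLemmatizer().lemmatize


# Toy stopword set so the example runs without NLTK installed.
print(preprocess("@bob I am loving the new phone http://t.co/x", {"i", "am", "the"}))
# prints: ['loving', 'new', 'phone']
```

With NLTK available, `stop_words, stem, lemmatize = nltk_components()` followed by `preprocess(tweet, stop_words, stem)` or `preprocess(tweet, stop_words, lemmatize)` lets you compare stemming against lemmatization on the same text.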