There are various ways to obtain a dataset for this task. One is to use the Twitter API (https://developer.twitter.com/en) to stream live data and store it. The other is to use a competition dataset. This project uses the dataset from https://www.kaggle.com/kazanova/sentiment140. Since Twitter did not approve my developer account, the first approach remains a TODO.
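For reference, here is a minimal sketch of the streaming approach using the tweepy library (an assumed dependency, not part of this repo); it requires an approved developer account and a bearer token:

```python
# Sketch only: stream live tweets with tweepy (assumed dependency) and
# append them to a local file. Requires a valid bearer token.
import tweepy

class TweetCollector(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        # Store each tweet on its own line for later preprocessing.
        with open("datasets/streamed_tweets.txt", "a", encoding="utf-8") as f:
            f.write(tweet.text.replace("\n", " ") + "\n")

collector = TweetCollector("YOUR_BEARER_TOKEN")           # hypothetical credential
collector.add_rules(tweepy.StreamRule("python lang:en"))  # example filter rule
collector.filter()
```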
- Download the file from https://www.kaggle.com/kazanova/sentiment140, create a datasets folder, and put it there
- Run preprocess.py on both the train and test data. This will generate a preprocessed version of the dataset (see the sketch after this list).
- Run stats.py with the path of the CSV generated by preprocess.py as its argument. This gives general statistical information about the dataset and will generate two pickle files containing the frequency distributions of unigrams and bigrams in the training dataset.
- Run model.py
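As a rough illustration of what the preprocess and stats steps amount to, here is a combined sketch; the cleaning rules, file names, and column layout are assumptions, not the actual code of preprocess.py or stats.py:

```python
# Sketch of the preprocess + stats steps: clean tweets, then count
# unigram/bigram frequencies and pickle them. The cleaning rules and
# file names here are assumptions, not the scripts' actual logic.
import csv
import pickle
import re
from collections import Counter

def clean(tweet):
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "URL", tweet)   # replace links
    tweet = re.sub(r"@\w+", "USER_MENTION", tweet)  # replace @handles
    tweet = re.sub(r"[^a-z\s]", " ", tweet)         # drop punctuation/digits
    return tweet.split()

unigrams, bigrams = Counter(), Counter()
with open("datasets/dataset_manual_raw.csv", encoding="latin-1") as f:
    for row in csv.reader(f):
        tokens = clean(row[-1])  # sentiment140: tweet text is the last column
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

with open("freqdist", "wb") as f:
    pickle.dump(unigrams, f)
with open("freqdist-bi", "wb") as f:
    pickle.dump(bigrams, f)
```

Pickling plain Counter objects keeps the two frequency files cheap to reload in the later steps.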
- dataset_manual_raw.csv - the raw dataset from https://www.kaggle.com/kazanova/sentiment140
- freqdist - frequency distribution of unigrams
- freqdist-bi - frequency distribution of bigrams
- glove-seeds - GloVe seed vectors from https://github.com/stanfordnlp/GloVe
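GloVe vectors are distributed as plain text, one token per line followed by its vector components; a minimal loader sketch (the file name here is an assumption):

```python
# Load GloVe seed vectors into a dict mapping word -> vector (list of floats).
def load_glove(path="glove-seeds.txt"):  # assumed file name
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```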