Welcome to Phase 2 - Data Science of the MSA program!
Phase 2 focuses on sentiment analysis: you will train your own classifier as well as use prebuilt sentiment models. We will cover:
- Data exploration and preparation using the NLTK package
- Sentiment analysis using the TextBlob library (built on top of NLTK) and VADER (included in NLTK)
- Sentiment analysis using a Recurrent Neural Network (RNN)
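Data preparation usually means tokenising text, lowercasing it, dropping stopwords and then looking at word frequencies. With NLTK this is typically done with `nltk.word_tokenize`, `nltk.corpus.stopwords.words("english")` and `nltk.FreqDist`. The sketch below shows the same pipeline using only the Python standard library so it runs without downloading NLTK data. The regex tokenizer and the short `STOPWORDS` set are simplified, hypothetical stand-ins, not NLTK's actual tokenizer or stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; NLTK's English list is much longer.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "to", "of"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def preprocess(text):
    """Tokenize the text and drop stopwords."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]

reviews = [
    "This movie was great, and the acting was great too!",
    "The plot is boring and the ending is bad.",
]
# Word frequencies across all reviews, similar in spirit to nltk.FreqDist.
freq = Counter(tok for review in reviews for tok in preprocess(review))
print(freq.most_common(3))
```

Swapping in NLTK's tokenizer and full stopword list changes the tokens you get, but not the shape of the pipeline.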
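TextBlob and VADER are both lexicon-based: they look up words in a dictionary of pre-scored terms and combine the scores, with extra rules for cases such as negation. With the real libraries you would call `TextBlob(text).sentiment.polarity`, or `SentimentIntensityAnalyzer().polarity_scores(text)` from `nltk.sentiment.vader` (after `nltk.download("vader_lexicon")`). The toy scorer below only illustrates the idea. Its `LEXICON` values and single-word negation rule are made-up assumptions and will not reproduce either library's scores; the final squashing step, x / sqrt(x² + 15), is the normalisation VADER uses for its compound score.

```python
import math
import re

# Made-up word scores for illustration only; VADER's real lexicon is far larger.
LEXICON = {"good": 1.9, "great": 3.1, "love": 3.2, "bad": -2.5, "boring": -1.3, "awful": -2.0}
NEGATIONS = {"not", "never", "no"}

def polarity(text, alpha=15):
    """Sum word scores, flipping a word that directly follows a negation, then squash into (-1, 1)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        score = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            score = -score  # crude negation handling: "not good" counts as negative
        total += score
    return total / math.sqrt(total * total + alpha)

for sentence in ["I love this, it is great", "not good, quite boring", "It is a film"]:
    print(f"{polarity(sentence):+.3f}  {sentence}")
```

Sentences with no lexicon words score exactly 0, which is why lexicon methods struggle with slang and domain-specific vocabulary.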
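A recurrent neural network reads a sentence one token at a time and carries a hidden state forward, h_t = tanh(W·x_t + U·h_{t-1} + b); the final state is then passed to a classifier. The TensorFlow tutorial in the resources list builds this with Keras layers such as `Embedding` and `LSTM`. The pure-Python sketch below only shows the forward pass of a single plain (Elman) RNN cell. The 2-d embeddings and all weights are hand-picked, untrained, hypothetical values; a real model learns them by backpropagation.

```python
import math

# Hypothetical 2-d word embeddings and weights; in a real model these are learned.
EMBEDDINGS = {"great": [1.0, 0.0], "bad": [0.0, 1.0], "movie": [0.1, 0.1]}
W = [[0.8, -0.6], [-0.5, 0.9]]  # input-to-hidden weights
U = [[0.3, 0.0], [0.0, 0.3]]    # hidden-to-hidden (recurrent) weights
b = [0.0, 0.0]                  # hidden bias
V = [1.0, -1.0]                 # hidden-to-output weights for one sentiment logit

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def rnn_forward(tokens):
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) over the tokens; return the final hidden state."""
    h = [0.0, 0.0]
    for tok in tokens:
        x = EMBEDDINGS.get(tok, [0.0, 0.0])  # unknown words map to a zero vector
        pre = [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
        h = [math.tanh(p) for p in pre]
    return h

def positive_probability(tokens):
    """Sigmoid of a linear readout of the final hidden state."""
    logit = sum(v_i * h_i for v_i, h_i in zip(V, rnn_forward(tokens)))
    return 1.0 / (1.0 + math.exp(-logit))

print(positive_probability(["great", "movie"]))
print(positive_probability(["bad", "movie"]))
```

Because U feeds the previous state back in, earlier words still influence the output after later words are read; LSTM cells add gates so that this memory survives over longer sentences.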
For Windows
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
For Mac
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
jupyter notebook
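After activating the environment on either platform, you can confirm that Python is really running inside it: in a venv, `sys.prefix` points at the environment folder while `sys.base_prefix` still points at the base Python installation. A minimal check you could run, for example, in the first notebook cell (the helper name `in_virtualenv` is just illustrative):

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a venv."""
    # venv sets sys.prefix to the environment folder; base_prefix keeps the original install.
    return sys.prefix != sys.base_prefix

print("Running inside a virtual environment:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If this prints `False` inside Jupyter, the notebook kernel is probably using a different Python than the one in `venv`.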
If you run into problems with the code on a Mac, feel free to contact Karim! :)
If you are new to Python virtual environments (venv), read more about them here.
These OPTIONAL resources will help you understand the content better:
- DataCamp tutorial: https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk
- TensorFlow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn
Submissions close at 8 AM on 18 September 2020.
- Compete in the Kaggle challenge here: https://www.kaggle.com/t/eade3863494042b8b7e051aaa9efabd3 (you will have to build your own model for this challenge)
- Develop a business case and extract data using either the Reddit API or web scraping techniques to solve it. You will need to perform general cleaning, exploration and sentiment analysis as part of your solution.
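Text pulled from Reddit or scraped from web pages usually needs cleaning before exploration or sentiment analysis. The sketch below assumes you already have the raw comment strings; how you collect them is up to you. The helper name `clean_comment` and the exact patterns it removes (links, `u/` and `r/` mentions, HTML entities, markdown symbols) are illustrative choices, not a required recipe.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)/?[ur]/\w+")  # u/username and r/subreddit references
MARKDOWN_RE = re.compile(r"[*_~`>#\[\]()]")     # common markdown punctuation

def clean_comment(raw):
    """Return a lowercased comment with links, mentions, HTML entities and markdown removed."""
    text = html.unescape(raw)         # e.g. "&amp;" -> "&"
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = MARKDOWN_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip().lower()  # collapse runs of whitespace

comments = ["**Loved it!** see https://example.com &amp; thanks u/someone", "  r/movies   is *great*  "]
print([clean_comment(c) for c in comments])
```

Punctuation such as `!` is deliberately kept, since lexicon-based tools like VADER use it as an intensity cue.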
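A common baseline when building your own classifier is Naive Bayes over word counts. NLTK ships one as `nltk.NaiveBayesClassifier`, which is trained on feature dictionaries. The sketch below is a separate from-scratch multinomial variant with add-one (Laplace) smoothing, showing what training and prediction involve. The four labelled sentences are made up for illustration and are not the Kaggle data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_texts = ["great movie, loved it", "what a great cast", "boring and bad", "bad plot, awful acting"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("loved the cast"), model.predict("awful and boring"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.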
If you have any questions, post them in our Facebook group or on our Discord server.
We welcome all students to help us improve the documentation for other students. If you find a typo or something unclear, please open a pull request or an issue and assign it to LindaBot.
😀