Sentiment Analysis with R

Link to Etherpad

https://etherpad.wikimedia.org/p/sentiment_analysis_with_R

About the Datasets

1. Amazon Book Reviews

A CSV file containing 5,000 book reviews web-scraped from Amazon in 2018.

2. State of the Union Corpus (1790 - 2018)

Context:

The State of the Union is an annual address by the President of the United States before a joint session of Congress. In it, the President reviews the previous year and lays out a legislative agenda for the coming year.

Content:

This dataset contains the full text of the State of the Union addresses from 1989 (Reagan) to 2017 (Trump).

Inspiration:

This is a nice, clean set of texts, well suited to exploring Natural Language Processing techniques:

  1. Topic modelling: Which topics have become more popular over time? Which have become less popular?
  2. Sentiment analysis: Are there differences in tone between different Presidents? Presidents from different parties?
  3. Parsing: Can you train or implement a parser to automatically extract the syntactic relationships between words?
  4. Authorship identification: Can you correctly identify the author of a previously unseen address?
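
As a rough illustration of the sentiment analysis idea in item 2, here is a minimal sketch of lexicon-based scoring in base R. It is not the workshop's actual method: the `lexicon` words and weights, the `score_sentiment` helper, and the two example sentences are all made up for illustration, and a real analysis would load the dataset files and use an established sentiment lexicon instead.

```r
# Toy lexicon: hypothetical words and weights, for illustration only.
lexicon <- c(good = 1, great = 1, strong = 1, bad = -1, weak = -1, crisis = -1)

# Score one text: lowercase it, split on non-letters,
# then sum the weights of the words found in the lexicon.
score_sentiment <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[tokens != ""]
  # Words missing from the lexicon index to NA and are dropped by na.rm.
  sum(lexicon[tokens], na.rm = TRUE)
}

# Made-up sentences standing in for reviews or speech excerpts.
texts <- c(
  a = "The economy is strong and the future looks great.",
  b = "This was a bad year marked by crisis."
)

# One score per text; positive values lean positive, negative lean negative.
scores <- vapply(texts, score_sentiment, numeric(1), lexicon = lexicon)
print(scores)
```

Summing word weights ignores negation and context ("not good" still scores as positive), so this is only a starting point for comparing tone across texts.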