pg2455/U.S-Presidential-Speeches

U.S-Presidential-Speeches

Analyze the text of U.S. presidents' speeches from 1790 to 2006 using the Word2Vec (w2v) model in gensim.
Note: I worked on this project in July 2014. There has been a lot of progress in the field since then.

  • All speeches in their original format are in speech.txt

  • The processed version of the speeches, with one processed speech per line, is in all_speech.txt

  • The JSON file containing the full metadata from speech.txt, as a list of dictionaries, is in data_processed.txt. Each dictionary contains the following key-value pairs:

    • 'who': President's name
    • 'date': date of the speech (example: January 27, 1984)
    • 'speech': full text of the speech
    • 'what': 'State of the Union Address'
  • The code to process speech.txt is in speech.py

  • w2v_speech.py contains the gensim code that trains a w2v model on the speeches

  • A speech vector is calculated by averaging all the word vectors in the speech

  • w2v_tsne.py contains the code to plot a 2D version of the 100-dimensional speech vectors

  • speech_vectors.npy is a NumPy array of all speech vectors, as produced in w2v_tsne.py
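
The averaging step described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from w2v_speech.py or w2v_tsne.py: the function name `speech_vector`, the toy 3-dimensional vectors, and the choice to skip out-of-vocabulary words are all assumptions made for the example. In practice the lookup table would come from the trained gensim model and the vectors would have 100 dimensions.

```python
import numpy as np


def speech_vector(tokens, word_vectors, dim):
    """Average the vectors of all in-vocabulary tokens of one speech.

    `word_vectors` is any mapping from word to 1-D NumPy array
    (a plain dict here; hypothetical stand-in for a trained model).
    Words without a vector are skipped (an assumption of this sketch).
    """
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "union": np.array([1.0, 0.0, 2.0]),
    "peace": np.array([3.0, 2.0, 0.0]),
}

tokens = "the union seeks peace".split()
vec = speech_vector(tokens, toy_vectors, dim=3)
print(vec)  # mean of the "union" and "peace" vectors
```

Stacking one such vector per speech gives a 2D array of shape (number of speeches, vector size), which is the kind of array that can be saved with `np.save` and reloaded with `np.load`.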

Here is the t-SNE plot of the speech vectors. A labeled version is in the repo; download the image to zoom in.
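
For readers who want to reproduce a projection like this, here is a rough sketch using scikit-learn's `TSNE`. It is an assumption that scikit-learn is a suitable stand-in; the actual implementation and parameters in w2v_tsne.py may differ. Random data replaces the real speech vectors so the example is self-contained, and the `perplexity` and `random_state` values are arbitrary.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: 30 random "speech vectors" with 100 dimensions.
# With the real data one would load the array instead, e.g.
# speech_vectors = np.load("speech_vectors.npy")
rng = np.random.default_rng(0)
speech_vectors = rng.normal(size=(30, 100))

# perplexity must be smaller than the number of samples;
# 5 is an arbitrary choice for this small toy set.
tsne = TSNE(n_components=2, perplexity=5, init="random", random_state=0)
points_2d = tsne.fit_transform(speech_vectors)

print(points_2d.shape)  # one (x, y) pair per speech
```

Each row of `points_2d` can then be drawn as a point in a scatter plot and labeled with the president and year.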

Here is the distance matrix between speeches. Zoom in to see the year of each speech vector.
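
A pairwise distance matrix of this kind can be computed along the following lines. The README does not say which metric was used, so cosine distance here is an assumption, as is the helper name `cosine_distance_matrix`. Random data stands in for the real speech vectors.

```python
import numpy as np


def cosine_distance_matrix(vectors):
    """Return the matrix of pairwise cosine distances between rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = vectors / np.clip(norms, 1e-12, None)
    return 1.0 - unit @ unit.T


# Stand-in data: 5 random "speech vectors" with 100 dimensions.
# With the real data: speech_vectors = np.load("speech_vectors.npy")
rng = np.random.default_rng(0)
speech_vectors = rng.normal(size=(5, 100))

distances = cosine_distance_matrix(speech_vectors)
print(distances.shape)  # square matrix, one row and column per speech
```

Entry (i, j) is close to 0 when speeches i and j point in nearly the same direction in the embedding space, and the diagonal is 0 because every speech is identical to itself.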

About

Textual Analysis of speeches using Google's Word2Vec Model
