The effects of political messaging on user engagement on Twitter during the 2021 Dutch national elections

Author: Erik Puijk
Date: February 17, 2022

Introduction

This repository contains all relevant code and data used in my Master's Thesis project about political messaging and online engagement.
The thesis aimed to provide insight into the extent to which different messaging strategies used by political parties engage users on social media. It did so by examining the messaging of political parties and party leaders on Twitter during the 2021 elections for the House of Representatives in the Netherlands.

Technologies

  • Python 3
  • Pip3
  • Jupyter Notebook
  • scikit-learn

Notebooks

CreateUsers.ipynb

This notebook is used to create a text file containing all Twitter users that are analyzed in the thesis, along with other relevant information such as each user's number of followers. The text file is used later on to retrieve Tweets.

Krippendorff.ipynb

This notebook is used to calculate Krippendorff's alpha given a csv-file of codings.
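
As an illustration of what this calculation involves, the following is a minimal sketch of Krippendorff's alpha for nominal labels, built from the standard coincidence-matrix definition. It is not taken from the notebook: the function name `krippendorff_alpha_nominal` and the input layout (one list of labels per coded Tweet, with `None` for a missing coding) are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists (hypothetical layout): each inner list holds
    the labels that the coders assigned to one item, with None for a
    missing coding.
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of codings in the unit.
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two codings cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable codings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement counts (nominal metric: 0 or 1).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    cross = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if n <= 1 or cross == 0:
        raise ValueError("alpha is undefined: too few codings or only one label used")
    expected = cross / (n - 1)
    return 1.0 - observed / expected


# Two coders, three items: they agree on the first and last item only.
example = [["a", "a"], ["a", "b"], ["b", "b"]]
alpha = krippendorff_alpha_nominal(example)
```

In practice each row of the codings file would be converted to one inner list before calling the function; how the csv-file is laid out is not specified here, so that conversion is left out.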

MergeLabels.ipynb

This notebook is used to merge a list of annotated Tweets (either annotated manually or automatically) with the JSON-formatted file that contains all the data about the Tweets.
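
The merge step can be pictured roughly as follows. This is a sketch under stated assumptions, not the notebook's code: it assumes one JSON object per line in the Tweet file, a label file with `id` and `label` columns, and the hypothetical helper name `merge_labels`. The real files may use different field names.

```python
import csv
import io
import json


def merge_labels(tweet_lines, label_rows, id_key="id", label_key="label"):
    """Attach labels to Tweets by matching on an id field.

    `tweet_lines` is an iterable of JSON strings (one Tweet per line) and
    `label_rows` an iterable of dicts, e.g. from csv.DictReader. The field
    names are assumptions for this example.
    """
    labels = {str(row[id_key]): row[label_key] for row in label_rows}
    merged = []
    for line in tweet_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        tweet_id = str(tweet[id_key])
        if tweet_id in labels:
            tweet[label_key] = labels[tweet_id]
        merged.append(tweet)  # unlabeled Tweets are kept unchanged
    return merged


# In-memory stand-ins for the Tweet file and the label file.
tweet_file = io.StringIO('{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n')
label_file = io.StringIO("id,label\n1,strategy_a\n")
merged = merge_labels(tweet_file, csv.DictReader(label_file))
```

Ids are compared as strings because a csv reader returns every value as text, while the JSON side may store the id as a number.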

ModelSelection.ipynb

This notebook is used to find optimal parameters for the classification model. It uses k-fold cross validation to test various parameters to select an appropriate model for classifying Tweets.
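
To make the idea of k-fold cross validation for parameter selection concrete, here is a small standard-library sketch. It is only an illustration: the toy nearest-neighbour classifier, the helper names (`k_fold_indices`, `cross_validate`, `fit_knn`, `predict_knn`) and the numeric toy data are all assumptions, and the notebook itself presumably relies on scikit-learn utilities such as `GridSearchCV` or `KFold` from `sklearn.model_selection` rather than hand-written loops.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Mean accuracy over k folds for one parameter setting."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)


def fit_knn(n_neighbors):
    """Toy 1-D nearest-neighbour 'training': just store the data."""
    def fit(train_x, train_y):
        return n_neighbors, list(zip(train_x, train_y))
    return fit


def predict_knn(model, x):
    """Predict the majority label among the n_neighbors closest points."""
    n_neighbors, data = model
    nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data: numbers below 10 are "low", the rest "high".
X = list(range(20))
y = ["low" if x < 10 else "high" for x in X]

# Try each candidate parameter value and keep the best cross-validated one.
candidates = [1, 3, 15]
best = max(candidates, key=lambda n: cross_validate(fit_knn(n), predict_knn, X, y, k=5))
```

Every item is used for testing exactly once, so each candidate is scored on data it was not fitted on.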

PreProcessing.ipynb

This notebook is used to pre-process the Tweets to prepare them for generating features for the machine learning algorithm.
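
The exact pre-processing steps are not listed here, so the following is only a sketch of what such a step commonly looks like for Tweets: lowercasing, removing URLs and user mentions, tokenizing, and dropping stop words. The function name `preprocess`, the regular expressions, and the tiny Dutch stop word set are assumptions for illustration and not the notebook's actual choices.

```python
import re

URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")        # user mentions
TOKEN_RE = re.compile(r"[^\W\d_]+")     # runs of letters (Unicode-aware)

# Deliberately tiny, hypothetical stop word list.
STOPWORDS = {"de", "het", "een", "en", "van"}


def preprocess(text, stopwords=STOPWORDS):
    """Return a list of lowercase word tokens for one Tweet."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)
    return [token for token in tokens if token not in stopwords]


tokens = preprocess("Stem op de VVD! https://t.co/abc @example_user #verkiezingen")
```

Hashtag words are kept (only the `#` is dropped) because they often carry topical information. Whether the thesis treats them this way is not stated.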

Results.ipynb

This notebook is used to visualize results based on the classified Tweets. The average engagement per messaging strategy is provided in table format. The number of Tweets and the average engagement per week can be visualized for all parties or a selection of parties, as can the use of messaging strategies per week.
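
A rough sketch of the "average engagement per messaging strategy" table is shown below. How engagement is measured is not defined here, so the sketch simply sums two hypothetical count fields (`likes` and `retweets`) per Tweet. The field names and the function name `average_engagement` are assumptions.

```python
from collections import defaultdict


def average_engagement(tweets, label_key="label", metrics=("likes", "retweets")):
    """Average engagement per label.

    `tweets` is a list of dicts. Engagement is taken to be the sum of the
    fields named in `metrics`, which is an assumption made for this example.
    """
    totals = defaultdict(lambda: [0.0, 0])  # label -> [engagement sum, Tweet count]
    for tweet in tweets:
        engagement = sum(tweet.get(metric, 0) for metric in metrics)
        entry = totals[tweet[label_key]]
        entry[0] += engagement
        entry[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


sample_tweets = [
    {"label": "strategy_a", "likes": 10, "retweets": 2},
    {"label": "strategy_a", "likes": 4, "retweets": 0},
    {"label": "strategy_b", "likes": 1, "retweets": 1},
]
table = average_engagement(sample_tweets)
```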

RetrieveTweets.ipynb

This notebook is used to retrieve Tweets using the Twitter API. It reads the users file created in CreateUsers.ipynb and writes the Tweets in JSON-format to a text file.

Sampling.ipynb

This notebook is used to generate random samples given a text file of Tweets. It is used to divide the data sets into subsets to perform the methodology described in the thesis.
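
Dividing a data set into a random sample and a remainder can be sketched as below. The helper name `split_sample`, the fixed seed, and the toy item list are assumptions for illustration. The sample sizes and procedure actually used are those described in the thesis.

```python
import random


def split_sample(items, sample_size, seed=None):
    """Split `items` into a random sample and the remaining items.

    Both returned lists keep the original order of `items`.
    """
    if not 0 <= sample_size <= len(items):
        raise ValueError("sample_size must be between 0 and len(items)")
    rng = random.Random(seed)  # a seed makes the split reproducible
    chosen = set(rng.sample(range(len(items)), sample_size))
    sample = [item for i, item in enumerate(items) if i in chosen]
    rest = [item for i, item in enumerate(items) if i not in chosen]
    return sample, rest


items = [f"tweet_{i}" for i in range(10)]
sample, rest = split_sample(items, 3, seed=42)
```

Sampling indices rather than items keeps the split well defined even if the same Tweet text appears more than once.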

TweetClassification.ipynb

This notebook is used to classify the remaining (unlabeled) Tweets using the optimal model found in ModelSelection.ipynb.

Data

content_analysis/line_by_line_coding.csv

Generated by Sampling.ipynb. Sample of 100 Tweets used for line-by-line coding in the content analysis phase.

content_analysis/theoretical_sampling.csv

Generated by Sampling.ipynb. Sample of 100 Tweets used for theoretical sampling in the content analysis phase.

source/gold_standard_labels.csv

Manually generated. Assigned labels for the gold-standard set of Tweets in csv-format.

source/tweets_all.txt

Generated by RetrieveTweets.ipynb and edited by MergeLabels.ipynb. Complete data set of Tweets in JSON-format. Tweets contain no labels at all (after generation by RetrieveTweets.ipynb) or only manually assigned labels (after use of MergeLabels.ipynb).

source/tweets_all_labeled.csv

Manually generated from source/tweets_all_labeled.txt. Complete data set of Tweets in csv-format. All Tweets contain labels (either manually assigned or assigned by TweetClassification.ipynb).

source/tweets_all_labeled.txt

Generated by TweetClassification.ipynb. Complete data set of Tweets in JSON-format. All Tweets contain labels (either manually assigned or assigned by TweetClassification.ipynb).

source/tweets_all_preprocessed_exc_stopwords.txt

Generated by PreProcessing.ipynb. Complete data set of preprocessed Tweets in JSON-format.

source/users.csv

Generated by CreateUsers.ipynb. Data about Twitter users that were analyzed in csv-format.

source/users.txt

Manually generated from source/users.csv. Data about Twitter users that were analyzed in JSON-format.

samples/gold_standard.csv

Generated by Sampling.ipynb. Contains Tweets selected for the gold-standard data set in csv-format.

samples/gold_standard.txt

Generated by Sampling.ipynb. Contains Tweets selected for the gold-standard data set in JSON-format.

samples/intercoder_sample.csv

Generated by Sampling.ipynb. Contains Tweets from the gold-standard data set selected to be validated by second and third coders in csv-format.

samples/rest.txt

Generated by Sampling.ipynb. Contains all non-gold-standard Tweets in JSON-format.

Getting started

  1. Use CreateUsers.ipynb to create a local document containing user data for Tweet collection.
  2. Use RetrieveTweets.ipynb to collect and format the Tweets.
  3. Use Sampling.ipynb to create samples for the gold-standard dataset and intercoder reliability.
  4. Use Krippendorff.ipynb to verify intercoder reliability.
  5. Use PreProcessing.ipynb to pre-process the Tweets in preparation for machine learning.
  6. Use MergeLabels.ipynb to merge manual labels for the gold-standard dataset with the Tweet dataset.
  7. Use ModelSelection.ipynb to find optimal parameter values for classifying Tweets using the gold-standard dataset.
  8. Use TweetClassification.ipynb to classify the remaining (unlabeled) Tweets with the optimal model.
  9. Use Results.ipynb to visualize the results for the selection of parties you wish to see.