The effects of political messaging on user engagement on Twitter during the 2021 Dutch national elections
Author: Erik Puijk
Date: February 17, 2022
This repository contains all relevant code and data used in my Master's thesis project on political messaging and online engagement.
The thesis aimed to provide insight into the extent to which different messaging strategies of political parties engaged users on social media. It did so by examining the Twitter messaging of political parties and party leaders during the 2021 elections for the House of Representatives in the Netherlands.
- Python 3
- Pip3
- Jupyter Notebook
- scikit-learn
CreateUsers.ipynb: This notebook creates a text file listing all Twitter users analyzed in the thesis, together with other relevant information such as each user's number of followers. The file is used later on to retrieve Tweets.
Krippendorff.ipynb: This notebook calculates Krippendorff's alpha given a csv-file of codings.
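The coding CSV layout is specific to the thesis, but the core computation can be illustrated independently of it. The sketch below (an assumption, not the notebook's actual code) computes Krippendorff's alpha for nominal data from a list of units, where each unit holds the labels assigned by its coders:

```python
from collections import Counter
from itertools import permutations

def nominal_alpha(codings):
    """Krippendorff's alpha for nominal data.

    `codings` is a list of units, each a sequence of labels assigned
    by the coders of that unit; units with fewer than two codings
    are skipped, as in the standard definition.
    """
    coincidences = Counter()
    for unit in codings:
        unit = [label for label in unit if label is not None]
        m = len(unit)
        if m < 2:
            continue
        # Each ordered pair of codings contributes 1/(m - 1).
        for a, b in permutations(unit, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    n = sum(coincidences.values())
    totals = Counter()
    for (a, _), count in coincidences.items():
        totals[a] += count
    observed = sum(c for (a, b), c in coincidences.items() if a != b) / n
    expected = sum(totals[a] * totals[b]
                   for a, b in permutations(totals, 2)) / (n * (n - 1))
    return 1 - observed / expected
```

For two coders, the per-unit sequences are simply the pairs of labels, e.g. `nominal_alpha(zip(coder1, coder2))`.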
MergeLabels.ipynb: This notebook merges a list of annotated Tweets (annotated either manually or automatically) with the JSON-formatted file that contains all data about the Tweets.
ModelSelection.ipynb: This notebook finds optimal parameters for the classification model. It uses k-fold cross-validation to test various parameter values and select an appropriate model for classifying Tweets.
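With scikit-learn, this kind of k-fold parameter search is typically done with `GridSearchCV` over a pipeline. The following is a minimal sketch on toy data; the actual features, model family, parameter grid, and labels used in the thesis may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy Tweets and labels; the real notebook uses the gold-standard set.
texts = ["lagere belastingen nu", "meer geld voor zorg",
         "stem voor verandering", "investeer in onderwijs"] * 5
labels = ["strategy_a", "strategy_b", "strategy_a", "strategy_b"] * 5

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
# 5-fold cross-validation over every parameter combination.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(texts, labels)
print(search.best_params_)
```

`search.best_estimator_` is then a fully fitted model using the best parameter combination found.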
PreProcessing.ipynb: This notebook pre-processes the Tweets to prepare them for feature generation for the machine learning algorithm.
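As an illustration only, typical Tweet-cleaning steps look like the sketch below; which steps the notebook actually applies (e.g. whether hashtags are kept) is an assumption here:

```python
import re

def preprocess(tweet_text):
    """Illustrative cleaning steps; the thesis pipeline may differ."""
    text = tweet_text.lower()
    text = re.sub(r"https?://\S+", "", text)   # strip URLs
    text = re.sub(r"[@#]\w+", "", text)        # strip mentions and hashtags
    text = re.sub(r"[^\w\s]", "", text)        # strip punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace
```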
Results.ipynb: This notebook visualizes results based on the classified Tweets. It provides the average engagement per messaging strategy in table format, and can visualize the number of Tweets, the average engagement, and the use of messaging strategies per week for all parties or a selection of them.
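A per-strategy engagement table of this kind amounts to a group-by aggregation. A minimal sketch with pandas, using invented column names and toy numbers rather than the thesis's actual schema:

```python
import pandas as pd

# Toy data; "strategy", "likes", and "retweets" are assumed column names.
tweets = pd.DataFrame({
    "strategy": ["strategy_a", "strategy_b", "strategy_a", "strategy_b"],
    "likes": [10, 40, 30, 20],
    "retweets": [1, 4, 3, 2],
})
# Treat engagement as the sum of likes and retweets (an assumption),
# then average it per messaging strategy.
tweets["engagement"] = tweets["likes"] + tweets["retweets"]
table = tweets.groupby("strategy")["engagement"].mean()
print(table)
```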
RetrieveTweets.ipynb: This notebook retrieves Tweets using the Twitter API. It reads the users file created by CreateUsers.ipynb and writes the Tweets in JSON-format to a text file.
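A retrieval call of this shape can be sketched against the Twitter API v2 user-timeline endpoint. The bearer token and output format below are placeholders, and the exact fields the notebook requests are an assumption:

```python
import json
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder credential

def timeline_request(user_id, max_results=100, pagination_token=None):
    """Build the URL and query parameters for a v2 user-timeline call."""
    url = f"https://api.twitter.com/2/users/{user_id}/tweets"
    params = {"max_results": max_results,
              "tweet.fields": "created_at,public_metrics"}
    if pagination_token:
        params["pagination_token"] = pagination_token
    return url, params

def retrieve_tweets(user_id, out_path):
    """Fetch one page of Tweets and append them as JSON lines."""
    url, params = timeline_request(user_id)
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    with open(out_path, "a") as f:
        for tweet in resp.json().get("data", []):
            f.write(json.dumps(tweet) + "\n")
```

Paging through a full timeline means repeating the call with the `next_token` from each response passed as `pagination_token`.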
Sampling.ipynb: This notebook generates random samples from a text file of Tweets. It is used to divide the data sets into subsets as required by the methodology described in the thesis.
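Splitting a data set into a random sample and its remainder can be done with the standard library; this is a generic sketch (with a seed for reproducibility), not the notebook's exact procedure:

```python
import random

def split_sample(tweets, size, seed=None):
    """Split `tweets` into a random sample of `size` and the remainder."""
    rng = random.Random(seed)       # seeded for reproducible samples
    shuffled = list(tweets)
    rng.shuffle(shuffled)
    return shuffled[:size], shuffled[size:]
```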
TweetClassification.ipynb: This notebook classifies the remaining (unlabeled) Tweets using the optimal model found in ModelSelection.ipynb.
Generated by Sampling.ipynb. Sample of 100 Tweets used for line-by-line coding in the content analysis phase.
Generated by Sampling.ipynb. Sample of 100 Tweets used for theoretical sampling in the content analysis phase.
Manually generated. Assigned labels for gold-standard set of Tweets in csv-format.
Generated by RetrieveTweets.ipynb and edited by MergeLabels.ipynb. Complete data set of Tweets in JSON-format. Tweets contain no labels at all (after generation by RetrieveTweets.ipynb) or only manually assigned labels (after use of MergeLabels.ipynb).
Manually generated from source/tweets_all_labeled.txt. Complete data set of Tweets in csv-format. All Tweets contain labels (either manually assigned or assigned by TweetClassification.ipynb).
Generated by TweetClassification.ipynb. Complete data set of Tweets in JSON-format. All Tweets contain labels (either manually assigned or assigned by TweetClassification.ipynb).
Generated by PreProcessing.ipynb. Complete data set of preprocessed Tweets in JSON-format.
Generated by CreateUsers.ipynb. Data in csv-format about the Twitter users that were analyzed.
Manually generated from source/user.csv. Data in JSON-format about the Twitter users that were analyzed.
Generated by Sampling.ipynb. Contains Tweets selected for the gold-standard data set in csv-format.
Generated by Sampling.ipynb. Contains Tweets selected for the gold-standard data set in JSON-format.
Generated by Sampling.ipynb. Contains Tweets from the gold-standard data set selected to be validated by second and third coders in csv-format.
Generated by Sampling.ipynb. Contains all non-gold-standard Tweets in JSON-format.
- Use CreateUsers.ipynb to create a local document containing user data for Tweet collection.
- Use RetrieveTweets.ipynb to collect and format the Tweets.
- Use Sampling.ipynb to create samples for the gold-standard dataset and intercoder reliability.
- Use Krippendorff.ipynb to verify intercoder reliability.
- Use PreProcessing.ipynb to pre-process the Tweets in preparation for machine learning.
- Use MergeLabels.ipynb to merge manual labels for the gold-standard dataset with the Tweet dataset.
- Use ModelSelection.ipynb to find optimal parameter values for classifying Tweets using the gold-standard dataset.
- Use TweetClassification.ipynb to classify the remaining (unlabeled) Tweets with the optimal model.
- Use Results.ipynb to visualize the results for the selection of parties you wish to see.