This is the official implementation of "Using Persuasive Writing Strategies to Explain and Detect Health Misinformation."
- Compatible with Python 3.8.15
- Dependencies should be installed with conda using the `env.yml` file:

```sh
$ conda env create --name misinformation_detection --file=env.yml
$ conda activate misinformation_detection
```
Our dataset is located in the `data` folder:

- `data/all.xlsx`: the sentences for task 2 and their corresponding persuasive writing strategy labels.
- `data/all_article.xlsx`: the claims and articles for tasks 1 and 3, with their labels from MultiFC.
- `data/train.xlsx`: training split for persuasive writing strategy detection.
- `data/train_article.xlsx`: training split for the misinformation detection task.
- `data/test.xlsx`: testing split for persuasive writing strategy detection.
- `data/test_article.xlsx`: testing split for the misinformation detection task.
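The splits are plain Excel files, so a quick way to inspect them is with pandas (reading `.xlsx` requires `openpyxl`), for example:

```python
# Quick inspection of the dataset splits; needs pandas + openpyxl installed.
import pandas as pd

sentences = pd.read_excel("data/train.xlsx")         # persuasive strategy split
articles = pd.read_excel("data/train_article.xlsx")  # misinformation split

print(sentences.shape, articles.shape)
print(sentences.columns.tolist())  # inspect the actual column names
```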
The raw and unprocessed data, which is the export from the WebAnno tool, is available in the `./data/annotation` folder. To perform the data preprocessing from scratch, you need to have the MultiFC dataset in the `./data` folder. Place all of the MultiFC files in the following structure:

```
data/multi-fc
├── dev.tsv
├── README.txt
├── snippets
├── test.tsv
└── train.tsv
```
Then, run `src/data_preprocess.py` to generate the clean data from the raw data. Note that the current data excludes low-frequency labels; if you want to keep those labels in the dataset, change the `get_low_freq` function in the `annotation.py` file.
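For intuition, removing low-frequency labels amounts to something like the following sketch; the `label` column name and the threshold are assumptions, and the repository's actual logic is in `get_low_freq`:

```python
# Illustrative sketch only: dropping labels that occur fewer than MIN_COUNT
# times. The real behavior is defined by get_low_freq in annotation.py;
# the column name "label" and the threshold are assumptions.
import pandas as pd

df = pd.read_excel("data/all.xlsx")

MIN_COUNT = 10
counts = df["label"].value_counts()
low_freq_labels = counts[counts < MIN_COUNT].index
df = df[~df["label"].isin(low_freq_labels)]
```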
To train and test RoBERTa on persuasive strategy labeling, run the following scripts based on the level:

```sh
sh scripts/layer_1.sh
sh scripts/layer_2.sh
sh scripts/layer_3.sh
sh scripts/layer_4.sh
```

You can run `python src/persuasive_strategy_test.py` to evaluate the performance of the trained models.
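If you want to probe a trained checkpoint by hand rather than through the test script, a minimal sketch looks like this (the checkpoint path is a placeholder, and `src/persuasive_strategy_test.py` remains the authoritative evaluation):

```python
# Hedged sketch: run a fine-tuned RoBERTa sentence classifier on one sentence.
# The checkpoint path is a placeholder; see src/persuasive_strategy_test.py
# for the repository's actual evaluation.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "path/to/layer_1_checkpoint"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)
model.eval()

inputs = tokenizer("Doctors don't want you to know this simple trick.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted strategy label id
```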
To run the experiments for misinformation detection with RoBERTa, run the following scripts:

- `sh scripts/md_claim.sh`, input source: claim
- `sh scripts/md_article.sh`, input source: article
- `sh scripts/md_gt_strategy.sh`, input source: gt (ground-truth persuasive strategy labels)
- `sh scripts/md_pred_strategy.sh`, input source: pred (predicted persuasive strategy labels)
- `sh scripts/md_claim_article.sh`, input source: claim + article
- `sh scripts/md_claim_gt.sh`, input source: claim + gt
- `sh scripts/md_claim_pred.sh`, input source: claim + pred
- `sh scripts/md_claim_article_gt.sh`, input source: claim + article + gt
- `sh scripts/md_claim_article_pred.sh`, input source: claim + article + pred

Notice: before running any of the experiments, run one of the scripts with `pred` in its input source once, to generate the CSV files the experiments require.
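For intuition only, the input-source variations boil down to concatenating the available fields into one text sequence; the sketch below is a guess at that pattern (the separator and field names are assumptions, not the scripts' exact format):

```python
# Illustrative only: how "claim + article + pred" style inputs can be built.
# Field names and the separator are assumptions, not the repo's exact format.
def build_input(claim, article=None, strategies=None, sep=" </s> "):
    parts = [claim]
    if article is not None:
        parts.append(article)
    if strategies is not None:
        parts.append(", ".join(strategies))  # gt or predicted strategy labels
    return sep.join(parts)

text = build_input(
    "Vitamin C cures the flu.",
    article="A viral post claims that ...",
    strategies=["fear appeal", "citing seemingly credible sources"],
)
```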
To run the GPT experiments with all of the input variations, run `sh scripts/md_gpt.sh`.
First, ensure the data is set up as described in the preprocessing section. Then you can run `src/multifc.py` to train RoBERTa-based models on the MultiFC prompt subset. You can also use `src/multifc_test.py` to evaluate the trained models and calculate their average performance.
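Averaging performance over several runs is essentially the following; the numbers and metric names below are placeholders, not results:

```python
# Placeholder sketch: averaging metrics over multiple training runs.
# Values and metric names are illustrative, not results from the paper.
from statistics import mean, stdev

runs = [
    {"accuracy": 0.71, "macro_f1": 0.64},
    {"accuracy": 0.69, "macro_f1": 0.62},
    {"accuracy": 0.73, "macro_f1": 0.66},
]

for metric in ("accuracy", "macro_f1"):
    scores = [run[metric] for run in runs]
    print(f"{metric}: {mean(scores):.3f} (std {stdev(scores):.3f})")
```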
To perform the RAWFC dataset experiments with in-context learning, first download the files available at https://www.dropbox.com/sh/1w7crp3hauoec5m/AABJpG6YWbqrumypBpHJEDnSa?dl=0 and place them in the `data/RAWFC` folder with the following structure. Then run `pre-process.py` to convert the data into the proper format. Finally, run `gpt-3-raw-fc-in-context.py` to evaluate the performance.

```
data/RAWFC
├── pre-process.py
├── README.MD
├── test
├── test.csv (generated by pre-process.py)
├── train
├── train.csv (generated by pre-process.py)
├── val
└── val.csv (generated by pre-process.py)
```
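For a sense of what the in-context learning setup does, here is a hedged sketch of few-shot prompt construction from the generated CSVs; the column names (`claim`, `label`), the label set, and the prompt wording are assumptions, and the repository's actual prompting lives in `gpt-3-raw-fc-in-context.py`:

```python
# Hedged sketch of few-shot (in-context) prompt construction for RAWFC.
# Column names, label set, and template are assumptions; see
# gpt-3-raw-fc-in-context.py for the repository's actual prompting code.
import pandas as pd

train = pd.read_csv("data/RAWFC/train.csv")
test = pd.read_csv("data/RAWFC/test.csv")

# A few labeled training examples serve as in-context demonstrations.
demos = train.sample(4, random_state=0)
shots = "".join(
    f"Claim: {row.claim}\nLabel: {row.label}\n\n" for row in demos.itertuples()
)

# Prompt for a single test claim; this string would be sent to the GPT-3 API.
prompt = (
    "Classify each claim as true, half-true, or false.\n\n"
    + shots
    + f"Claim: {test.iloc[0]['claim']}\nLabel:"
)
print(prompt)
```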