This repo contains the code and data for analyzing subjective judgments of witness responses in U.S. congressional hearings (paper to appear in NAACL 2021). If you make use of the data or code, please cite:
Ferracane, Elisa TBD
Our changes to the codebase for curriculum learning(stage wise) can be viewed from curriculum-learning-stage
branch and for the student-expert model, the branch name is student-teacher-changes
python -u -m models.bert_hier.main_cv --dataset CongressionalHearingEasy --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_easy_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.json --first-input-column 16 --use-second-input --second-input-column 2 > ch_roberta_easy_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.log
python -u -m models.bert_hier.main_cv --dataset CongressionalHearingMed --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_med_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.json --first-input-column 16 --use-second-input --second-input-column 2 > ch_roberta_med_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.log
python -u -m models.bert_hier.main_cv --dataset CongressionalHearingHard --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_hard_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.json --first-input-column 16 --use-second-input --second-input-column 2 > ch_roberta_hard_hierarchical_gold_sentiments_coarse_num_r_text_test_5_dec.log
python -u -m models.bert_hier.main_cv --dataset CongressionalHearingExplanations --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json roberta-explanations/metrics_roberta_hierarchical_explanations_concat_final.json --first-input-column 38 --save-path 'model_checkpoints/roberta-explanations-only' > roberta-explanations/ch_roberta_hierarchical_explanations_concat_final.log
python -u -m models.bert_hier.main_cv --dataset CongressionalHearingExplanations --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json roberta-student-expert/metrics_roberta_hierarchical_student_expert_concat_test.json --first-input-column 16 --use-second-input --second-input-column 2 --use-third-input --third-input-column 38 --use_expert_model --expert_model_path_fold_0 ./model_checkpoints/roberta-explanations-only/CongressionalHearingFoldsExplanations/fold0/2022-12-08_18-03-55.pt --expert_model_path_fold_1 ./model_checkpoints/roberta-explanations-only/CongressionalHearingFoldsExplanations/fold1/2022-12-08_18-42-25.pt --expert_model_path_fold_2 ./model_checkpoints/roberta-explanations-only/CongressionalHearingFoldsExplanations/fold2/2022-12-08_19-20-58.pt --expert_model_path_fold_3 ./model_checkpoints/roberta-explanations-only/CongressionalHearingFoldsExplanations/fold3/2022-12-08_19-59-28.pt --expert_lambda 0.6 --save-path 'model_checkpoints/student_expert' > roberta-student-expert-colab-lambda/ch_roberta_hierarchical_student_expert_concat_test.log
If you're here just for the data, you can download it here: gold_cv_dev_data.tar.gz. Unpack with tar -zxvf gold_cv_dev_data.tar.gz
.
First, create a conda environment and activate it:
conda create --name subjective python=3.8
conda activate subjective
Install pytorch and cuda:
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Clone this repo and install the requirements:
git clone https://github.com/elisaF/subjective_discourse
cd subjective_discourse
pip install -r requirements.txt
Unpack the data splits:
cd subjective_discourse/data/gold
tar -zxvf gold_cv_dev_data.tar.gz
The multi-label classification task consists of predicting all the possible response labels, and is evaluated with macro-averaged F1.
Roberta: predict the response labels using the response text.
python -u -m models.bert.main_cv --dataset CongressionalHearing --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 5 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_classification_r_text_test.json --first-input-column 2 > ch_roberta_classification_r_text_test.log 2>&1
Hierarchical: predict the response labels while also training to predict the conversation acts.
python -u -m models.bert_hier.main_cv --dataset CongressionalHearing --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_hierarchical_r_text_test.json --first-input-column 2 > ch_roberta_hierarchical_r_text_test.log 2>&1 &
+Question: predict the response labels as in the hierarchical model, but additionally using the last question.
python -u -m models.bert_hier.main_cv --dataset CongressionalHearing --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_hierarchical_q_text_last_question_r_text_test.json --first-input-column 4 --use-second-input --second-input-column 2 > ch_roberta_hierarchical_q_text_last_question_r_text_test.log 2>&1
+Annotator: predict the response labels as in the hierarchical model, but additionally using the annotator sentiments.
python -u -m models.bert_hier.main_cv --dataset CongressionalHearing --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_hierarchical_gold_sentiments_coarse_num_r_text_test.json --first-input-column 16 --use-second-input --second-input-column 2 > ch_roberta_hierarchical_gold_sentiments_coarse_num_r_text_test.log 2>&1
+Question+Annotator: predict the response labels as in the hierarchical model, but additionally using the last question and the annotator sentiments.
python -u -m models.bert_hier.main_cv --dataset CongressionalHearing --model-family roberta --model roberta-base --max-seq-length 512 --evaluate-test --patience 30 --lr 3e-5 --warmup-proportion 0.1 --weight-decay 0.1 --batch-size 8 --epochs 30 --seed 1234 --metrics-json metrics_roberta_hierarchical_r_text_gold_sentiments_coarse_num_q_text_last_question_test.json --first-input-column 2 --use-second-input --second-input-column 16 --use-third-input --third-input-column 4 > ch_roberta_hierarchical_r_text_gold_sentiments_coarse_num_q_text_last_question__test.log 2>&1
The regression task consists of predicting the normalized entropy of the response label distribution, and is evaluated with RMSE.
Roberta: predict the response labels using only the response text. Note this experimental model is run on the dev fold
cd subjective_discourse/code/models/hedwig
../../shell_scripts/run_roberta_regression_dev.sh