Sarcasm detection is the second shared task of FigLang2020, co-located with ACL 2020.
For more information about the shared task and to participate, visit the CodaLab site.
The training data can be downloaded from GitHub.
- label: `SARCASM` or `NOT_SARCASM`
- response: the sarcastic response, whether a sarcastic Tweet or a Reddit post
- context: the conversation context of the response
  - Note: the context is an ordered list of dialogue, i.e., if the context contains three elements, `c1`, `c2`, `c3`, in that order, then `c2` is a reply to `c1` and `c3` is a reply to `c2`. Further, if the sarcastic response is `r`, then `r` is a reply to `c3`.
For instance, consider the following example:

"label": "SARCASM", "response": "Did Kelly just call someone else messy? Baaaahaaahahahaha", "context": ["X is looking a First Lady should . #classact", "didn't think it was tailored enough it looked messy"]

The response Tweet, "Did Kelly..." is a reply to its immediate context "didn't think it was tailored..." which is in turn a reply to "X is looking...". Your goal is to predict the label of the "response" while also using the context (i.e., the immediate or the full context).
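To make the record structure concrete, here is a minimal sketch of reading the jsonlines training file and flattening the ordered context plus the response into a single string. The file name `train.jsonl` and the `[SEP]` joining convention are assumptions for illustration, not part of the official data release.

```python
# Minimal sketch: read the shared-task jsonlines data and flatten each record.
# "train.jsonl" and the " [SEP] " joiner are assumptions for illustration.
import jsonlines

def build_input(record, sep=" [SEP] "):
    """Concatenate the ordered context turns followed by the response."""
    turns = list(record.get("context", [])) + [record["response"]]
    return sep.join(turns)

with jsonlines.open("train.jsonl") as reader:
    for record in reader:
        text = build_input(record)          # c1 [SEP] c2 [SEP] c3 [SEP] r
        label = record["label"]             # "SARCASM" or "NOT_SARCASM"
        print(label, text[:80])
        break
```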
- Twitter: a dataset of 5,000 English Tweets balanced between the `SARCASM` and `NOT_SARCASM` classes.
- Reddit: a dataset of 4,400 Reddit posts balanced between the `SARCASM` and `NOT_SARCASM` classes.
- python 3.6
- torch 1.0+
- scikit-learn
- tqdm
- pandas
- jsonlines
Baseline
Model | dev/test F1-score | input
---|---|---
bert(uncased-large-wwm) | 81.243% | response
bert(uncased-large-wwm) | 82.200% | context+response
bert(cased-large-wwm) | 82.553%/69.200% | context+response
bert(cased-large) | 83.147%/72.619% | context+response
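The context+response rows above feed BERT a sentence pair. Below is a hedged sketch of that input format, assuming the HuggingFace transformers library (not listed in the requirements above) and the public `bert-large-cased` checkpoint rather than the local `./bert-large-cased-wwm` weights; it is not the repo's exact training code.

```python
# Hedged sketch of the context+response input, not the repo's exact training code.
# Assumes HuggingFace transformers and the public "bert-large-cased" checkpoint.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
model = BertForSequenceClassification.from_pretrained("bert-large-cased", num_labels=2)
model.eval()

context = "X is looking a First Lady should . #classact didn't think it was tailored enough it looked messy"
response = "Did Kelly just call someone else messy? Baaaahaaahahahaha"

# Sentence-pair encoding: [CLS] context [SEP] response [SEP]
inputs = tokenizer(context, response, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape (1, 2): SARCASM vs NOT_SARCASM
print(logits.softmax(dim=-1))
```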
Advanced
Model | dev/test F1-score
---|---
bert(last 3 layer) | 83.355%/73.189%

Model | dev/test F1-score | input
---|---|---
bert(cased-large)+gru | 71.352%/63.042% | response
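One plausible reading of the "last 3 layer" row is a classifier over the concatenated [CLS] vectors of the final three hidden layers. The sketch below is an assumption about that architecture, again using HuggingFace transformers, not the repo's exact implementation.

```python
# Hedged sketch of a "last 3 layer" head: concatenate the [CLS] vector of the
# final three hidden layers and classify. Layer choice and head are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

class BertLast3Classifier(nn.Module):
    def __init__(self, model_name="bert-large-cased", num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        self.classifier = nn.Linear(self.bert.config.hidden_size * 3, num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        # hidden_states: embeddings + one tensor per layer, each (batch, seq, hidden)
        last3_cls = torch.cat([h[:, 0] for h in outputs.hidden_states[-3:]], dim=-1)
        return self.classifier(last3_cls)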
1. Download the BERT pretrained model to `./bert-large-cased-wwm` and rename the files as: `config.json`, `pytorch_model.bin`, `vocab.txt`
2. Prepare the training and dev data (4:1 split, 5-fold; see the sketch below)
python ./data/twitter/preprocess_twitter.py
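For reference, a minimal sketch of what a 4:1 / 5-fold split could look like, using the pandas and scikit-learn packages from the requirements list. The file names are assumptions, and the repo's `preprocess_twitter.py` may differ in detail.

```python
# Hedged sketch of a 4:1 / 5-fold split; file names are assumptions and the
# repo's preprocess_twitter.py may differ in detail.
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_json("train.jsonl", lines=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, dev_idx) in enumerate(skf.split(df, df["label"])):
    df.iloc[train_idx].to_json(f"fold{fold}_train.jsonl", orient="records", lines=True)
    df.iloc[dev_idx].to_json(f"fold{fold}_dev.jsonl", orient="records", lines=True)
```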
3. Train the model
sh run_bert.sh