Evaluation code for the Self-Annotated Reddit Corpus (SARC).
Dependencies: NLTK, scikit-learn, text_embedding.
To recreate the all-balanced and pol-balanced results in Table 2 of the paper:
- download 1600-dimensional Amazon GloVe embeddings (NOTE: 2.4 GB compressed)
- set the root directory of the SARC dataset at the top of utils.py
- run the following ($EMBEDDING is the file of downloaded GloVe embeddings):
- Bag-of-Words on all: python SARC/eval.py main -l --min_count 5
- Bag-of-Bigrams on all: python SARC/eval.py main -n 2 -l --min_count 5
- Embedding on all: python SARC/eval.py main -e -l --embedding $EMBEDDING
- Bag-of-Words on pol: python SARC/eval.py pol -l
- Bag-of-Bigrams on pol: python SARC/eval.py pol -n 2 -l
- Embedding on pol: python SARC/eval.py pol -e -l --embedding $EMBEDDING
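The Bag-of-Words and Bag-of-Bigrams baselines above fit a linear classifier on n-gram count features. A minimal sketch of that kind of setup with scikit-learn (the toy data, feature settings, and classifier here are illustrative assumptions, not the exact pipeline in eval.py):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; the real script evaluates on balanced SARC comments.
train_texts = ["oh great, another meeting", "great talk, thanks",
               "wow, what a genius idea", "this recipe works well"]
train_labels = [1, 0, 1, 0]  # 1 = sarcastic, 0 = sincere

# min_df plays the same role as a minimum-count cutoff;
# ngram_range=(1, 2) would add bigram features.
vectorizer = CountVectorizer(ngram_range=(1, 1), min_df=1)
X = vectorizer.fit_transform(train_texts)

# A standard linear baseline over the count features.
clf = LogisticRegression().fit(X, train_labels)
preds = clf.predict(vectorizer.transform(["oh great, another genius idea"]))
```

The --min_count and -n flags of eval.py correspond conceptually to the vectorizer's minimum-frequency cutoff and n-gram order.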
If you find this code useful, please cite the following:
@inproceedings{khodak2018corpus,
title={A Large Self-Annotated Corpus for Sarcasm},
author={Khodak, Mikhail and Saunshi, Nikunj and Vodrahalli, Kiran},
booktitle={Proceedings of the Language Resources and Evaluation Conference (LREC)},
year={2018}
}