Megan Herrera, Ankit Aich, Natalie Parde
Department of Computer Science
University of Illinois at Chicago
{mherre42, aaich2, parde}@uic.edu
Download our dataset directly from here, or refer to the instructions below for downloading the data.
If you use the data or benefit from the paper, please cite:
@inproceedings{herrera_aich_parde,
  title={TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching},
  booktitle={Language Resources and Evaluation. LREC 2022},
  author={Herrera, Megan and Aich, Ankit and Parde, Natalie}
}
A large (20k+ instances) Tagalog-English code-switching dataset, harvested from Twitter.
tweets_split_id.csv - Contains tweet IDs and their Tagalog/English/Other split as (Tagalog, English, Other) tuples
embeddings.csv - Contains tweet embeddings and Tagalog/English/Other splits
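As a rough illustration of how tweets_split_id.csv might be read, the sketch below maps each tweet ID to its (Tagalog, English, Other) tuple. The column names tweet_id and split are assumptions, not taken from the released file, and the tuple is assumed to be stored as text such as "(3, 2, 1)"; check the actual header and adjust the arguments accordingly.

```python
import ast
import csv


def parse_split(text):
    """Parse a '(tagalog, english, other)' string into a 3-tuple of numbers."""
    values = ast.literal_eval(text.strip())
    if not isinstance(values, tuple) or len(values) != 3:
        raise ValueError(f"expected a 3-tuple, got {text!r}")
    return values


def load_splits(path, id_column="tweet_id", split_column="split"):
    """Map each tweet ID (kept as a string) to its language-split tuple.

    The default column names are hypothetical; pass the real ones
    if the CSV header differs.
    """
    splits = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            splits[row[id_column]] = parse_split(row[split_column])
    return splits
```

Tweet IDs are kept as strings here so that no numeric conversion alters them. Calling load_splits("tweets_split_id.csv") would return a dictionary keyed by tweet ID, provided the assumed column names match the file.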