LREC 2022 Taglish code-switching Megan Herrera, Ankit Aich, Natalie Parde

uic-nlp-lab/LREC-2022-CodeSwitch

TweetTaglish Dataset

Megan Herrera, Ankit Aich, Natalie Parde
Department of Computer Science University of Illinois at Chicago
{mherre42, aaich2, parde}@uic.edu

Download our dataset directly from here. The files included in the dataset are described below.

If you use the data or benefit from the paper, please cite:

@inproceedings{herrera_aich_parde,
  title={TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching},
  author={Herrera, Megan and Aich, Ankit and Parde, Natalie},
  booktitle={Language Resources and Evaluation Conference (LREC 2022)},
  year={2022}
}

A large (20k+ instances) Tagalog-English code-switching dataset, harvested from Twitter.

tweets_split_id.csv - Contains tweet IDs and each tweet's Tagalog/English/Other language split, given as a (Tagalog, English, Other) tuple
embeddings.csv - Contains tweet embeddings along with their Tagalog/English/Other splits
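As a minimal sketch of how the split tuples might be consumed, the Python below parses one `(Tagalog, English, Other)` tuple string per row and converts it to proportions. It assumes, hypothetically, that `tweets_split_id.csv` has a header row with columns named `tweet_id` and `split`, and that each tuple is stored as a single quoted string such as `"(6, 3, 1)"`. The real column names, and whether the values are counts or proportions, are not stated here, so inspect the file before relying on this. `read_splits`, `normalize`, `is_code_switched`, and its `min_share` threshold are illustrative helpers, not part of the dataset release.

```python
import ast
import csv
import io
from typing import Iterator, TextIO, Tuple

Split = Tuple[float, float, float]
LABELS = ("Tagalog", "English", "Other")

# Hypothetical sample rows: the column names and tuple format are assumptions,
# not the documented layout of tweets_split_id.csv.
SAMPLE_CSV = """tweet_id,split
1234567890,"(6, 3, 1)"
1234567891,"(0, 5, 0)"
"""


def parse_split(raw: str) -> Split:
    """Parse a '(Tagalog, English, Other)' tuple string into three floats."""
    value = ast.literal_eval(raw.strip())  # safely evaluates literal syntax only
    if not (isinstance(value, tuple) and len(value) == 3):
        raise ValueError(f"expected a 3-tuple, got {raw!r}")
    tl, en, other = (float(v) for v in value)
    return (tl, en, other)


def normalize(split: Split) -> Split:
    """Rescale the three values so they sum to 1 (all zeros stay all zeros)."""
    total = sum(split)
    if total == 0:
        return (0.0, 0.0, 0.0)
    tl, en, other = (v / total for v in split)
    return (tl, en, other)


def read_splits(handle: TextIO, id_col: str = "tweet_id",
                split_col: str = "split") -> Iterator[Tuple[str, Split]]:
    """Yield (tweet_id, split) pairs; IDs are kept as strings on purpose."""
    for row in csv.DictReader(handle):
        yield row[id_col], parse_split(row[split_col])


def is_code_switched(split: Split, min_share: float = 0.1) -> bool:
    """Hypothetical heuristic: both Tagalog and English shares are >= min_share."""
    tl, en, _ = normalize(split)
    return tl >= min_share and en >= min_share


if __name__ == "__main__":
    for tweet_id, split in read_splits(io.StringIO(SAMPLE_CSV)):
        shares = dict(zip(LABELS, normalize(split)))
        print(tweet_id, shares, is_code_switched(split))
```

Tweet IDs are kept as strings because 64-bit IDs can lose precision if a CSV reader coerces them to floating-point numbers. To read the real file, pass an open file handle (e.g. `open("tweets_split_id.csv", newline="")`) to `read_splits` instead of the in-memory sample, adjusting `id_col` and `split_col` to match the actual header.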