SemEval2024 Task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Description

Detecting machine-generated text is a critical task in the era of large language models. In this paper, we present our systems for Subtask B of SemEval-2024 Task 8, a multi-class classification task: determine whether a text was written by a human or, if not, which of five state-of-the-art large language models generated it. We propose three systems: unsupervised text similarity, triplet-loss-trained text similarity, and text classification. The triplet-loss-trained text similarity system performs best, achieving 80% accuracy on the test set and surpassing the baseline model for this subtask. Our text classification system, which takes into account sentence paraphrases generated by the candidate models, also outperforms the unsupervised text similarity system, achieving 74% accuracy.
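The README does not say how the unsupervised text similarity system turns similarity scores into labels. As an illustration only, the sketch below assumes one common approach. Each text is embedded with a frozen encoder whose parameters are not trained. The embeddings of each class (human or a given model) are averaged into a centroid. The predicted class is the one whose centroid has the highest cosine similarity to the input. The embeddings are taken as given. The helper names `centroids` and `predict` are hypothetical; this is not the repository's code.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroids(embeddings: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Hypothetical helper: mean embedding per label (human or a generator model)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for k, x in enumerate(emb):
            acc[k] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(embedding: Vector, class_centroids: Dict[str, List[float]]) -> str:
    """Assumed decision rule: pick the label whose centroid is most cosine-similar."""
    return max(class_centroids, key=lambda label: cosine(embedding, class_centroids[label]))


if __name__ == "__main__":
    # Toy 2-D vectors stand in for sentence-encoder embeddings.
    train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    train_labels = ["human", "human", "chatgpt", "chatgpt"]
    refs = centroids(train, train_labels)
    print(predict([0.8, 0.2], refs))  # human
```

The toy vectors only demonstrate the decision rule. With real data, the vectors would come from a sentence encoder and there would be six classes: human plus the five generators.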

Contrastive Triplet Loss Trained Model

Text similarity models can also be trained on the provided training data. For this approach, we train a sentence transformer model with a triplet loss. This loss requires three inputs during training: an anchor, a positive sample, and a negative sample, $(x_i, x_i^{+}, x_j^{-})$. It minimizes the distance between the anchor and the positive sample $(x_i, x_i^{+})$ while maximizing the distance between the anchor and the negative sample $(x_i, x_j^{-})$ (Ren and Xue, 2020). This training is intended to improve the text embeddings used for multi-class classification.
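A minimal sketch of the two ingredients described above. It assumes the standard hinge formulation of the triplet loss with Euclidean distance $d$ and a margin $m$: $\mathcal{L} = \max(0,\ d(x_i, x_i^{+}) - d(x_i, x_j^{-}) + m)$.

The sketch also shows one way to build triplets from labeled data. The positive shares the anchor's label (human or a specific model), and the negative comes from a different label.

The following are assumptions for illustration, not taken from the repository:
- the margin value
- the random sampling strategy
- the helper names `make_triplets` and `triplet_loss`

In practice the loss is applied to encoder outputs rather than to hand-written vectors. For example, the sentence-transformers library provides a `TripletLoss` class for this purpose.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    The default margin of 1.0 is an illustrative assumption."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def make_triplets(texts: List[str], labels: List[str],
                  seed: int = 0) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: build (anchor, positive, negative) triplets where the
    positive shares the anchor's label and the negative has a different label."""
    rng = random.Random(seed)
    idx_by_label: Dict[str, List[int]] = {}
    for i, label in enumerate(labels):
        idx_by_label.setdefault(label, []).append(i)

    triplets = []
    for i, label in enumerate(labels):
        positives = [j for j in idx_by_label[label] if j != i]
        negatives = [j for other, idxs in idx_by_label.items() if other != label for j in idxs]
        # Skip anchors with no same-label partner or no other class to contrast with.
        if positives and negatives:
            triplets.append((texts[i], texts[rng.choice(positives)], texts[rng.choice(negatives)]))
    return triplets


if __name__ == "__main__":
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0: margin already satisfied
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5]))  # 0.5: negative too close
    texts = ["human text A", "human text B", "model text A", "model text B"]
    labels = ["human", "human", "chatgpt", "chatgpt"]
    for anchor, pos, neg in make_triplets(texts, labels):
        print(anchor, "|", pos, "|", neg)
```

In the first toy triplet, the negative is already 4 units farther from the anchor than the positive, so the loss is 0. In the second triplet, the negative is only 1.5 units away, so the loss of 0.5 pushes the encoder to separate the two classes further.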

Repository: yeaeunkwon/Semeval2024_Task8