Politicians Are Also People: Mapping Is All You Need

Clustering Entity Types in Cross-Domain Relation Classification Setups

This repository contains the data, code, and paper for the Politicians Are Also People project by David Peter Süle, Mie Jonasson, and Nicklas Koch Rasmussen, originally made for the Introduction to Natural Language Processing — Second Year Project course at the IT University of Copenhagen.

Research Question

What is the performance impact of clustering domain-specific named entity types in cross-domain relation classification setups, and what benchmark can be established for future research?

Abstract

Relation Extraction is an evolving field within natural language processing. As its last step, Relation Classification (RC) aims to identify the relation type to which two semantically related named entities belong. Cross-domain setups are especially challenging, even more so when domain-specific entity types are used. Research in the area is scarce and mostly focuses on using generic entity types or on simply fine-tuning the model on a single target domain. This still poses challenges when annotated data is not available for fine-tuning.

In this paper we explore ways of clustering domain-specific named entity types to reduce cross-domain complexity and improve performance on previously unseen domains. We propose five different methods of grouping entity types and evaluate them in multi-domain and out-of-domain scenarios using our two new benchmarks. In conclusion, we find that all our entity mapping methods outperform the baseline in the out-of-domain setting, with the best-performing model improving on the baseline by $8.6$ percentage points in weighted F1.
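As a rough illustration of what mapping entity types means in this setting, the sketch below collapses a few domain-specific types into coarser groups before a relation classifier would see them. The mapping table, the function names, and the sample entity types are assumptions made for illustration only; this is not one of the five methods evaluated in the paper.

```python
# Hypothetical mapping from domain-specific entity types to coarser groups.
# These pairs are illustrative assumptions, not the clusters used in the paper.
EXAMPLE_MAPPING = {
    "politician": "person",
    "musicalartist": "person",
    "scientist": "person",
    "writer": "person",
    "politicalparty": "organisation",
    "band": "organisation",
    "university": "organisation",
}


def map_entity_type(entity_type: str, mapping: dict[str, str]) -> str:
    """Return the coarse group for an entity type, or the type itself if unmapped."""
    return mapping.get(entity_type, entity_type)


def map_entity_pair(
    head_type: str, tail_type: str, mapping: dict[str, str]
) -> tuple[str, str]:
    """Map the entity types of both arguments of a relation instance."""
    return map_entity_type(head_type, mapping), map_entity_type(tail_type, mapping)


if __name__ == "__main__":
    # A politician and a musical artist end up in the same group,
    # so a model trained on one domain sees a familiar type in another.
    print(map_entity_pair("politician", "politicalparty", EXAMPLE_MAPPING))
    print(map_entity_pair("musicalartist", "country", EXAMPLE_MAPPING))
```

Unmapped types pass through unchanged in this sketch, so a partial mapping still yields a valid type for every entity.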

Attribution

Our work relied heavily on the CrossRE project by Elisa Bassignana and Barbara Plank, namely their paper CrossRE: A Cross-Domain Dataset for Relation Extraction (Bassignana & Plank, Findings 2022) and their repository.

How to run the project

Installing requirements

pip install -r requirements.txt

Run training, predictions, and calculate results

./run.sh

Folder Structure

data
- crossre_data
  - The training, development, and test data as provided by the CrossRE project.
- predictions
  - Folders named DOMAIN-LIST_SEED, where the domain list is abbreviated to the first letters of the domains used during training; they contain the predictions produced by running the main script.
  - ood_clustering_data: data for training with OOD clustering method.
- results: Aggregated results
figures: images / plots used for the report.
src
- Scripts used for training. These are mainly supplied by the CrossRE project, with slight modifications.
util
- Helper functions to check validity of results.

(Note: in the file names, 'ood validation' stands for OOD evaluation and 'all' stands for the multi-domain results.)
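The DOMAIN-LIST_SEED naming of the prediction folders can be sketched as follows. The helper name, the lowercase letters, the absence of a separator between letters, and the example domains and seed are all assumptions; the folders in data/predictions show the exact convention.

```python
def prediction_dir_name(train_domains: list[str], seed: int) -> str:
    """Build a folder name of the form DOMAIN-LIST_SEED.

    Assumption: the domain list is the first letter of each training domain,
    lowercased and concatenated in the order given, followed by an underscore
    and the random seed.
    """
    if not train_domains:
        raise ValueError("at least one training domain is required")
    domain_list = "".join(domain[0].lower() for domain in train_domains)
    return f"{domain_list}_{seed}"


if __name__ == "__main__":
    # Hypothetical example: three training domains and an arbitrary seed.
    print(prediction_dir_name(["ai", "literature", "music"], 4012))
```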

Cite this paper

@misc{politicians-are-people,
  title        = "Politicians Are Also People: Mapping Is All You Need",
  author       = "S{\"u}le, David Peter and Jonasson, Mie and Rasmussen, Nicklas Koch",
  howpublished = "\url{https://github.com/davidsule/politicians_are_also_people}",
  year         = "2023",
  school       = "IT University of Copenhagen",
  address      = "Copenhagen, Denmark",
  note         = "Introduction to Natural Language Processing — Second Year Project course report",
  abstract     = "Relation Extraction is an evolving field within natural language processing. As its last step, Relation Classification (RC) aims to identify the relation type to which two semantically related named entities belong. Cross-domain setups are especially challenging, even more so when domain-specific entity types are used. Research in the area is scarce and mostly focuses on using generic entity types or on simply fine-tuning the model on a single target domain. This still poses challenges when annotated data is not available for fine-tuning. In this paper we explore ways of clustering domain-specific named entity types to reduce cross-domain complexity and improve performance on previously unseen domains. We propose five different methods of grouping entity types and evaluate them in multi-domain and out-of-domain scenarios using our two new benchmarks. In conclusion, we find that all our entity mapping methods outperform the baseline in the out-of-domain setting, with the best-performing model improving on the baseline by 8.6 percentage points in weighted F1."
}
