Job Title Text Ranking Evaluation Dataset

This repository provides job title similarity datasets in 11 languages: the English evaluation dataset used in the text ranking experiments reported in the paper "Learning Job Titles Similarity from Noisy Skill Labels" by Rabih Zbib et al. [1], and the translated datasets for the other 10 languages introduced in "Combined Unsupervised and Contrastive Learning for Multilingual Job Recommendations" by Deniz et al. [2].

Task Overview

The task is to rank a set of job titles against a query job title, so that the ranking reflects each job title's semantic similarity to the query and the most relevant job titles appear first.
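As a rough illustration of the task's input and output, the sketch below ranks a tiny corpus against a query. The token-overlap (Jaccard) score is only a toy stand-in for a real similarity model and is not the method of either paper; the ids and titles are made up for the example.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two titles (toy stand-in for a learned model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_titles(query: str, corpus: dict[str, str]) -> list[str]:
    """Return corpus document ids sorted from most to least similar to the query."""
    return sorted(corpus, key=lambda doc_id: jaccard(query, corpus[doc_id]), reverse=True)


# Hypothetical ids and titles, for illustration only.
corpus = {
    "d1": "senior software engineer",
    "d2": "registered nurse",
    "d3": "software developer",
}
ranking = rank_titles("software engineer", corpus)
print(ranking)
```

Any scoring function with the same signature (for example, cosine similarity between embeddings) could replace `jaccard` without changing `rank_titles`.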

English Dataset Creation

The English dataset was built from 2,724 job titles (short text phrases, each naming an occupation). After a minimal pre-processing step with light cleanup, the job titles were randomly divided into two groups:

105 job titles were used as queries.
The remaining job titles were used as corpus documents.
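A random split of this kind could be sketched as follows. The function name, the fixed seed, and the placeholder titles are assumptions made for the example; the source does not describe how the split was actually implemented.

```python
import random


def split_titles(titles: list[str], n_queries: int, seed: int = 0) -> tuple[list[str], list[str]]:
    """Randomly pick n_queries titles as queries; the rest become corpus documents."""
    rng = random.Random(seed)  # the seed is an assumption; the source does not state one
    query_idx = set(rng.sample(range(len(titles)), n_queries))
    queries = [t for i, t in enumerate(titles) if i in query_idx]
    corpus = [t for i, t in enumerate(titles) if i not in query_idx]
    return queries, corpus


# Placeholder titles standing in for the real job title list.
titles = [f"job title {i}" for i in range(2724)]
queries, corpus = split_titles(titles, n_queries=105)
print(len(queries), len(corpus))
```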

Each query/document pair was labeled for binary relevance after adjudicating two independent human annotations. (The inter-annotator agreement for these binary relevance labels was 86%.)

The result is a dataset composed of the following files:

corpus_documents.tsv: maps each corpus_document_id to the text form of the corresponding job title in the corpus.
queries.tsv: maps each query_id to the text form of the corresponding query job title.
annotations.tsv: contains the binary relevance annotations between queries and corpus documents. Only the relevant pairs are included, and the file follows the format required by the trec_eval software library.
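The trec_eval relevance-judgment (qrels) format has four whitespace-separated fields per line: query id, iteration, document id, and relevance. The sketch below parses lines in that format and computes average precision for one ranking, one of the measures trec_eval reports. It assumes annotations.tsv has no header row, and the ids are hypothetical; for real evaluations trec_eval itself remains the reference.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse trec_eval-style qrels lines: query_id, iteration, doc_id, relevance."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, relevance = parts
        if int(relevance) > 0:  # assumes no header row in the file
            relevant[query_id].add(doc_id)
    return relevant


def average_precision(ranking, relevant_docs):
    """Average precision of one ranked list of document ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant_docs:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant_docs) if relevant_docs else 0.0


# Hypothetical ids, for illustration only.
sample = ["q1\t0\td1\t1", "q1\t0\td3\t1"]
qrels = parse_qrels(sample)
ap = average_precision(["d1", "d2", "d3"], qrels["q1"])
print(round(ap, 4))
```

Averaging `average_precision` over all queries gives mean average precision (MAP).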

Translated Datasets

We also provide translated datasets in 10 additional languages. Each one replicates the English dataset and was produced by Human Translation (HT) or Machine Translation (MT).

The datasets can be found under the dataset/ folder. Each dataset is in a subfolder named with its ISO 639-1 two-character language code.

ISO 639-1 code	Language	Translation
en	English	-
de	German	HT
es	Spanish	MT
fr	French	HT
it	Italian	MT
ja	Japanese	HT
ko	Korean	MT
nl	Dutch	MT
pl	Polish	MT
pt	Portuguese	MT
zh	Chinese	HT
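The folder layout above can be turned into file paths with a small helper. This is a sketch that assumes every language folder contains the same three file names as the English dataset; the helper name `dataset_files` is hypothetical.

```python
from pathlib import Path

# Language codes from the table above.
LANGUAGES = ["en", "de", "es", "fr", "it", "ja", "ko", "nl", "pl", "pt", "zh"]
# Assumption: each language folder holds the same three files as the English one.
FILES = ["corpus_documents.tsv", "queries.tsv", "annotations.tsv"]


def dataset_files(root: str, language: str) -> dict[str, Path]:
    """Build the expected paths of the three files for one language folder."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language}")
    return {name: Path(root) / language / name for name in FILES}


paths = dataset_files("dataset", "de")
for name, path in paths.items():
    print(name, path)
```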

Citation

[1] Rabih Zbib, Lucas Alvarez Lacasa, Federico Retyk, Rus Poves, Juan Aizpuru, Hermenegildo Fabregat, Vaidotas Šimkus, Emilia García-Casademont: Learning Job Titles Similarity from Noisy Skill Labels. FEAST, ECML-PKDD 2022 Workshop (2022)

[2] Daniel Deniz, Federico Retyk, Laura García-Sardiña, Hermenegildo Fabregat, Luis Gasco and Rabih Zbib: Combined Unsupervised and Contrastive Learning for Multilingual Job Recommendation. HR@RecSys (2024)

@article{zbib2022Learning,
      title={{Learning Job Titles Similarity from Noisy Skill Labels}}, 
      author={Rabih Zbib and 
              Lucas Alvarez Lacasa and 
              Federico Retyk and 
              Rus Poves and 
              Juan Aizpuru and 
              Hermenegildo Fabregat and
              Vaidotas Šimkus and 
              Emilia García-Casademont},
      journal    = {{FEAST, ECML-PKDD 2022 Workshop}},
      year         = {{2022}},
      url = "https://feast-ecmlpkdd.github.io/archive/2022/papers/FEAST2022_paper_4972.pdf"
}


@inproceedings{deniz2024Combined,
  title        = {Combined Unsupervised and Contrastive Learning for Multilingual Job Recommendations},
  author       = {Daniel Deniz and
                  Federico Retyk and
                  Laura García-Sardiña and
                  Hermenegildo Fabregat and
                  Luis Gasco and
                  Rabih Zbib},
  booktitle    = {Proceedings of the 4th Workshop on Recommender Systems for Human Resources
                  (RecSys in {HR} 2024), in conjunction with the 18th {ACM} Conference on
                  Recommender Systems},
  year         = {2024},
}
