Gelic is Proto-Germanic for "like, alike, similar" (see Wiktionary).
This repository contains the following items:
- components: contains the documents, relevance assessments and topics
  - assessments.txt: contains relevance assessments
  - corpus.zip: contains the title data of the documents
  - topics.xml: contains topics used for evaluation
- src: contains a collection of conversion and analysis scripts
- scripts: contains scripts that automate different tasks that emerged in the course of the project
  - evalsummary.py: searches a local Solr instance, using the titles of the topics in topics.xml as queries.
  - fieldnames.json: specifies which fields Solr searches. Variations could be: 1. only "subject_auto_txt_de", 2. only "subject_gnd_txt_de", 3. both "subject_auto_txt_de" and "subject_gnd_txt_de". The results for each field variation are then evaluated with trec_eval and summarized, and the summary evaluations are saved as a CSV file. This makes evalsummary.py most suitable for a quick comparison between variations of the searched fields.
  - evalpertopic.py: automates a similar process to evalsummary.py; only the end result differs. Instead of writing a summary to a single CSV file, the script creates one CSV file per variation in fieldnames.json. Each CSV file contains the trec_eval results for every query of that variation, e.g. the CSV file for the variation "subject_auto_txt_de" lists the recall, precision and F-measure of topics such as "Kritische Theorie".
  - erschliessung.py: produces a list of all topics with counts of the occurrences of relevant documents, broken down by machine content indexing, conventional content indexing, etc.
  - synonyms.py: extracts the GND synonyms and their preferred terms from the JSON-LD file authorities-sachbegriff_lds_20190613.jsonld and transforms them into a Solr-readable synonyms.txt file. Warning: before using it, follow the instructions in src/README.md. If you want a CSV list instead, there is a commented-out option in the script.
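
The field variations described for evalsummary.py and fieldnames.json can be illustrated with a minimal sketch. It is not the code of the actual script: the layout of the JSON, the variation names and the helper functions `build_query` and `trec_run_lines` are assumptions made for illustration. Only the Solr `field:"phrase"` query syntax and the six-column TREC run format read by trec_eval are standard.

```python
import json

# Hypothetical layout of fieldnames.json: a variation name mapped to the
# Solr fields it searches. The real file may be structured differently.
FIELDNAMES_JSON = """
{
  "auto": ["subject_auto_txt_de"],
  "gnd": ["subject_gnd_txt_de"],
  "auto_gnd": ["subject_auto_txt_de", "subject_gnd_txt_de"]
}
"""


def build_query(title, fields):
    """Build a Solr query that searches the title as a phrase in each field."""
    # Escape backslashes and double quotes so the title stays one phrase.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return " OR ".join(f'{field}:"{escaped}"' for field in fields)


def trec_run_lines(topic_id, doc_ids, run_tag):
    """Format ranked document ids as lines of a TREC run file."""
    lines = []
    for rank, doc_id in enumerate(doc_ids, start=1):
        score = 1.0 / rank  # placeholder score that decreases with rank
        lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


variations = json.loads(FIELDNAMES_JSON)
queries = {
    name: build_query("Kritische Theorie", fields)
    for name, fields in variations.items()
}
```

Each entry in `queries` would be sent to the local Solr instance as the `q` parameter, and the returned document ids written out with `trec_run_lines` so that trec_eval can compare them against assessments.txt.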
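
The per-topic CSV files that evalpertopic.py is described as producing could be derived from per-query trec_eval output roughly as follows. This is a sketch under assumptions, not the script itself: the sample text is made up, the helper names are hypothetical, and the measures `set_P`, `set_recall` and `set_F` are only one plausible choice for the precision, recall and F-measure mentioned above.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the "measure  topic  value" layout that trec_eval prints
# with -q; the numbers are placeholders, not results from this project.
SAMPLE_OUTPUT = """\
set_P\t1\t0.5000
set_recall\t1\t0.2500
set_F\t1\t0.3333
set_P\tall\t0.5000
"""


def per_topic_rows(trec_eval_output):
    """Collect the measures of each topic, skipping the 'all' summary lines."""
    per_topic = defaultdict(dict)
    for line in trec_eval_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        measure, topic, value = parts
        if topic == "all":
            continue
        per_topic[topic][measure] = value
    return [{"topic": topic, **measures} for topic, measures in per_topic.items()]


def rows_to_csv(rows, measures):
    """Serialize the rows as CSV text with one column per measure."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["topic", *measures], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = per_topic_rows(SAMPLE_OUTPUT)
csv_text = rows_to_csv(rows, ["set_P", "set_recall", "set_F"])
```

Running this once per field variation would yield one CSV text per variation, which matches the one-file-per-variation behaviour described for the script.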
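
The transformation that synonyms.py is described as performing can be sketched for a simplified input. The record layout (`preferred`, `variants`), the example terms and the function `to_synonym_line` are hypothetical; the real JSON-LD dump is structured differently and has to be parsed first. Whether the script writes explicit mappings (`a, b => c`) or plain equivalence lists is not stated here, so the explicit-mapping form is an assumption.

```python
# Illustrative records; not taken from the GND dump.
records = [
    {"preferred": "Personenkraftwagen", "variants": ["Auto", "PKW"]},
    {"preferred": "Bibliothek", "variants": []},
]


def to_synonym_line(preferred, variants):
    """Return one explicit-mapping line, or None if there are no variants."""
    # Terms containing commas or "=>" would need escaping; not handled here.
    terms = [v.strip() for v in variants if v.strip() and v.strip() != preferred]
    if not terms:
        return None
    return f"{', '.join(terms)} => {preferred}"


lines = []
for record in records:
    line = to_synonym_line(record["preferred"], record["variants"])
    if line is not None:
        lines.append(line)

synonyms_txt = "\n".join(lines) + "\n"
```

Writing `synonyms_txt` to a file gives a synonyms.txt that a Solr synonym filter can load; records without variants are skipped because they would produce an empty mapping.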
This repository is the joint work of the following people:
- Philipp Schaer (phschaer, project lead)
- Klaus Lepsky (klepsky, project lead)
- Ina Böckmann (iboeckma)
- Sebastian Pommerencke (SebastianPommerencke)
- Sven Gaida (SvenGaida)
- Felix van Tellingen (fvantellingen)
- Johanna Munkelt (FH Dortmund)