This project was initiated by Contexity as part of our participation in the SwissText 2021 conference.
The goal of the project is to create a series of robust classifiers for questions in German texts, with emphasis on (transcribed) spoken and written dialogues.
We will add information on the state of the work and the roadmap in the coming days and weeks. For now, you can find the following in this repository:
- Our poster for the SwissText conference, which summarizes the first part of the work, namely the classification of German utterances as question / no question --> in `./poster`
- The evaluation of our GottBERT-based approach on the Dortmund Chat Corpus, including code and results --> in `./Evaluation of GottBERT-based approach`
- Code that implements the evaluation tasks and can also be used as building blocks for a prediction service --> in `./qcg-base-nlp-tool`, `./qcg-spacy-tool`, and `./qcg-sentence analyzer`
Check back in the coming days and weeks for additional information on the background of the work and updates on our progress.
Are you interested in contributing to this project in any way? Please get in touch! Part of the reason we're open-sourcing this project is to work together with the community on this challenging problem! :-)