DISTANT-CTO combines distant supervision with dynamic programming and uses freely available resources like clinicaltrial.org to obtain a massive corpus of 'Intervention' and 'Comparator' entity annotations. The approach has two steps:
- Candidate generation: A pseudo-labeled (distantly labeled) dataset is generated using a combination of distant supervision and dynamic programming. We call this distantly labeled dataset DISTANT-CTO.
- Model training: Once the distantly labeled candidates are generated, transformer-based discriminative NER models are trained on them to recognize 'Intervention' and 'Comparator' entities.
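
To make the candidate generation step more concrete, the following is a minimal sketch of how a structured intervention name could be aligned to free text with dynamic programming to produce token-level labels. The text above does not specify the exact alignment algorithm, so this sketch assumes a longest-common-contiguous-token-span match; the function names `longest_common_token_span` and `distant_label`, the `min_len` parameter, and the `B-INT`/`I-INT`/`O` tag names are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def longest_common_token_span(source: List[str], target: List[str]) -> Tuple[int, int]:
    """Return (start, length) in `target` of the longest contiguous
    token run that also occurs in `source`, found by dynamic programming."""
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at
    # source[i - 2] and target[j - 1] (the previous DP row).
    prev = [0] * (len(target) + 1)
    for i in range(1, len(source) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if source[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return best_end - best_len, best_len


def distant_label(intervention: str, sentence: str, min_len: int = 1) -> List[Tuple[str, str]]:
    """Assign BIO tags to `sentence` tokens that match the intervention name.

    Assumption: whitespace tokenization and case-insensitive exact token
    matching; a real pipeline would likely need fuzzier matching.
    """
    src = intervention.lower().split()
    tokens = sentence.split()
    # Normalize target tokens by lowercasing and stripping edge punctuation.
    tgt = [t.lower().strip(".,;:()") for t in tokens]
    start, length = longest_common_token_span(src, tgt)
    labels = ["O"] * len(tokens)
    if length >= max(min_len, 1):
        labels[start] = "B-INT"
        for k in range(start + 1, start + length):
            labels[k] = "I-INT"
    return list(zip(tokens, labels))


if __name__ == "__main__":
    pairs = distant_label(
        "Metformin hydrochloride",
        "Patients received metformin hydrochloride, 500 mg daily.",
    )
    for token, label in pairs:
        print(f"{token}\t{label}")
```

The resulting token and tag pairs are in the shape a token-classification model would typically be trained on in the model training step. Raising `min_len` is one possible way to trade recall for precision by discarding short, accidental overlaps.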