Corpora:
A gold-standard dataset of randomized controlled trial (RCT) abstracts annotated with the EvidenceMap representation is provided. It comprises two corpora.
The “General” corpus covers a broad range of disease domains and consists of 229 randomly selected RCT article abstracts.
The “COVID-19” corpus includes 80 randomly selected COVID-19 RCT article abstracts to accommodate the increased demand for related evidence retrieval and synthesis resources during the pandemic.
The descriptive statistics of these two annotated corpora are listed in Table.
Dependent evidence relationships were used to construct MEPs, while independent relationships can serve as negative samples for training machine-learning-based NLP models.
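As a rough illustration of how the two relationship types could feed a classifier, the sketch below maps dependent relationships to label 1 and independent relationships to label 0. The Relation record, its field names, and the label strings "dependent" and "independent" are assumptions made for this example; they are not taken from the corpus files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for one annotated relationship between two spans."""
    head: str  # text of the first span
    tail: str  # text of the second span
    kind: str  # assumed to be "dependent" or "independent"


def to_training_examples(relations: List[Relation]) -> List[Tuple[str, str, int]]:
    """Map dependent relations to label 1 and independent relations to label 0."""
    label_of = {"dependent": 1, "independent": 0}
    examples = []
    for rel in relations:
        if rel.kind not in label_of:
            continue  # skip relation kinds this sketch does not know about
        examples.append((rel.head, rel.tail, label_of[rel.kind]))
    return examples
```

How the spans are actually encoded for a model (tokenization, context window, and so on) is outside the scope of this sketch.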
All annotations were conducted using the web-based interactive annotation tool Brat (https://brat.nlplab.org/). An example abstract with annotations is presented:
Pretrained Models:
Download pretrained models here.
Running Environment:
- Install tensorflow==2.3
- Install bert-for-tf2
- Install the scispaCy model en_core_sci_lg 0.4.0: https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz
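After installing, it can help to confirm that the environment matches the list above. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata) and assuming the packages are registered under the distribution names tensorflow, bert-for-tf2, and en_core_sci_lg. The helper names check_environment and installed_version are introduced here for illustration only and are not part of this repository.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional

# Distribution names and version prefixes taken from the list above.
# None means that any installed version is accepted.
REQUIRED: Dict[str, Optional[str]] = {
    "tensorflow": "2.3",
    "bert-for-tf2": None,
    "en_core_sci_lg": "0.4.0",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(
    required: Dict[str, Optional[str]] = REQUIRED,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and not (
            found == wanted or found.startswith(wanted + ".")
        ):
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems
```

Calling check_environment() with no arguments inspects the current interpreter. The lookup parameter exists only so the check can be exercised without the packages installed.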
Running the Code:
- Unzip model.zip
- Move all extracted files into model/
- Modify the parser configuration in parser_config.py
- Run examples in wrapper.sh
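The first two steps above can also be scripted. The sketch below is one possible way to do it, assuming that model.zip sits in the working directory and that its contents should end up under model/ with their internal layout preserved. The function name extract_model is hypothetical, and if the archive already contains its own top-level folder the target path may need adjusting.

```python
import zipfile
from pathlib import Path
from typing import List, Union


def extract_model(
    archive: Union[str, Path] = "model.zip",
    target_dir: Union[str, Path] = "model",
) -> List[Path]:
    """Extract the archive into target_dir and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        # Directory entries end with "/"; keep regular files only.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return sorted(target / n for n in names)
```

Editing parser_config.py and running the examples in wrapper.sh remain manual steps, since their contents are not described here.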