An implementation of Medical Concept Mapping via three levels: Syntax-Semantics-Pragmatics.
Building an Artificial Intelligence (AI)-oriented healthcare system demands precise mapping of medical concepts. Previous work decoded medical terms without considering a comprehensive view of Natural Language Processing (NLP), whereas analysis from multiple perspectives is becoming increasingly popular for downstream NLP tasks. This work presents a novel approach to medical concept mapping based on three levels of NLP analysis: syntax, semantics, and pragmatics. At the syntax level, subword representations obtained with the Byte Pair Encoding (BPE) algorithm are introduced to learn the compounding and transliteration of medical concepts. At the pragmatics level, a knowledge graph brings in human common sense. Finally, pre-trained word embeddings and cosine similarity are used to map the input to the standard term with the maximum similarity. Combining the three levels, the proposed approach achieves 96.81% accuracy on a Chinese medical dataset. This indicates that the method can handle the challenge of medical concept mapping, which can in turn indirectly improve the performance of healthcare AI systems.
Overall Method is shown below:
Specific Method:
- Syntax-level: Sub-word Frequency via BPE Algorithm
- Semantics-level: Word vector Cosine Similarity
- Pragmatics-level: Knowledge Graph (JSON Format)
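To make the syntax-level step more concrete, here is a minimal sketch of how BPE merges and sub-word frequencies could be learned from a term-frequency table. It is not the code in `STEP-1-get-subword.py`; the function names (`learn_bpe`, `subword_frequencies`) and the input layout (a dict of term to count) are assumptions made for illustration. Terms are split into characters, so the same idea applies to Chinese terms.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by the frequency of each term."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(term_freqs, num_merges):
    """Hypothetical helper: learn up to `num_merges` BPE merges.

    `term_freqs` maps a term to its corpus frequency (assumed layout).
    Returns the ordered merge list and the segmented vocabulary.
    """
    vocab = {tuple(term): freq for term, freq in term_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every term is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


def subword_frequencies(vocab):
    """Sum term frequencies per sub-word in the segmented vocabulary."""
    counts = Counter()
    for symbols, freq in vocab.items():
        for symbol in symbols:
            counts[symbol] += freq
    return counts
```

With the toy table `{"gastritis": 5, "hepatitis": 4, "arthritis": 3}` (made-up counts) and two merges, the learned merges are `("t", "i")` followed by `("ti", "s")`, so the shared suffix fragment `tis` surfaces as a frequent sub-word.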
- Get Sub-word (Frequency) list
$ STEP-1-get-subword.py
- Get Standard and Synonym Medical Terms
$ STEP-2-get-Knowledge-Graph.py
- Run the Concept Mapping main Function
$ main.py
- To evaluate, run the Evaluation Function
$ evaluate.py
96.81% Accuracy on the Standard and Synonym Medical Terms
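For readers who want a feel for how the mapping and the accuracy figure fit together, the sketch below combines a knowledge-graph lookup with a cosine-similarity fallback and then scores the result. It is an illustration only, not the logic of `main.py` or `evaluate.py`: the JSON layout (standard term mapped to a list of synonyms), the function names, and the toy vectors are all assumptions.

```python
import json
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_concept(term, kg, vectors):
    """Hypothetical mapper: knowledge-graph lookup, then nearest standard term.

    `kg` is assumed to map each standard term to a list of synonyms.
    `vectors` is assumed to map terms to pre-trained embedding vectors.
    Returns None when the term cannot be mapped.
    """
    # Pragmatics level: exact match against standard terms and known synonyms.
    for standard, synonyms in kg.items():
        if term == standard or term in synonyms:
            return standard
    # Semantics level: fall back to the most similar standard term.
    if term not in vectors:
        return None
    candidates = [s for s in kg if s in vectors]
    if not candidates:
        return None
    return max(candidates,
               key=lambda s: cosine_similarity(vectors[term], vectors[s]))


def accuracy(labelled_pairs, kg, vectors):
    """Fraction of (input term, gold standard term) pairs mapped correctly."""
    correct = sum(1 for term, gold in labelled_pairs
                  if map_concept(term, kg, vectors) == gold)
    return correct / len(labelled_pairs)


# Toy data, invented for this example.
kg = json.loads("""
{
  "hypertension": ["high blood pressure"],
  "diabetes mellitus": ["diabetes"]
}
""")
vectors = {
    "hypertension": [1.0, 0.0],
    "diabetes mellitus": [0.0, 1.0],
    "elevated bp": [0.9, 0.1],
}
```

Here `map_concept("high blood pressure", kg, vectors)` resolves through the synonym list, while `"elevated bp"` is absent from the graph and is mapped by embedding similarity instead.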
The pre-trained word vectors can be downloaded here.
The data used for generating the sub-word list can be downloaded here.
A presentation of this work can be downloaded here.
This work was done while I was at Philips Research Shanghai.