Machine learning model that learns from the Unified Medical Language System Metathesaurus (UMLS Metathesaurus) database to tag new graphs in the Semantic Network
0- Fill in the .env file with the following variables:
HOST=<host_of_your_umls_database>
USER=<user_of_your_umls_database>
PASSWORD=<password_of_your_umls_database>
DB=<name_of_your_umls_database>
UMLS_API_KEY=<your_api_key>
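The README does not say how the pipeline loads these variables. The following is a minimal stdlib sketch for checking that the .env file is complete, assuming simple KEY=value lines. `load_env_file` and `check_required` are hypothetical helper names, and the project may use a library such as python-dotenv instead. `USER` is also a standard shell variable that holds your login name on Unix-like systems, so the sketch reads values from the parsed dict rather than from `os.environ`.

```python
from pathlib import Path

# Variables the README asks for in the .env file.
REQUIRED_VARS = ("HOST", "USER", "PASSWORD", "DB", "UMLS_API_KEY")


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=value lines from a .env file.

    Blank lines, '#' comments and lines without '=' are skipped, and one
    pair of matching quotes around a value is removed. Not a full parser.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def check_required(values: dict[str, str]) -> list[str]:
    """Return the required variable names that are missing or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]
```

For example, `check_required(load_env_file())` returns an empty list once all five variables are filled in.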
1- Install the required packages
$ pip install -r requirements.txt
2.1- Edit the configuration file as needed
It is very important to check all the parameters.
2.2- Launch the pipeline
You can use flags to customize the pipeline arguments:
$ python main.py -h
> usage: main.py [-h] [--verbose] [--only_source] [--only_preprocess] [--from_preprocess] [--only_training]
[--limit LIMIT] [--debug_output_path DEBUG_OUTPUT_PATH] --run_name RUN_NAME
optional arguments:
-h, --help show this help message and exit
--verbose Active verbose mode.
--only_source Pipeline launchs only the generation of the source data.
--only_preprocess Pipeline launchs only the preprocess of the source data.
--from_preprocess Pipeline launchs from the preprocess of the source data.
--only_training Pipeline launchs only the training of the preprocessed data.
--limit LIMIT Limit of the source data number generated.
--debug_output_path DEBUG_OUTPUT_PATH
Path of the output log.
--run_name RUN_NAME REQUIRED: Name of the run.
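The help text above matches what a standard `argparse` parser would print. The following is a minimal sketch of how such a parser could be declared. It assumes the boolean flags use `store_true` and that `--limit` takes an integer. The actual `main.py` may define them differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in `main.py -h`."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Boolean switches: present -> True, absent -> False (assumed store_true).
    parser.add_argument("--verbose", action="store_true",
                        help="Activate verbose mode.")
    parser.add_argument("--only_source", action="store_true",
                        help="Run only the generation of the source data.")
    parser.add_argument("--only_preprocess", action="store_true",
                        help="Run only the preprocessing of the source data.")
    parser.add_argument("--from_preprocess", action="store_true",
                        help="Run the pipeline starting from the preprocessing.")
    parser.add_argument("--only_training", action="store_true",
                        help="Run only the training on the preprocessed data.")
    # Valued options (assumed integer limit and string path).
    parser.add_argument("--limit", type=int, default=None,
                        help="Limit on the number of source records generated.")
    parser.add_argument("--debug_output_path", default=None,
                        help="Path of the output log.")
    # The only mandatory option; argparse exits with an error if it is missing.
    parser.add_argument("--run_name", required=True,
                        help="REQUIRED: Name of the run.")
    return parser
```

For example, `build_parser().parse_args(["--run_name=exp1", "--only_training", "--verbose"])` yields a namespace with `only_training=True` and `verbose=True`.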
Examples:
- Launch the full pipeline (data generation + preprocessing + training & test + graph prediction)
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN"
- Run only the source data generation, limited to 100 records, in verbose mode (generates a new artefact/data.csv)
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --only_source --limit=100 --verbose
- Launch from the preprocessing step (generates a new artefact/preprocessed_data.csv) + training & test + graph prediction
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --from_preprocess
The most commonly used command:
- Run only training & test + graph prediction, in verbose mode
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --only_training --verbose
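The examples above imply which pipeline stages each flag runs. The following is a hypothetical sketch of that mapping, where `training` stands for training & test + graph prediction. The stage names, and the precedence applied when several flags are combined, are assumptions and are not taken from `main.py`.

```python
from argparse import Namespace

# Pipeline stages in execution order (hypothetical names).
STAGES = ("source", "preprocess", "training")


def select_stages(args: Namespace) -> list[str]:
    """Hypothetical mapping from the CLI flags to the stages that run.

    Flags are checked in a fixed (assumed) order; with no flag set,
    the full pipeline runs.
    """
    if args.only_source:
        return ["source"]
    if args.only_preprocess:
        return ["preprocess"]
    if args.from_preprocess:
        return ["preprocess", "training"]
    if args.only_training:
        return ["training"]
    return list(STAGES)
```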
3- Use the MLflow UI to visualize the experiment runs at localhost:5000
$ mlflow ui
We built our own UMLS API to get data from the UMLS Metathesaurus database. To use it, you need to install the UMLS database locally. You can download the database from here and install it by following these instructions. Then, import the umls_api Python package.
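The README does not document the `umls_api` package, so the following sketch does not use it. It shows how a locally installed Metathesaurus could be queried directly through any Python DB-API connection. It assumes the standard Rich Release Format tables `MRCONSO` (concept names) and `MRSTY` (semantic types of each concept, which link CUIs to the Semantic Network). `semantic_types_for_term` is a hypothetical name.

```python
from typing import Any

# Semantic types of every English concept name that matches the term exactly.
# MRCONSO holds concept names; MRSTY maps each CUI to Semantic Network types.
_SQL_TEMPLATE = (
    "SELECT DISTINCT c.CUI, s.TUI, s.STY "
    "FROM MRCONSO c JOIN MRSTY s ON s.CUI = c.CUI "
    "WHERE c.STR = {ph} AND c.LAT = 'ENG' "
    "ORDER BY c.CUI, s.TUI"
)


def semantic_types_for_term(conn: Any, term: str,
                            placeholder: str = "%s") -> list[tuple[str, str, str]]:
    """Hypothetical helper: return (CUI, TUI, STY) rows for an exact term.

    `conn` is any DB-API 2.0 connection. `placeholder` must match the
    driver's paramstyle ("%s" for PyMySQL, "?" for sqlite3). The term is
    always passed as a bound parameter, never formatted into the SQL.
    """
    sql = _SQL_TEMPLATE.format(ph=placeholder)
    cur = conn.cursor()
    try:
        cur.execute(sql, (term,))
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()
```

With a MySQL driver such as PyMySQL, you could build the connection from the .env values with `pymysql.connect(host=..., user=..., password=..., database=...)` and keep the default `%s` placeholder.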
We also use the UMLS REST API to retrieve data from the UMLS Metathesaurus.
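The following is a minimal stdlib sketch of a UMLS REST API search. It assumes the public UTS endpoint `https://uts-ws.nlm.nih.gov/rest` and its `apiKey` query parameter. This is not the project's own client, and the helper names are hypothetical. `search_umls` expects `UMLS_API_KEY` (the variable from step 0) to be present in the process environment.

```python
import json
import os
import urllib.parse
import urllib.request

# Base URL of the UTS REST API (assumption: current public endpoint).
UTS_BASE_URL = "https://uts-ws.nlm.nih.gov/rest"


def build_search_url(term: str, api_key: str, version: str = "current") -> str:
    """Build a /search request URL that authenticates with the API key."""
    query = urllib.parse.urlencode({"string": term, "apiKey": api_key})
    return f"{UTS_BASE_URL}/search/{version}?{query}"


def extract_concepts(payload: dict) -> list[tuple[str, str]]:
    """Return (CUI, name) pairs from a search response.

    Entries whose 'ui' is 'NONE' are skipped; the search endpoint has used
    such a placeholder entry to signal that nothing matched.
    """
    results = payload.get("result", {}).get("results", [])
    return [(r["ui"], r["name"]) for r in results if r.get("ui") != "NONE"]


def search_umls(term: str) -> list[tuple[str, str]]:
    """Query the UTS REST API with UMLS_API_KEY from the environment."""
    url = build_search_url(term, os.environ["UMLS_API_KEY"])
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return extract_concepts(payload)
```

Keeping URL building and response parsing separate from the network call lets you check both without contacting the API.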