Merge branch 'main' into CU-8694vxbkh-1.12
mart-r authored Oct 8, 2024
2 parents 283ab60 + 69a225d commit 39cf6d9
Showing 28 changed files with 25,883 additions and 119 deletions.
1 change: 1 addition & 0 deletions .github/workflows/main.yml
@@ -35,6 +35,7 @@ jobs:
- name: Test
run: |
python -m unittest discover
python -m unittest discover -s medcat/compare_models
# TODO - in the future, we might want to add automated tests for notebooks as well
# though it's not really possible right now since the notebooks are designed
# in a way that assumes interaction (i.e. specifying model pack names)
10 changes: 6 additions & 4 deletions credentials.py
@@ -14,8 +14,10 @@
# If you do not have a UMLS account, you may apply for a license on the UMLS Terminology Services (UTS) website.
# https://documentation.uts.nlm.nih.gov/rest/authentication.html

# TODO: add option for UMLS api key auth
# UMLS api key auth
umls_apikey = None


# SNOMED authentication from international and TRUD
# TODO add arg for api key auth
# SNOMED authentication from NHS TRUD. International releases will require different API access creds.
# api key auth from NHS TRUD
# For more information please see: https://isd.digital.nhs.uk/trud/users/guest/filters/0/api
snomed_apikey = None
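
Note on the credentials change: both keys default to `None`, so each consumer has to decide what to do when no key is stored. A minimal sketch of one way to resolve a key is shown here: use the configured value if present, then an environment variable, then an interactive prompt. The helper name `resolve_api_key` and the `TRUD_API_KEY` variable name are hypothetical and not part of this commit; only the `getpass` prompt mirrors what the notebook does.

```python
import os
from getpass import getpass
from typing import Callable, Optional


def resolve_api_key(configured: Optional[str],
                    env_var: str = "TRUD_API_KEY",  # hypothetical variable name
                    prompt: Callable[[str], str] = getpass) -> str:
    """Return the configured key, else the environment value, else prompt."""
    if configured:
        return configured
    from_env = os.environ.get(env_var)
    if from_env:
        return from_env
    # Fall back to an interactive prompt that does not echo the key.
    key = prompt("Please enter your API key: ").strip()
    if not key:
        raise ValueError("No API key provided")
    return key
```

In a notebook this could be called as `snomed_apikey = resolve_api_key(snomed_apikey)` after importing from `credentials.py`.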
221 changes: 128 additions & 93 deletions data/snomed/preprocessing_snomed_ct.ipynb
@@ -11,21 +11,141 @@
"\n",
"SNOMED CT is a standardised clinical terminology consisting of >350,000 unique concepts. It is owned, maintained and distributed by SNOMED International.\n",
"\n",
"## Access to SNOMED CT files\n",
"\n",
"Please visit and explore https://www.snomed.org/ to find out further information about the various SNOMED CT products and services which they offer.\n",
"\n",
"-------\n",
"\n",
"UK Edition files can be found via [NHS TRUD](https://isd.digital.nhs.uk/)\n",
"\n",
"Download files via API coming soon...\n",
"\n",
"\n",
"--------\n",
"\n",
"All raw files from SNOMED should be placed in the local directory [here](data/snomed)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using the NHS TRUD API\n",
"\n",
"### Release list endpoint\n",
"\n",
"##### Request\n",
"A request to this endpoint is an HTTP GET of a URL that looks like this:\n",
"\n",
"https://isd.digital.nhs.uk/trud/api/v1/keys/deadc0de/items/123/releases\n",
"\n",
"Replace *deadc0de* with the API key, and *123* with the item number.\n",
"\n",
"Item numbers can be found in the URLs of releases pages. For example, the URL for the [NHS National Interim Clinical Imaging Procedures](https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/2/items/14/releases) releases page is:\n",
"\n",
"https://isd.digital.nhs.uk/trud/users/guest/filters/0/categories/2/items/14/releases\n",
"\n",
"In this example the item number is 14.\n",
"\n",
"To request only the latest release add *?latest* to the URL, like this:\n",
"\n",
"https://isd.digital.nhs.uk/trud/api/v1/keys/deadc0de/items/123/releases?latest"
]
},
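
Note on the release list endpoint: the URL pattern described in the cell above can be captured in two small helpers, one to build the URL and one to pull the fields the notebook later reads from the response. This is a sketch only. The helper names are hypothetical, and the response keys (`releases`, `id`, `archiveFileUrl`) are taken from the download cells in this diff rather than checked against the TRUD documentation. It also assumes, as the notebook does, that the first entry of a `?latest` response is the release wanted.

```python
from typing import Tuple
from urllib.parse import quote

TRUD_API_BASE = "https://isd.digital.nhs.uk/trud/api/v1"


def release_list_url(api_key: str, item_number: int, latest_only: bool = True) -> str:
    """Build the release list URL for a TRUD item."""
    url = f"{TRUD_API_BASE}/keys/{quote(api_key, safe='')}/items/{item_number}/releases"
    return f"{url}?latest" if latest_only else url


def latest_release(payload: dict) -> Tuple[str, str]:
    """Return (file name, archive URL) from a decoded release list response."""
    releases = payload.get("releases") or []
    if not releases:
        raise ValueError("No releases in response")
    first = releases[0]
    return first["id"], first["archiveFileUrl"]
```

For example, `release_list_url(snomed_apikey, 101)` builds the same URL as `clinical_info_url` in the notebook.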
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import sys\n",
"from getpass import getpass\n",
"sys.path.append('../..')\n",
"from credentials import * # you can store your api key here"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Important URLs\n",
"# Prompt for the key only if credentials.py did not provide one\n",
"if snomed_apikey is None:\n",
"    snomed_apikey = getpass('Please enter your SNOMED api key: ')\n",
"# SNOMED CT UK Clinical Edition\n",
"clinical_info_url = f'https://isd.digital.nhs.uk/trud/api/v1/keys/{snomed_apikey}/items/101/releases?latest'\n",
"\n",
"# SNOMED CT UK Drug Extension\n",
"drug_info_url = f'https://isd.digital.nhs.uk/trud/api/v1/keys/{snomed_apikey}/items/105/releases?latest'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download the SNOMED CT UK Clinical Edition\n",
"# Look up the latest release; stop here if the lookup fails so that\n",
"# file_name and url are never used while undefined.\n",
"response = requests.get(clinical_info_url)\n",
"if response.status_code != 200:\n",
"    raise RuntimeError(f'Error {response.status_code}: {response.text}')\n",
"release = response.json()['releases'][0]\n",
"file_name = release['id']\n",
"url = release['archiveFileUrl']\n",
"print('SNOMED information retrieved successfully')\n",
"\n",
"# Download the file\n",
"print(f'Downloading {file_name}...')\n",
"response = requests.get(url)\n",
"if response.status_code == 200:\n",
"    try:\n",
"        with open(file_name, 'wb') as file:\n",
"            file.write(response.content)\n",
"        print('Download completed successfully')\n",
"    except OSError as err:\n",
"        print(f'Could not write {file_name}: {err}')\n",
"else:\n",
"    print(f'Failed to download file. Status code: {response.status_code}')\n",
"    print(response.text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download the SNOMED CT UK Drug Extension\n",
"# Look up the latest release; stop here if the lookup fails so that\n",
"# file_name and url are never used while undefined.\n",
"response = requests.get(drug_info_url)\n",
"if response.status_code != 200:\n",
"    raise RuntimeError(f'Error {response.status_code}: {response.text}')\n",
"release = response.json()['releases'][0]\n",
"file_name = release['id']\n",
"url = release['archiveFileUrl']\n",
"print('SNOMED information retrieved successfully')\n",
"\n",
"# Download the file\n",
"print(f'Downloading {file_name}...')\n",
"response = requests.get(url)\n",
"if response.status_code == 200:\n",
"    try:\n",
"        with open(file_name, 'wb') as file:\n",
"            file.write(response.content)\n",
"        print('Download completed successfully')\n",
"    except OSError as err:\n",
"        print(f'Could not write {file_name}: {err}')\n",
"else:\n",
"    print(f'Failed to download file. Status code: {response.status_code}')\n",
"    print(response.text)"
]
},
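
Note on the two download cells: they differ only in the URL and hold the whole archive in memory before writing it. One possible alternative is a shared helper that streams the archive to disk. The sketch below is an assumption about how that could look, not code from this commit: the name `download_archive` is hypothetical, and it uses the standard library's `urllib.request` in place of `requests` so that it has no third-party dependency. It writes to a temporary `.part` file and renames it only once the copy has finished, so an interrupted download does not leave a truncated archive under the final name.

```python
import shutil
import urllib.request
from pathlib import Path


def download_archive(url: str, destination: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream url to destination without holding the whole archive in memory."""
    destination = Path(destination)
    partial = destination.with_name(destination.name + ".part")
    # urlopen raises urllib.error.HTTPError for HTTP error statuses.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Rename only after the copy has completed.
    partial.replace(destination)
    return destination
```

With the values from the cells above, a call would look like `download_archive(url, Path(file_name))`.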
{
"cell_type": "markdown",
"metadata": {
@@ -70,7 +190,7 @@
},
"source": [
"### Load the data\n",
"Please see the section: [Access to SNOMED CT release files](#access_to_snomed_ct) for how to retrieve the zipped SNOMED CT release."
"Please see the section: [Access to SNOMED CT files](#Access-to-SNOMED-CT-files) for how to retrieve the zipped SNOMED CT release."
]
},
{
@@ -356,46 +476,7 @@
"outputs": [],
"source": [
"# ICD-10\n",
"icd_df = snomed.map_snomed2icd10()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 496
},
"id": "SynBfXCi-Zpb",
"outputId": "f3cde34a-c5f9-428c-874a-01516832f4a1"
},
"outputs": [],
"source": [
"icd_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# drop codes with no mapping\n",
"icd_df = icd_df[icd_df['mapTarget']!='']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sctid2icd10 = icd_df.groupby('referencedComponentId').apply(lambda group: [{'code': row['mapTarget'],\n",
" 'mapGroup': row['mapGroup'],\n",
" 'mapPriority': row['mapPriority'],\n",
" 'mapRule': row['mapRule'],\n",
" 'mapAdvice': row['mapAdvice']} for _, row in group.iterrows()]).to_dict()"
"sctid2icd10 = snomed.map_snomed2icd10()"
]
},
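
Note on the ICD-10 mapping change: the removed cells built a dictionary keyed by `referencedComponentId` after dropping rows with an empty `mapTarget`, and the new single call presumably returns the same shape, although the diff does not show the implementation of `snomed.map_snomed2icd10()`. The sketch below illustrates that grouping in plain Python over row dictionaries. It is an illustration under that assumption, not the library's code, and the helper name `group_icd10_maps` is hypothetical. The column names are the ones used in the removed cells, plus `mapGroup`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_icd10_maps(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group map rows by SNOMED concept ID, skipping rows with no target code."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        if not row["mapTarget"]:
            continue  # drop codes with no mapping
        grouped[row["referencedComponentId"]].append({
            "code": row["mapTarget"],
            "mapGroup": row["mapGroup"],
            "mapPriority": row["mapPriority"],
            "mapRule": row["mapRule"],
            "mapAdvice": row["mapAdvice"],
        })
    return dict(grouped)
```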
{
@@ -437,63 +518,17 @@
},
"outputs": [],
"source": [
"opcs_df = snomed.map_snomed2opcs4()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nINW3byN-dd5"
},
"outputs": [],
"source": [
"opcs_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"opcs_df['refsetId'].unique() # notice how there are two codes?\n",
"# SCTID:'999002271000000101' represents ICD10 codes and SCTID:'1126441000000105' OPCS4\n",
"# Filtering by '999002271000000101' will also show more ICD10 codes.\n",
"# This is because the SNOMED UK extension duplicates information here. For the SNOMED UK extension, I would use the ICD10 codes in the refset rather than those in the International Edition.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Filter for just OPCS4\n",
"opcs_df = opcs_df[opcs_df['refsetId']=='1126441000000105']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sctid2opcs4 = opcs_df.groupby('referencedComponentId').apply(lambda group: [{'code': row['mapTarget'],\n",
" 'mapGroup': row['mapGroup'],\n",
" 'mapPriority': row['mapPriority'],\n",
" 'mapBlock': row['mapBlock'],\n",
" 'mapAdvice': row['mapAdvice']} for _, row in group.iterrows()]).to_dict()"
"sctid2opcs4 = snomed.map_snomed2opcs4()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optional Enrich with UMLS terms.\n",
"## Optional: Enrich with UMLS terms.\n",
"\n",
"To preprocess UMLS for SNOMED CT, please look [here](/data/snomed/umls_enricher.py). For further details, please refer to the [UMLS folder](/data/umls/ReadMe.md).\n",
"To preprocess UMLS for SNOMED CT, please look [here](umls_enricher.py). For further details, please refer to the [UMLS folder](../umls/ReadMe.md).\n",
"\n",
"For official UMLS documentation from the NLM:\n",
"Please explore the [UMLS Metathesaurus Vocabulary Documentation](https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html?_gl=1*1t5e3g7*_ga*OTQwMzA2NjEyLjE2NjI2NzEyMjU.*_ga_P1FPTH9PL4*MTY2MjY3MTIyNC4xLjEuMTY2MjY3MzE2NS4wLjAuMA..)\n",