Medical Named Entity Recognition Python library by Fast Data Science

🌐 fastdatascience.com

Finds disease names

⚕️ Medical Named Entity Recognition

Developed by Fast Data Science, https://fastdatascience.com

Source code at https://github.com/fastdatascience/medical_named_entity_recognition

This library is in Beta.

😊 Who worked on the Medical Named Entity Recognition library?

The tool was developed by:

Thomas Wood (Fast Data Science)

💻Installing Medical Named Entity Recognition Python package

You can install Medical Named Entity Recognition from PyPI.

pip install medical-named-entity-recognition

If you get an error installing Medical Named Entity Recognition, try making a new Python environment in Conda (conda create -n test-env; conda activate test-env) or venv (python -m venv testenv; then source testenv/bin/activate on Linux/macOS, or testenv\Scripts\activate on Windows) and then installing the library.

💡Usage examples

You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc).

You pass a list of strings to the find_diseases function.
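As a minimal sketch of the tokenising step, the snippet below wraps the same regular expression used in Example 1 in a small helper. The name tokenise is hypothetical and is not part of the library; any tokeniser that returns a list of strings should work equally well.

```python
import re
from typing import List

# Same pattern as in Example 1: runs of word characters and apostrophes.
TOKEN_PATTERN = re.compile(r"((?:\w|'|’)+)")


def tokenise(text: str) -> List[str]:
    """Hypothetical helper: split text into word tokens, dropping punctuation."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenise("The patient's cystic fibrosis worsened.")
print(tokens)  # ['The', "patient's", 'cystic', 'fibrosis', 'worsened']
```

The resulting list is the kind of input that find_diseases expects.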

Example 1

import re

from medical_named_entity_recognition import find_diseases

# Simple tokeniser: runs of word characters and apostrophes.
re_tokenise = re.compile(r"((?:\w|'|’)+)")

tokens = re_tokenise.findall("cystic fibrosis")  # ['cystic', 'fibrosis']
find_diseases(tokens)

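outputs a list of tuples. Each tuple holds a dictionary describing the matched disease (MeSH ID, MeSH tree numbers, canonical name, synonyms and match details), followed by two integers, here 0 and 1, which correspond to the positions of the first and last matching tokens.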

[({'mesh_id': 'D019005',
   'mesh_tree': ['C16.320.190', 'C16.614.213', 'C08.381.187', 'C06.689.202'],
   'name': 'Cystic Fibrosis',
   'synonyms': ['cystic fibrosis',
    'mucoviscidosis',
    'pancreas fibrocystic diseases',
    'pancreas fibrocystic disease',
    'cystic fibrosis, pulmonary',
    'cystic fibrosis, pancreatic',
    'pancreatic cystic fibrosis',
    'fibrosis, cystic',
    'pulmonary cystic fibrosis',
    'fibrocystic disease of pancreas',
    'cystic fibrosis of pancreas'],
   'is_brand': False,
   'match_type': 'exact',
   'matching_string': 'cystic fibrosis'},
  0,
  1)]
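The sketch below shows one way to turn output of this shape into a compact summary. It runs on a hand-copied, shortened version of the result above rather than calling the library, and it assumes the two integers are inclusive start and end token indices. The helper name summarise_matches is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape of one result: (entity dictionary, start index, end index).
Match = Tuple[Dict[str, Any], int, int]


def summarise_matches(tokens: List[str], matches: List[Match]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only the name, MeSH ID and matched text."""
    summary = []
    for entity, start, end in matches:
        summary.append(
            {
                # Assumption: the end index is inclusive, as in the output above.
                "text": " ".join(tokens[start : end + 1]),
                "name": entity["name"],
                "mesh_id": entity["mesh_id"],
                "start": start,
                "end": end,
            }
        )
    return summary


tokens = ["cystic", "fibrosis"]
# Shortened copy of the output shown above (synonyms and MeSH tree omitted).
sample_matches = [
    (
        {
            "mesh_id": "D019005",
            "name": "Cystic Fibrosis",
            "match_type": "exact",
            "matching_string": "cystic fibrosis",
        },
        0,
        1,
    )
]

for row in summarise_matches(tokens, sample_matches):
    print(row)
```

In real use, sample_matches would be replaced by the return value of find_diseases(tokens).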

Interested in other kinds of named entity recognition (NER)? 💊 Drug names (medicines), pharma, 💸Finances, 🎩company names, 🌎countries, 🗺️locations, proteins, 🧬genes, 🧪molecules?

If your NER problem is common across industries and likely to have been seen before, there may be an off-the-shelf NER tool for your purposes, such as our Country Named Entity Recognition Python library or our Drug Named Entity Recognition library. Dictionary-based named entity recognition is not always the solution, as sometimes the total set of entities is an open set and can't be listed (e.g. personal names), so sometimes a bespoke trained NER model is the answer. For tasks like finding email addresses or phone numbers, regular expressions (simple rules) are sufficient for the job.
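To illustrate the point about regular expressions, here is a minimal sketch of finding email addresses with a single pattern. The pattern is deliberately simple and is an assumption for illustration; it does not cover every address allowed by the email standards.

```python
import re
from typing import List

# Simplified pattern: local part, "@", then a domain with at least one dot.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text: str) -> List[str]:
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


print(find_emails("Contact alice@example.com or bob.smith@mail.example.org."))
# ['alice@example.com', 'bob.smith@mail.example.org']
```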

If your named entity recognition or named entity linking problem is very niche and unusual, an existing product is likely to solve only about 80% of it, and fixing the final mile can be more work than doing the whole thing manually. Please contact Fast Data Science and we'll be glad to discuss. For example, we've worked on a consultancy engagement to find molecule names in papers and match author names to customers, where the goal was to trace molecule samples ordered from a pharma company and identify when the samples resulted in a publication. For this case, there was no off-the-shelf library that we could use.

For a problem like identifying country names in English, which is a closed set with well-known variants and aliases, an off-the-shelf library is usually available. You may wish to try our Country Named Entity Recognition library, also open-source and under MIT license.
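For a closed set like this, dictionary-based matching can be sketched in a few lines. The example below is an illustration of the general technique only, not the implementation used by any Fast Data Science library. The alias table and the function name find_entities are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy alias table (hypothetical data): lower-cased token tuples -> canonical name.
ALIASES: Dict[Tuple[str, ...], str] = {
    ("united", "kingdom"): "United Kingdom",
    ("uk",): "United Kingdom",
    ("france",): "France",
}
MAX_ALIAS_LENGTH = max(len(alias) for alias in ALIASES)


def find_entities(tokens: List[str]) -> List[Tuple[str, int, int]]:
    """Return (canonical name, start index, inclusive end index) per match."""
    lowered = [token.lower() for token in tokens]
    matches = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate span first so multi-word aliases win.
        for n in range(min(MAX_ALIAS_LENGTH, len(lowered) - i), 0, -1):
            key = tuple(lowered[i : i + n])
            if key in ALIASES:
                matches.append((ALIASES[key], i, i + n - 1))
                i += n
                break
        else:
            # No alias starts at this token; move on.
            i += 1
    return matches


tokens = ["She", "moved", "from", "France", "to", "the", "United", "Kingdom"]
print(find_entities(tokens))
# [('France', 3, 3), ('United Kingdom', 6, 7)]
```

Longest-match-first is a common design choice here because it stops a short alias from shadowing a longer one that starts at the same token.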

Identifying a set of molecules manufactured by a particular company is the kind of task better suited to a consulting engagement.

Requirements

Python 3.9 and above

✉️Who to contact?

You can contact Thomas Wood or the Fast Data Science team at https://fastdatascience.com/.

Contributing to the Medical Named Entity Recognition library

If you'd like to contribute to this project, you can contact us at https://fastdatascience.com/ or make a pull request on our GitHub repository. You can also raise an issue.

Developing the Medical Named Entity Recognition library

Automated tests

Test code is in the tests/ folder and uses unittest.

The testing tool tox is used in the automation with GitHub Actions CI/CD.

Use tox locally

Install tox and run it:

pip install tox
tox

In our configuration, tox runs a check of the source distribution using check-manifest (which requires your repo to be git-initialized (git init) and added (git add .) at least), setuptools's check, and unit tests using pytest. You don't need to install check-manifest and pytest yourself: tox will install them in a separate environment.

The automated tests are run against several Python versions, but on your machine you might be using only one version of Python. If that is Python 3.9, then run:

tox -e py39
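For other Python versions the environment name follows the same pyXY pattern. As a small sketch, the snippet below builds the name for the interpreter that runs it. The helper tox_env_name is hypothetical, and the command only works if that environment is listed in tox.ini.

```python
import sys


def tox_env_name(major: int = sys.version_info.major,
                 minor: int = sys.version_info.minor) -> str:
    """Build a tox environment name such as 'py39' or 'py311'."""
    return f"py{major}{minor}"


# Print the command to run for the current interpreter.
print(f"tox -e {tox_env_name()}")
```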

Thanks to GitHub Actions' automated process, you don't need to generate distribution files locally.

🤖 Continuous integration/deployment to PyPI

This package is based on the template https://pypi.org/project/example-pypi-package/

This package:

uses GitHub Actions for both testing and publishing
is tested when pushing to the master or main branch, and is published when a release is created
includes test files in the source distribution
uses setup.cfg for version single-sourcing (setuptools 46.4.0+)

🧍Re-releasing the package manually

The code to re-release Medical Named Entity Recognition on PyPI is as follows:

source activate py311
pip install twine
rm -rf dist
python setup.py sdist
twine upload dist/*

🤝Compatibility with other natural language processing libraries

The Medical Named Entity Recognition library is independent of other NLP tools and has no dependencies. You don't need any advanced system requirements and the tool is lightweight. However, it combines well with other libraries such as spaCy or the Natural Language Toolkit (NLTK).
