Skip to content

Commit

Permalink
Update docs and update to newer pyo3 version
Browse files Browse the repository at this point in the history
  • Loading branch information
anergictcell committed Jun 25, 2024
1 parent 3d405d9 commit b0a1a17
Show file tree
Hide file tree
Showing 16 changed files with 553 additions and 143 deletions.
4 changes: 2 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "hpo3"
version = "1.3.0"
version = "1.3.1"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
Expand All @@ -9,7 +9,7 @@ name = "pyhpo"
crate-type = ["cdylib"]

[dependencies]
pyo3 = { version = "0.20.3", features = ["extension-module"] }
pyo3 = { version = "0.21.2", features = ["extension-module"] }
hpo = "0.10.1"
rayon = "1.9.0"
once_cell = "1.19"
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ HP:0002751 | Kyphoscoliosis
"""

# Show the categories a term belongs to
for term in Ontology[10049].categories:
for term in Ontology.hpo(10049).categories:
print(term)
"""
HP:0033127 | Abnormality of the musculoskeletal system
Expand Down
313 changes: 313 additions & 0 deletions docs/examples.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,313 @@
Working with HPOTerms
=====================

.. code-block:: python
from pyhpo import Ontology
# initilize the Ontology
Ontology()
# Retrieve a term e.g. via its HPO-ID
term = Ontology.get_hpo_object('Scoliosis')
print(term)
#> HP:0002650 | Scoliosis
# Get information content from Term <--> Omim associations
term.information_content['omim']
#> 2.29
# Show how many genes are associated to the term
# (Note that this includes indirect associations, associations
# from children terms to genes.)
len(term.genes)
#> 1094
# Show how many Omim Diseases are associated to the term
# (Note that this includes indirect associations, associations
# from children terms to diseases.)
len(term.omim_diseases)
#> 844
# Get a list of all parent terms
for p in term.parents:
print(p)
#> HP:0010674 | Abnormality of the curvature of the vertebral column
# Get a list of all children terms
for p in term.children:
print(p)
"""
HP:0002944 | Thoracolumbar scoliosis
HP:0008458 | Progressive congenital scoliosis
HP:0100884 | Compensatory scoliosis
HP:0002944 | Thoracolumbar scoliosis
HP:0002751 | Kyphoscoliosis
"""
Phenotype similarity
====================

Calculate the phenotype similarity between two patients

.. code-block:: python
from pyhpo import Ontology, HPOSet
# initilize the Ontology
Ontology()
# Declare the clinical information of the patients
patient_1 = HPOSet.from_queries([
'HP:0002943',
'HP:0008458',
'HP:0100884',
'HP:0002944',
'HP:0002751'
])
patient_2 = HPOSet.from_queries([
'HP:0002650',
'HP:0010674',
'HP:0000925',
'HP:0009121'
])
# and compare their similarity
patient_1.similarity(patient_2)
#> 0.7594183905785477
# use other similarity method
patient_1.similarity(patient_2, method="resnik")
#> 0.2345
Clustering of patients
======================
The :class:pyhpo.stats.Linkage` class allows linkage analysis and clustering of patients based on their similarity.

In this example, we're clustering 25 genes based on their associated phenotypes.

.. code-block:: python
import pyhpo
from pyhpo import Ontology
# needed only for plotting the dendogram
import scipy
import matplotlib
%matplotlib inline
Ontology()
phenotypes = [] # List[HPOSet]
names = [] # List[str]
for gene in Ontology.genes[:25]:
# It's best to calculate the linkage only based on phenotypes and exclude modifier terms
phenotypes.append(gene.hpo_set().remove_modifier())
names.append(gene.name)
# Cluster the diseases using default settings
lnk = pyhpo.stats.linkage(phenotypes)
# Print the dendogram using scipy and matplotlib
plt = scipy.cluster.hierarchy.dendrogram(lnk, labels=names)
.. figure:: gene_dendogram.png
:alt: Dendogram of 25 random genes

Example dendogram of 25 random genes


Gene-association enrichment
===========================

.. code-block:: python
from pyhpo import stats, Ontology, HPOSet
# initilize the Ontology
Ontology()
# Declare the clinical information of the patients
patient_1 = HPOSet.from_queries([
'HP:0002943',
'HP:0008458',
'HP:0100884',
'HP:0002944',
'HP:0002751'
])
# Calculate the enrichment of genes in an HPOSet
gene_model = stats.EnrichmentModel('gene')
genes = gene_model.enrichment(method='hypergeom', hposet=patient_1)
print(genes[0])
# >> {
# >> 'enrichment': 5.453829934109905e-05,
# >> 'fold': 33.67884615384615,
# >> 'count': 3,
# >> 'item': <Gene (PLOD1)>
# >> }
Multi-threading features
========================

If you have lots of patients or other sets of terms and want to batchwise operations,
you can benefit from **hpo3** multiprocessing features


Calculate the pairwise similarity of the HPOSets from all genes
---------------------------------------------------------------

.. code-block:: python
import itertools
from pyhpo import Ontology, HPOSet, helper
Ontology()
gene_sets = [g.hpo_set() for g in Ontology.genes]
# Create a list of tuples.
# Each tuple contains the two HPOSets to compare
# [
# (HPOSet, HPOSet),
# (HPOSet, HPOSet),
# (HPOSet, HPOSet)
# ]
gene_set_combinations = [
(a[0], a[1]) for a in itertools.combinations(gene_sets, 2)
]
similarities = helper.batch_set_similarity(
gene_set_combinations[0:1000], # only calculating for for 1000 comparisons to save time
kind="omim",
method="graphic",
combine="funSimAvg"
)
# >> [
# >> 0.40744566917419434,
# >> 0.1365184634923935,
# >> ...
# >> ]
Calculate the similarity of of a patient's HPO term to all diseases
-------------------------------------------------------------------

.. code-block:: python
import itertools
from pyhpo import Ontology, HPOSet, helper
Ontology()
patient_1 = HPOSet.from_queries([
'HP:0002943',
'HP:0008458',
'HP:0100884',
'HP:0002944',
'HP:0002751'
])
# casting the gene set to a list to main order for later lookups
genes = list(Ontology.genes)
# Create a list of tuples.
# Each tuple contains the patients HPOSet and the HPOSet of a gene
# [
# (HPOSet_Patient, HPOSet_GeneA),
# (HPOSet_Patient, HPOSet_GeneB),
# (HPOSet_Patient, HPOSet_GeneC)
# ]
comparisons = [(patient_1, g.hpo_set()) for g in genes]
similarities = helper.batch_set_similarity(
comparisons,
kind="omim",
method="graphic",
combine="funSimAvg"
)
# Get most similar gene
top_score = max(similarities)
genes[similarities.index(top_score)]
# >> <Gene (POP1)>
Calculate the disease enrichment for large list of patients
-----------------------------------------------------------
If you have a large list of patients and want to find out which diseases are enriched in each of
them, use the :func:`pyhpo.helper.batch_omim_disease_enrichment` or
:func:`pyhpo.helper.batch_orpha_disease_enrichment` methods.
You can analyze hundreds of patients in seconds.


.. note::

For the below example we did not use the phenotypes of hundreds of patients.
Instead we mimic the type of data by converting genes to HPOSets.
Instead of ``gene_sets``, you could also use a list of HPOSet with patient phenotypes

.. code-block:: python
import itertools
from pyhpo import Ontology, helper
Ontology()
# casting the gene set to a list to main order for later lookups
genes = list(Ontology.genes)[:100] # using only 100 genes in this example
gene_sets = [g.hpo_set() for g in genes]
enrichments = helper.batch_omim_disease_enrichment(gene_sets)
print(f"The most enriched disease for {genes[0]} is {enrichments[0][0]}")
# >> The most enriched disease for 730 | C7 is {
# >> 'enrichment': 3.6762699175625894e-42,
# >> 'fold': 972.9444444444443,
# >> 'count': 13,
# >> 'item': <OmimDisease (610102)>
# >> }
Calculate the gene enrichment for large list of patients
--------------------------------------------------------
If you have a large list of patients and want to find out which genes are enriched in each of
them, use the :func:`pyhpo.helper.batch_gene_enrichment` method.
You can analyze hundreds of patients in seconds.

.. note::

For the below example we did not use the phenotypes of hundreds of patients.
Instead we mimic the type of data by converting diseases to HPOSets.
Instead of ``disease_sets``, you could also use a list of HPOSet with patient phenotypes

.. code-block:: python
import itertools
from pyhpo import Ontology, helper
Ontology()
# casting the gene set to a list to main order for later lookups
diseases = list(Ontology.omim_diseases)[:100] # using only 100 diseases in this example
disease_sets = [d.hpo_set() for d in diseases]
enrichments = helper.batch_gene_enrichment(disease_sets)
print(f"The most enriched gene for {diseases[0]} is {enrichments[0][0]}")
# >> The most enriched gene for 619510 | Immunodeficiency 85 and autoimmunity is {
# >> 'enrichment': 7.207370728788139e-45,
# >> 'fold': 66.0867924528302,
# >> 'count': 24,
# >> 'item': <Gene (TOM1)>
# >> }
Binary file added docs/gene_dendogram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
11 changes: 4 additions & 7 deletions docs/getting_started.rst
Original file line number Diff line number Diff line change
@@ -1,8 +1,5 @@
Getting started
===============

Installation
------------
============
hpo3 is provided as binary wheels for most platforms on PyPI, so in most cases you can just run

.. code-block:: bash
Expand All @@ -13,7 +10,7 @@ hpo3 is provided as binary wheels for most platforms on PyPI, so in most cases y


Initializing the Ontology
~~~~~~~~~~~~~~~~~~~~~~~~~
=========================

hpo3 ships with a prebuilt HPO Ontology by default, so you can start right away.

Expand All @@ -27,14 +24,14 @@ hpo3 ships with a prebuilt HPO Ontology by default, so you can start right away.
print(term.name)
# get a single term based on their HPO ID
term = Ontology[118]
term = Ontology.hpo(118)
# or by using the full Term ID
term = Ontology.get_hpo_object("HP:0000118")
Updating the Ontology
~~~~~~~~~~~~~~~~~~~~~
=====================

While I try to keep the HPO ontology version updated, it might become outdated at some point. You can always check the used version:

Expand Down
2 changes: 2 additions & 0 deletions docs/hposet.rst
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ Instantiation
.. autofunction:: pyhpo.HPOSet.from_serialized
.. autofunction:: pyhpo.HPOSet.from_gene
.. autofunction:: pyhpo.HPOSet.from_disease
.. autofunction:: pyhpo.HPOSet.from_omim_disease
.. autofunction:: pyhpo.HPOSet.from_orpha_disease


Instance methods
Expand Down
Loading

0 comments on commit b0a1a17

Please sign in to comment.