Disease-Gene: Self-referential somatic cancers
- Update: Revert filtering out. Now log as a review.tsv case.
joeflack4 committed Nov 19, 2024
1 parent 5163de7 commit 0e089c6
Showing 3 changed files with 59 additions and 40 deletions.
26 changes: 21 additions & 5 deletions README.md
@@ -105,15 +105,31 @@ always in sync, and that one or the other may be slightly more up-to or out-of d

### `review.tsv`
Columns:
- `classCode`: integer: ID of review case class
- `classShortName`: string (camelCase): describing the review case class
- `classCode`: integer
- `classLabel`: string
- `value`: any: Some form of data to review
- `comment`: string (optional)
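
As a rough illustration of this column layout, the sketch below writes a couple of rows to a tab-separated buffer. It uses the standard-library `csv` module instead of the pandas call the project uses, so that the optional `comment` column is always emitted. The helper name `write_review_tsv`, the `REVIEW_COLUMNS` constant, and the sample rows are hypothetical; the `classLabel` column name is taken from the column list.

```python
import csv
import io
from typing import Dict, List

# Column order assumed from the README's column list; `comment` is optional per row.
REVIEW_COLUMNS = ["classCode", "classLabel", "value", "comment"]


def write_review_tsv(rows: List[Dict], out) -> None:
    """Write review cases sorted by classCode, always emitting every column."""
    writer = csv.DictWriter(
        out, fieldnames=REVIEW_COLUMNS, delimiter="\t", restval="", lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["classCode"]):
        writer.writerow(row)  # missing keys (e.g. `comment`) are filled with restval


# Hypothetical sample rows, for illustration only (not real OMIM data)
rows = [
    {"classCode": 2, "classLabel": "D2G: Disease-defining; self-referential", "value": "example B"},
    {"classCode": 1, "classLabel": "D2G: Disease-defining but marked digenic", "value": "example A"},
]
buffer = io.StringIO()
write_review_tsv(rows, buffer)
print(buffer.getvalue())
```

Passing an explicit column list is one way to keep the header stable even when no row carries a `comment`.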

#### 1. `causalD2gButMarkedDigenic`
This review case involves what would be otherwise considered a valid disease-gene relationship, but for the fact that
it quite unusually includes 'digenic' in the label, even though it only had 1 association. OMIM doesn't have a
#### 1. D2G Disease-defining but marked digenic
This review case involves what would otherwise be considered a valid disease-gene (D2G) relationship, except that its
label unusually includes 'digenic' even though it has only 1 association. OMIM does not guarantee the data quality of
its disease-gene associations marked 'digenic', so for any of these entries, either (a) it is not 'digenic', in which
case OMIM should remove that from the label, and Mondo can either make an explicit exception to add the relationship
or wait until OMIM fixes the issue, at which point it will be added automatically; or (b) it is in fact 'digenic', and
OMIM should add the missing 2nd gene association.
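
A minimal sketch of how such a case might be detected is shown here. The helper `is_digenic_review_case` is hypothetical; the dictionary keys mirror the `phenotype_genes` entries in `omim2obo/main.py`, and the real pipeline additionally checks `p2g_is_definitive`, which is omitted for brevity.

```python
from typing import Dict, List


def is_digenic_review_case(assocs: List[Dict[str, str]]) -> bool:
    """Return True when a phenotype has exactly one association, that association is
    causal (mapping key '3'), and its label nevertheless mentions 'digenic'."""
    if len(assocs) != 1:
        return False
    assoc = assocs[0]
    return assoc["mapping_key"] == "3" and "digenic" in assoc["phenotype_label"].lower()


# Hypothetical associations, for illustration only (not real OMIM data)
single_digenic = [{"gene_id": "000002", "phenotype_label": "Example syndrome, digenic", "mapping_key": "3"}]
single_plain = [{"gene_id": "000002", "phenotype_label": "Example syndrome", "mapping_key": "3"}]

print(is_digenic_review_case(single_digenic))  # True
print(is_digenic_review_case(single_plain))    # False
```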

#### 2. D2G: Disease-defining; self-referential
The unique characteristics of cases of this class are as follows:
- Each case has 2 related rows in `morbidmap.txt`.
  - Row 1: One row is a typical, valid, disease-defining entry. For the given phenotype MIM in that row, there are no
    other rows in `morbidmap.txt` where it appears as a phenotype having an association with another gene.
    - In all cases seen as of 2024/11/18, these are cancer cases, and the label ends with "somatic".
    - This entry appears in the Phenotype-Gene Relationships table on the MIM's omim.org/entry page.
  - Row 2: There is a second row where the phenotype in the first row appears as a gene.
    - For this row, there is no MIM in the phenotype field.
    - This row does not appear in the Gene-Phenotype Relationships table on the MIM's omim.org/entry page.
    - This row is self-referential. The label in the Phenotype field is one of the titles of the MIM in the Gene field.

A [Google Sheet](https://docs.google.com/spreadsheets/d/1hKSp2dyKye6y_20NK2HwLsaKNzWfGCMJMP52lKrkHtU/) collates all
known cases as of 2024/11/18. The MIMs of the known cases are 159595, 182280, 607107, and 615830.
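
The self-referential pattern can be sketched roughly as follows. This is not the project's `get_self_ref_assocs`; the helper `find_self_ref_assocs` is hypothetical, and the layout of `gene_phenotypes` (a `phenotype_map` list under each gene MIM) is an assumption made for illustration. Only the `phenotype_mim_number` check comes from the source.

```python
from typing import Dict, List


def find_self_ref_assocs(phenotype_mim: str, gene_phenotypes: Dict[str, Dict]) -> List[Dict]:
    """Collect rows where `phenotype_mim` sits in the gene position and the Phenotype field has no MIM."""
    # Assumed layout: {gene_mim: {"phenotype_map": [assoc, ...]}}; the real structure may differ.
    entry = gene_phenotypes.get(phenotype_mim, {})
    return [a for a in entry.get("phenotype_map", []) if not a.get("phenotype_mim_number")]


# Hypothetical data, for illustration only (placeholder MIMs, not real OMIM entries)
gene_phenotypes = {
    # Row 2 pattern: the phenotype MIM appears as a gene, and the phenotype field has no MIM
    "000001": {"phenotype_map": [
        {"phenotype_label": "Example tumor, somatic", "phenotype_mim_number": "",
         "phenotype_mapping_info_key": "3"},
    ]},
    # Row 1 pattern: an ordinary gene entry pointing at phenotype MIM 000001
    "000002": {"phenotype_map": [
        {"phenotype_label": "Example tumor, somatic", "phenotype_mim_number": "000001",
         "phenotype_mapping_info_key": "3"},
    ]},
}

print(find_self_ref_assocs("000001", gene_phenotypes))  # one self-referential row
print(find_self_ref_assocs("000002", gene_phenotypes))  # []
```

Returning the matching rows, not just a boolean, lets each one be logged to `review.tsv` alongside the disease-defining row.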
62 changes: 38 additions & 24 deletions omim2obo/main.py
@@ -54,7 +54,7 @@
from omim2obo.config import REVIEW_CASES_PATH, ROOT_DIR, GLOBAL_TERMS, ReviewCase
from omim2obo.namespaces import *
from omim2obo.parsers.omim_entry_parser import get_alt_labels, get_pubs, get_mapped_ids, LabelCleaner, \
has_self_ref_assocs
get_self_ref_assocs
from omim2obo.parsers.omim_txt_parser import * # todo: change to specific imports


@@ -142,7 +142,7 @@ def get_graph():


# Main
def omim2obo(use_cache: bool = False):
def omim2obo(use_cache: bool = True):
"""Run program"""
graph = OmimGraph.get_graph()
download_files_tf: bool = not use_cache
@@ -304,40 +304,61 @@ def omim2obo(use_cache: bool = False):
phenotype_genes[p_mim].append({
'gene_id': gene_mim, 'phenotype_label': p_lab, 'mapping_key': p_map_key, 'mapping_label': p_map_lab})

self_ref_case = 0
# - Add relations (subclass restrictions)
for p_mim, assocs in phenotype_genes.items():
for assoc in assocs:
gene_mim, p_lab, p_map_key, p_map_lab = assoc['gene_id'], assoc['phenotype_label'], \
assoc['mapping_key'], assoc['mapping_label']
evidence = f'Evidence: ({p_map_key}) {p_map_lab}'

# General skippable cases
# Skip: No phenotype or unknown defect
# - not p_mim: Skip because not an association to another MIM (Provenance:
# https://github.com/monarch-initiative/omim/issues/78)
# - p_map_key == '1': Skip because association w/ unknown defect (Provenance:
# https://github.com/monarch-initiative/omim/issues/79#issuecomment-1319408780)
if not p_mim or p_map_key == '1':
continue

# Gene->Disease non-causal relationships
# Add restrictions: Gene->Disease non-causal relationships
# - RO:0003302 docs: see MORBIDMAP_PHENOTYPE_MAPPING_KEY_PREDICATES
if p_map_key != '3': # 3 = 'causal'. Handled separately below.
g2d_pred = MORBIDMAP_PHENOTYPE_MAPPING_KEY_PREDICATES[p_map_key] if len(assocs) == 1 else RO['0003302']
add_subclassof_restriction_with_evidence(graph, g2d_pred, OMIM[p_mim], OMIM[gene_mim], evidence)

# Disease->Gene & Gene->Disease: Causal relationships
# - Skip non-causal cases
# - 3: The molecular basis for the disorder is known; a mutation has been found in the gene.
# Skip non-causal cases
if len(assocs) > 1 or p_map_key != '3' or not p2g_is_definitive(p_lab):
continue
# - Digenic: Should technically be none marked 'digenic' if only 1 association, but there are.

# Log review cases
# - Digenic: Should technically be none marked 'digenic' if only 1 association, but there are.
if 'digenic' in p_lab.lower():
# noinspection PyTypeChecker typecheck_fail_old_Python
REVIEW_CASES.append({
"classCode": 1,
"classShortName": "causalD2gButMarkedDigenic",
"value": f"OMIM:{p_mim}: {p_lab} (Gene: OMIM:{gene_mim})",
"classShortName": "D2G: Disease-defining but marked digenic",
"value": f"(Phenotype: {p_mim} {p_lab}) (Gene: {gene_mim})",
})
# - Self-referential cases
self_ref_assocs: List[Dict] = get_self_ref_assocs(p_mim, gene_phenotypes)
if self_ref_assocs:
self_ref_case += 1
REVIEW_CASES.append({
"classCode": 2,
"classShortName": "D2G: Disease-defining; self-referential",
"value":
f"{self_ref_case}: (Phenotype: {p_mim} {p_lab}), (Map key: {p_map_key}), (Gene: {gene_mim})",
})
for self_ref_assoc in self_ref_assocs:
# noinspection PyTypeChecker typecheck_fail_old_Python
REVIEW_CASES.append({
"classCode": 2,
"classShortName": "D2G: Disease-defining; self-referential",
"value": f"{self_ref_case}: (Phenotype: {self_ref_assoc['phenotype_label']}), (Map key: "
f"{self_ref_assoc['phenotype_mapping_info_key']}), (Gene: OMIM:{p_mim})",
})
# - Unexpected non-phenotype MIM types
# todo: these need to be in review.tsv as well
p_mim_type: str = omim_types[p_mim] # Allowable: PHENOTYPE, HERITABLE_PHENOTYPIC_MARKER (#, %)
mim_type_err = f"Warning: Unexpected MIM type {p_mim_type} for Phenotype {p_mim} when parsing phenotype-" \
f"disease relationships. Skipping."
@@ -346,21 +367,13 @@ def omim2obo(use_cache: bool = False):
if p_mim_type == 'GENE': # *
print(mim_type_err, file=sys.stderr) # OMIM recognized as data quality issue. Fixed 2024/11. Failsafe.

# Self-referential special cases.
# Known cases: https://docs.google.com/spreadsheets/d/1hKSp2dyKye6y_20NK2HwLsaKNzWfGCMJMP52lKrkHtU/
if has_self_ref_assocs(p_mim, gene_phenotypes):
if p_mim not in ('159595', '182280', '607107', '615830'): # previously known cases
# todo: Add to review.tsv?
print(f'Unexpected new self-referential case: OMIM:{p_mim}: {p_lab} (Gene: OMIM:{gene_mim})',
file=sys.stderr)
continue

# Disease --(RO:0004003 'has material basis in germline mutation in')--> Gene
# https://www.ebi.ac.uk/ols4/ontologies/ro/properties?iri=http://purl.obolibrary.org/obo/RO_0004003
# Add restrictions: causal germline mutation
# - Disease --(RO:0004003 'has material basis in germline mutation in')--> Gene
# https://www.ebi.ac.uk/ols4/ontologies/ro/properties?iri=http://purl.obolibrary.org/obo/RO_0004003
add_subclassof_restriction_with_evidence(
graph, RO['0004003'], OMIM[gene_mim], OMIM[p_mim], evidence)
# Gene --(RO:0004013 'is causal germline mutation in')--> Disease
# https://www.ebi.ac.uk/ols4/ontologies/ro/properties?iri=http://purl.obolibrary.org/obo/RO_0004013
# - Gene --(RO:0004013 'is causal germline mutation in')--> Disease
# https://www.ebi.ac.uk/ols4/ontologies/ro/properties?iri=http://purl.obolibrary.org/obo/RO_0004013
add_subclassof_restriction_with_evidence(
graph, RO['0004013'], OMIM[p_mim], OMIM[gene_mim], evidence)

@@ -389,7 +402,8 @@ def omim2obo(use_cache: bool = False):
for orphanet_id in orphanet_ids:
graph.add((OMIM[mim_number], SKOS.exactMatch, ORPHANET[orphanet_id]))

review_df = pd.DataFrame(REVIEW_CASES) # todo: ensure comment field exists even when no row uses
# todo: ensure comment field exists even when no row uses it
review_df = pd.DataFrame(REVIEW_CASES).sort_values(by=['classCode'])
review_df.to_csv(REVIEW_CASES_PATH, index=False, sep='\t')
with open(OUTPATH, 'w') as f:
f.write(graph.serialize(format='turtle'))
11 changes: 0 additions & 11 deletions omim2obo/parsers/omim_entry_parser.py
@@ -395,14 +395,3 @@ def get_self_ref_assocs(phenotype_mim: str, gene_phenotypes: Dict[str, Dict]) ->
if not _assoc['phenotype_mim_number']:
_self_ref_assocs.append(_assoc)
return _self_ref_assocs


def has_self_ref_assocs(phenotype_mim: str, gene_phenotypes: Dict[str, Dict]) -> bool:
"""Check whether has self referential associations
Several anomalies with these:
Self-referential “phenotype in the gene position + Phenotype field without a MIM” + "morbidmap.txt entry not in
Phenotype-Gene Relationships table" on website
Also all marked as 'somatic'.
"""
return len(get_self_ref_assocs(phenotype_mim, gene_phenotypes)) > 0
