Disease-Gene: Self-referential cases #166

Open: wants to merge 5 commits into develop

@joeflack4 (Contributor) commented Nov 17, 2024

Disease-Gene: Self-referential somatic cancers
Self-referential "phenotype in the gene position + Phenotype field without a MIM" + "morbidmap.txt entry not in Phenotype-Gene Relationships table" case: add to review.tsv

All current cases: Google Sheet

- Analysis: Self-referential “phenotype in the gene position + Phenotype field without a MIM” + "morbidmap.txt entry not in Phenotype-Gene Relationships table" case
@joeflack4 self-assigned this Nov 17, 2024
@joeflack4 added the analysis label ("Not a feature or update to the core of the repository, but an ad hoc analysis.") Nov 17, 2024
@joeflack4: This comment was marked as outdated.

@joeflack4 closed this Nov 17, 2024
@joeflack4 deleted the self-ref-d2g branch November 17, 2024 23:35
@joeflack4 restored the self-ref-d2g branch November 18, 2024 18:24
@joeflack4 reopened this Nov 18, 2024
@joeflack4 changed the base branch from main to develop November 18, 2024 18:38
- Update: No longer just analysis. Now filtering these out.
@joeflack4 changed the title from "Temp: Self-referential Disease-Gene analysis" to "Disease-Gene: Self-referential cases" Nov 18, 2024
@twhetzel (Contributor) left a review

For the code, see my inline comment about hard-coding OMIM identifiers.

omim2obo/main.py (outdated, resolved)
- Update: Revert filtering out. Now log as a review.tsv case.
@twhetzel (Contributor) left a review

Overall looks good! Some comments/descriptions will be clearer to our future selves if they are expanded a bit further and include examples. I acknowledge these comments are a bit nitpicky, but if the docs are being updated, they should be clear to someone without prior knowledge of the deep review and data-file analysis that went into identifying these cases.

This review case involves what would be otherwise considered a valid disease-gene relationship, but for the fact that
it quite unusually includes 'digenic' in the label, even though it only had 1 association. OMIM doesn't have a
#### 1. D2G Disease-defining but marked digenic
This review case involves what would be otherwise considered a valid disease-gene (D2G) relationship, but for the fact

Contributor

@joeflack4 can you add a section that explains what a "valid disease-gene (D2G) relationship" is, i.e., the rules, and that 'valid' means it is a gene association represented with RO:0004003 'has material basis in germline mutation in'?

Contributor Author

There's a lot of commentary scattered throughout the codebase, but not really anything in the README / front page that explains this stuff. So it will entail a new section; I'm not sure what to call it. Maybe an "additional information" or "topics" or "explanation" section with subsection "Disease-Gene associations".

Contributor

A new section in the README is fine

Contributor Author

@twhetzel I updated the README. I'm not sure if I'm done yet; check it out.

Contributor

I'd prefer to review when you have added everything. Is there some specific feedback you are looking for about the README? The additions I requested should only take ~20 min to add.

Contributor Author

I'll finish it up and then tag you again. It is taking a little more time than that. You've been wanting a lot of related documentation for a while, so I compiled it all in a way I felt you and the team would like, phrased as clearly and unambiguously as I can make it (harder to do in this case than for other things I typically document).

Contributor Author

Just re-checked and these README.md updates are done.

  • Addition of new section: Under the hood: Design decisions, etc.
    • Has only one sub-section: Gene-Disease Pipeline
      • Includes Trish's request: "can you add a section that includes what a 'valid disease-gene (D2G) relationship' is"
      • Additional content supporting the disease-defining explanation, plus non-disease-defining content (copied from code notes and reformatted)
        • Example input/output
        • Inverse predicates for disease-defining associations (RO:0004003 and RO:0004013)
        • Other non-disease-defining gene-disease association predicates
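To illustrate the inverse-predicate pair mentioned above, here is a minimal Python sketch that, for one disease-defining phenotype/gene pair, builds the disease-to-gene triple with RO:0004003 ('has material basis in germline mutation in') and the gene-to-disease triple with its inverse, RO:0004013 ('is causal germline mutation in'). The function name, the tuple representation of triples, and the `OMIM:` CURIE prefix are assumptions for illustration only; this is not necessarily how omim2obo serializes these relationships.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Disease-defining predicate named in the review, and its inverse
HAS_MATERIAL_BASIS_IN_GERMLINE_MUTATION_IN = "RO:0004003"
IS_CAUSAL_GERMLINE_MUTATION_IN = "RO:0004013"


def disease_defining_triples(phenotype_mim: str, gene_mim: str) -> List[Triple]:
    """Hypothetical helper: forward and inverse triples for one D2G pair."""
    disease = f"OMIM:{phenotype_mim}"  # assumed CURIE prefix
    gene = f"OMIM:{gene_mim}"
    return [
        (disease, HAS_MATERIAL_BASIS_IN_GERMLINE_MUTATION_IN, gene),
        (gene, IS_CAUSAL_GERMLINE_MUTATION_IN, disease),
    ]
```

Keeping both directions in one helper makes it hard to emit one triple without its inverse, which is the property the README section is meant to document.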

README.md (outdated)

#### 2. D2G: Disease-defining; self-referential
The unique characteristics of cases of this class are as follows:
- Each case has 2 rows in `morbidmap.txt` and are related.

Contributor

say how the lines are related

@joeflack4 (Contributor Author) commented Nov 21, 2024

Instead I changed "and are related" to "and are part of a pattern".

I think that is more accurate. The next bullets, describing exactly the pattern of "row 1" and "row 2", along with the link to the spreadsheet with examples, should be sufficiently illustrative I think.

- Row 1: One row is a typical, valid, disease-defining entry. For the given phenotype MIM in that row, there are no

Contributor

include an example of where the phenotype MIM is in these kinds of rows so someone can understand the case without having to read all the code

Contributor Author

There is already the link to the Google Sheet. I think that's probably enough.

However, the formatting of the spreadsheet is not 100% the same as morbidmap.txt, and it can be hard to interrogate that file sometimes.

So I added this table as an example:

| Phenotype | Gene/Locus And Other Related Symbols | MIM Number | Cyto Location |
|---|---|---|---|
| Small cell cancer of the lung, somatic, 182280 (3) | RB1 | 614041 | 13q14.2 |
| Small-cell cancer of lung (2) | SCLC1 | 182280 | 3p23-p21 |

I think this will be sufficient to show how / where the Phenotype appears.
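To check the example table programmatically, the sketch below pulls the phenotype MIM out of a Phenotype field. When present, it is the six-digit number just before the parenthesized mapping key, so the first row yields 182280 and the second yields none. The regex, the `Phenotype` tuple, and `parse_phenotype_field` are hypothetical helpers written for this example under that assumed field layout, not code taken from omim2obo/main.py.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "<label>[, <6-digit phenotype MIM>] (<mapping key>)"
PHENOTYPE_FIELD_RE = re.compile(
    r"^(?P<label>.*?)(?:,\s*(?P<mim>\d{6}))?\s*\((?P<key>\d)\)\s*$"
)


class Phenotype(NamedTuple):
    label: str
    mim: Optional[str]  # None when the Phenotype field carries no MIM
    mapping_key: int


def parse_phenotype_field(field: str) -> Phenotype:
    """Split a morbidmap.txt Phenotype field into label, phenotype MIM, and mapping key."""
    match = PHENOTYPE_FIELD_RE.match(field.strip())
    if match is None:
        raise ValueError(f"Unrecognized Phenotype field: {field!r}")
    return Phenotype(match["label"], match["mim"], int(match["key"]))


if __name__ == "__main__":
    # The two Phenotype values from the example table
    for field in (
        "Small cell cancer of the lung, somatic, 182280 (3)",
        "Small-cell cancer of lung (2)",
    ):
        print(parse_phenotype_field(field))
    # Phenotype(label='Small cell cancer of the lung, somatic', mim='182280', mapping_key=3)
    # Phenotype(label='Small-cell cancer of lung', mim=None, mapping_key=2)
```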

- other rows in `morbidmap.txt` where it appears as a phenotype having an association with another gene.
- In all such cases seen thus far as of 2024/11/18, all of these are cancer cases, and the label ends with "somatic".
- This entry appears in the Phenotype-Gene Relationships table on the MIM's omim.org/entry page.
- Row 2: There is a second row where the phenotype in the first row appears as a gene.

Contributor

Include an example data line so it's clear what is meant by 'the phenotype in the first row appears as a gene'. I think you mean that the phenotype MIM is in the position where a gene MIM should be found.

Contributor Author

The same example I just added (see the comment above) should suffice.
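The row 1 / row 2 pattern discussed in this thread can be sketched as a check over parsed morbidmap.txt rows. Row 2 has no MIM in its Phenotype field, but its MIM Number column (the gene position) holds a MIM that appears as a phenotype MIM in exactly one other row, which is row 1. This is a minimal sketch under stated assumptions: the function names and the 4-tuple row shape are hypothetical, and the omim.org Phenotype-Gene Relationships table criteria cannot be checked from morbidmap.txt alone, so they are not modeled.

```python
import csv
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, ...]  # Phenotype, Gene/Locus symbols, MIM Number, Cyto Location

# Assumed: the phenotype MIM, when present, is the 6-digit number before "(<key>)"
PHENOTYPE_MIM_RE = re.compile(r",\s*(\d{6})\s*\(\d\)\s*$")


def phenotype_mim(phenotype_field: str) -> Optional[str]:
    match = PHENOTYPE_MIM_RE.search(phenotype_field)
    return match.group(1) if match else None


def read_morbidmap(text: str) -> List[Row]:
    """Hypothetical parser: tab-separated rows, skipping '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    rows = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [tuple(r) for r in rows if len(r) >= 4]


def find_self_referential(rows: List[Row]) -> List[Tuple[Row, Row]]:
    """Return (row 1, row 2) pairs that match the self-referential pattern."""
    # Index rows by the phenotype MIM found in their Phenotype field
    rows_by_phenotype_mim: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        mim = phenotype_mim(row[0])
        if mim is not None:
            rows_by_phenotype_mim[mim].append(row)

    pairs = []
    for row2 in rows:
        # Row 2: no MIM in the Phenotype field, and the gene-position MIM
        # is used by some row as its phenotype MIM
        if phenotype_mim(row2[0]) is None and row2[2] in rows_by_phenotype_mim:
            row1_candidates = rows_by_phenotype_mim[row2[2]]
            # Row 1: that phenotype MIM appears as a phenotype in only one row
            if len(row1_candidates) == 1:
                pairs.append((row1_candidates[0], row2))
    return pairs


# The two example rows from this thread, as tab-separated text
SAMPLE = (
    "# Phenotype\tGene/Locus And Other Related Symbols\tMIM Number\tCyto Location\n"
    "Small cell cancer of the lung, somatic, 182280 (3)\tRB1\t614041\t13q14.2\n"
    "Small-cell cancer of lung (2)\tSCLC1\t182280\t3p23-p21\n"
)

if __name__ == "__main__":
    for row1, row2 in find_self_referential(read_morbidmap(SAMPLE)):
        print(row1[0], "<->", row2[0])
```

The "somatic" label suffix mentioned in the README is an observation about the cases seen so far, so the sketch deliberately does not filter on it.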

README.md (resolved)
omim2obo/main.py (resolved)
@twhetzel twhetzel self-requested a review November 19, 2024 20:11
- Update: README: general documentation for d2g pipeline
- Updates to README.md docs: (i) changed verbiage 'related' to 'pattern', (ii) added an example from morbidmap.txt.
- Updates: Code comments: Added verbiage 'disease-defining'