Disease-Gene: Self-referential cases #166
- Analysis: Self-referential “phenotype in the gene position + Phenotype field without a MIM” + "morbidmap.txt entry not in Phenotype-Gene Relationships table" case
1ed5723 to 5cdd485
- Update: No longer just analysis. Now filtering these out.
5cdd485 to 5163de7
For the code, see comment inline about hard-coding OMIM identifiers
67441ca to 0e089c6
- Update: Revert filtering out. Now log as a review.tsv case.
0e089c6 to b251b3e
Overall looks good! Some comments/descriptions will be clearer to our future selves if they are expanded a bit and include some examples. I acknowledge these comments are a bit nitpicky, but if the docs are being updated, they should be clear without a lot of prior knowledge of the deep review and understanding of the data files that went into identifying these cases.
This review case involves what would be otherwise considered a valid disease-gene relationship, but for the fact that
it quite unusually includes 'digenic' in the label, even though it only had 1 association. OMIM doesn't have a
#### 1. D2G Disease-defining but marked digenic
This review case involves what would be otherwise considered a valid disease-gene (D2G) relationship, but for the fact
@joeflack4 can you add a section that explains what a "valid disease-gene (D2G) relationship" is, i.e., the rules, and that 'valid' means it is a gene association that is represented with RO:0004003 'has material basis in germline mutation in'?
There's a lot of commentary scattered throughout the codebase, but not really anything in the README / front page that explains this stuff. So it will entail a new section; I'm not sure what to call it. Maybe an "additional information" or "topics" or "explanation" section with subsection "Disease-Gene associations".
A new section in the README is fine
@twhetzel I updated the README. I'm not sure if I'm done yet; check it out.
I'd prefer to review when you have added everything. Is there some specific feedback you are looking for about the README? The additions I requested should only take ~20 min to add.
I'll finish it up and then tag you again. It is taking a little more time than that. You've been wanting a lot of related documentation for a while, so I compiled it all in a way I felt you and the team would like, phrased as clearly as I can make it (harder to do in this case than for other things I typically document).
Just re-checked and these `README.md` updates are done.
- Addition of new section: Under the hood: Design decisions, etc.
  - Has only one sub-section: Gene-Disease Pipeline
    - Includes Trish's request: can you add a section that includes what a "valid disease-gene (D2G) relationship"
    - Additional content to support that disease-defining explanation, and non-disease-defining stuff (copy/pasted from code notes and reformatted)
    - Example input/output
    - Inverse predicate for disease-defining (`RO:0004003` and `RO:0004013`)
    - Other non-disease-defining gene-disease association predicates.
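To make the predicate pair above concrete, here is a minimal sketch of how a disease-defining association could be emitted in both directions. It assumes only what this thread states: that `RO:0004003` ('has material basis in germline mutation in') represents a valid disease-defining association and that `RO:0004013` is its inverse. The function name `d2g_triples` and the `EX:` identifiers are hypothetical and not taken from the codebase.

```python
# Predicate identifiers as named in this PR thread.
DISEASE_DEFINING = "RO:0004003"          # 'has material basis in germline mutation in'
DISEASE_DEFINING_INVERSE = "RO:0004013"  # inverse of RO:0004003, per the list above


def d2g_triples(disease_curie, gene_curie):
    """Return (subject, predicate, object) triples for one disease-defining pair.

    Hypothetical helper: the real pipeline may build these differently.
    """
    return [
        (disease_curie, DISEASE_DEFINING, gene_curie),
        (gene_curie, DISEASE_DEFINING_INVERSE, disease_curie),
    ]


# Placeholder identifiers, for illustration only.
triples = d2g_triples("EX:disease", "EX:gene")
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Which associations count as disease-defining is decided elsewhere in the pipeline; this sketch only shows the shape of the output once that decision has been made.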
README.md
#### 2. D2G: Disease-defining; self-referential
The unique characteristics of cases of this class are as follows:
- Each case has 2 rows in `morbidmap.txt` and are related.
say how the lines are related
Instead I changed "and are related" to "and are part of a pattern".
I think that is more accurate. The next bullets, describing exactly the pattern of "row 1" and "row 2", along with the link to the spreadsheet with examples, should be sufficiently illustrative I think.
#### 2. D2G: Disease-defining; self-referential
The unique characteristics of cases of this class are as follows:
- Each case has 2 rows in `morbidmap.txt` and are related.
- Row 1: One row is a typical, valid, disease-defining entry. For the given phenotype MIM in that row, there are no
include an example of where the phenotype MIM is in these kinds of rows so someone can understand the case without having to read all the code
There is already the link to the google sheet. I think that's probably enough.
However, the formatting of the spreadsheet is not 100% the same as `morbidmap.txt`, and it can be hard to interrogate that file sometimes.
So I added this table as an example:
| Phenotype | Gene/Locus And Other Related Symbols | MIM Number | Cyto Location |
|---|---|---|---|
| Small cell cancer of the lung, somatic, 182280 (3) | RB1 | 614041 | 13q14.2 |
| Small-cell cancer of lung (2) | SCLC1 | 182280 | 3p23-p21 |
I think this will be sufficient to show how / where the Phenotype appears.
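For readers who prefer code to a table, the pattern in the two example rows can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's actual implementation: rows are assumed to be already parsed into dicts keyed by the column headers above, and the names `parse_phenotype` and `find_self_referential` are hypothetical.

```python
import re
from collections import defaultdict

# Phenotype field: "<label>[, <6-digit phenotype MIM>] (<mapping key 1-4>)"
PHENOTYPE_RE = re.compile(
    r"^(?P<label>.*?)(?:,\s*(?P<mim>\d{6}))?\s*\((?P<key>[1-4])\)\s*$"
)


def parse_phenotype(field):
    """Split a Phenotype field into (label, phenotype MIM or None, mapping key or None)."""
    match = PHENOTYPE_RE.match(field)
    if match is None:
        return field.strip(), None, None
    return match.group("label").strip(), match.group("mim"), match.group("key")


def find_self_referential(rows):
    """Pair rows where row 1's phenotype MIM sits in row 2's gene position
    ("MIM Number") and row 2's Phenotype field carries no MIM of its own."""
    by_gene_mim = defaultdict(list)
    for row in rows:
        by_gene_mim[row["MIM Number"]].append(row)

    cases = []
    for row1 in rows:
        _, phenotype_mim, _ = parse_phenotype(row1["Phenotype"])
        if phenotype_mim is None:
            continue
        for row2 in by_gene_mim.get(phenotype_mim, []):
            _, row2_mim, _ = parse_phenotype(row2["Phenotype"])
            if row2_mim is None:
                cases.append((row1, row2))
    return cases


# The two example rows from the table above.
rows = [
    {"Phenotype": "Small cell cancer of the lung, somatic, 182280 (3)",
     "Gene/Locus And Other Related Symbols": "RB1",
     "MIM Number": "614041", "Cyto Location": "13q14.2"},
    {"Phenotype": "Small-cell cancer of lung (2)",
     "Gene/Locus And Other Related Symbols": "SCLC1",
     "MIM Number": "182280", "Cyto Location": "3p23-p21"},
]
cases = find_self_referential(rows)
```

Here `cases` holds a single pair: the RB1 row as "row 1" and the SCLC1 row as "row 2", because 182280 is the phenotype MIM in the first row and the MIM in the gene position in the second. The sketch does not check the other stated condition (no other rows associating the phenotype MIM with another gene).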
- other rows in `morbidmap.txt` where it appears as a phenotype having an association with another gene.
- In all such cases seen thus far as of 2024/11/18, all of these are cancer cases, and the label ends with "somatic".
- This entry appears in the Phenotype-Gene Relationships table on the MIM's omim.org/entry page.
- Row 2: There is a second row where the phenotype in the first row appears as a gene.
Include an example data line so it's clear what is meant by 'phenotype in the first row appears as a gene', which I think means the phenotype MIM is in the position where a gene MIM should be found.
The same example I just added (see comment above) should suffice.
- Update: README: general documentation for d2g pipeline
- Updates to README.md docs: (i) changed verbiage 'related' to 'pattern', (ii) added an example from morbidmap.txt.
- Updates: Code comments: Added verbiage 'disease-defining'
44b9407 to d4cb863
Disease-Gene: Self-referential somatic cancers
Self-referential “phenotype in the gene position + Phenotype field without a MIM” + "morbidmap.txt entry not in Phenotype-Gene Relationships table" case - add to `review.tsv`
All current cases: Google Sheet
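As a rough illustration of what "add to `review.tsv`" could look like, the sketch below writes flagged cases as tab-separated rows. The column names in `REVIEW_COLUMNS` and the function `write_review_cases` are assumptions made for this example; the actual columns of `review.tsv` are not shown in this thread.

```python
import csv
import io

# Hypothetical column layout; the real review.tsv may differ.
REVIEW_COLUMNS = ["phenotype_mim", "gene_mim", "phenotype_label", "review_reason"]


def write_review_cases(cases, out):
    """Write review cases (dicts keyed by REVIEW_COLUMNS) to a TSV stream."""
    writer = csv.DictWriter(
        out, fieldnames=REVIEW_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for case in cases:
        writer.writerow(case)


# One case built from the RB1 example row in this PR.
example_cases = [{
    "phenotype_mim": "182280",
    "gene_mim": "614041",
    "phenotype_label": "Small cell cancer of the lung, somatic",
    "review_reason": "D2G: Disease-defining; self-referential",
}]

buffer = io.StringIO()  # stand-in for an open file handle
write_review_cases(example_cases, buffer)
review_lines = buffer.getvalue().splitlines()
```

Logging to a review file rather than filtering keeps the rows visible for curation, which matches the direction this PR settled on after reverting the filter.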