Add new PHAF columns #1066

Closed
kimrutherford opened this issue Feb 28, 2023 · 12 comments
@kimrutherford
Member

kimrutherford commented Feb 28, 2023

  • Ploidy - needs to be mandatory (add to all files)
  • Allele variant (add URL for genotype format variant syntax)
  • Use Condition column
    • Temperature (degrees C)
    • Chemical or agent (concentration units)
  • Chemical or agent dose
  • Phenotype score / Phenotype score units
    • use Severity column
      • currently: FYPO_EXT:0000003 etc.
      • add:
        • fitness_log2(0.123)
        • normalised_difference(-0.4234)
        • store as properties in Chado (feature_cvtermprop)
  • FYECO -> Glucose present (CHEBI)
  • CHEBI,concentration
  • FYECO (UV) -> FYECO:X(mJ)
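
A minimal sketch of how a loader might tell the existing Severity term IDs apart from the proposed numeric scores. The regular expression, the `Severity` tuple and `parse_severity` are hypothetical names used for illustration and are not the PomBase loader's code; the accepted number format is also an assumption.

```python
import re
from typing import NamedTuple, Optional

# Numeric scores such as "fitness_log2(0.123)" or
# "normalised_difference(-0.4234)". The two score names come from the list
# above; the exact grammar is an assumption.
SCORE_RE = re.compile(
    r"^(fitness_log2|normalised_difference)\(\s*(-?\d+(?:\.\d+)?)\s*\)$"
)
# Existing style of Severity value, e.g. "FYPO_EXT:0000003".
TERM_RE = re.compile(r"^FYPO_EXT:\d{7}$")


class Severity(NamedTuple):
    kind: str                # "term" or the score name
    term_id: Optional[str]   # set only when kind == "term"
    score: Optional[float]   # set only for numeric scores


def parse_severity(raw: str) -> Severity:
    """Classify one Severity cell as a term ID or a numeric score."""
    value = raw.strip()
    match = SCORE_RE.match(value)
    if match:
        return Severity(match.group(1), None, float(match.group(2)))
    if TERM_RE.match(value):
        return Severity("term", value, None)
    raise ValueError(f"unrecognised Severity value: {raw!r}")
```

Rejecting anything that matches neither pattern keeps malformed scores from being loaded silently as free text.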
kimrutherford self-assigned this Feb 28, 2023
@manulera

Related to pombase/curation#3465

@manulera

@kimrutherford I am almost done with the file, would just have to figure out those exceptional systematic ids and what the alleles should be called. Going swimming now but will check tomorrow 🏊

https://github.com/manulera/phenomics_paper_HTP/blob/master/results/pombase_dataset.tsv

@kimrutherford
Member Author

Thanks Manu. I've changed the loading code to handle the changes to the severity and condition columns.

kimrutherford added a commit to pombase/pombase-legacy that referenced this issue Mar 7, 2023
kimrutherford added a commit to pombase/pombase-legacy that referenced this issue Mar 7, 2023
We'll store it as a feature_cvtermprop instead.

Refs pombase/pombase-chado#1066
kimrutherford added a commit that referenced this issue Mar 7, 2023
We store in the feature_cvtermprop table.

Refs #1066
@kimrutherford
Member Author

I've changed the PHAF loader to store the allele_variant and the phenotype score in Chado. I'm still trying to think of a good way to store the new condition details.
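
As a rough illustration of storing a phenotype score as an annotation property, the sketch below uses an in-memory SQLite table that mimics the columns of Chado's `feature_cvtermprop` table (`feature_cvterm_id`, `type_id`, `value`, `rank`). Everything else is an assumption: the real database is PostgreSQL with foreign keys to `feature_cvterm` and `cvterm`, the IDs `42` and `1001` are placeholders, `add_annotation_prop` is a hypothetical helper, and storing the bare number as the value is only one possible choice.

```python
import sqlite3

# Simplified stand-in for Chado's feature_cvtermprop (no foreign keys).
SCHEMA = """
CREATE TABLE feature_cvtermprop (
    feature_cvtermprop_id INTEGER PRIMARY KEY,
    feature_cvterm_id     INTEGER NOT NULL,
    type_id               INTEGER NOT NULL,
    value                 TEXT,
    rank                  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_cvterm_id, type_id, rank)
)
"""


def add_annotation_prop(conn, feature_cvterm_id, type_id, value):
    """Insert a property with the next free rank and return that rank."""
    (rank,) = conn.execute(
        "SELECT COALESCE(MAX(rank), -1) + 1 FROM feature_cvtermprop "
        "WHERE feature_cvterm_id = ? AND type_id = ?",
        (feature_cvterm_id, type_id),
    ).fetchone()
    conn.execute(
        "INSERT INTO feature_cvtermprop "
        "(feature_cvterm_id, type_id, value, rank) VALUES (?, ?, ?, ?)",
        (feature_cvterm_id, type_id, value, rank),
    )
    return rank


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

FITNESS_LOG2_TYPE_ID = 1001  # placeholder for a cvterm naming the score type
ANNOTATION_ID = 42           # placeholder feature_cvterm_id

first_rank = add_annotation_prop(conn, ANNOTATION_ID, FITNESS_LOG2_TYPE_ID, "0.11")
```

Computing the rank from the existing rows respects the uniqueness constraint on annotation, property type and rank if the same property type is ever attached twice.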

We need to think about how and where to display the new information on the website.

@kimrutherford
Member Author

> Ploidy - needs to be mandatory

I've changed my mind on that. It can continue to be optional in the cases where there is no allele variant.

@manulera

manulera commented Mar 9, 2023

> I've changed my mind on that.

I guess the assumption is haploid unless otherwise stated, but what does this have to do with whether the allele variant is present or absent?

@kimrutherford
Member Author

> I guess the assumption is haploid unless otherwise stated, but what does this have to do with whether the allele variant is present or absent?

I was thinking that we would need to put something in the "Ploidy" column only if we have an allele variant column. If there is no allele variant column, we won't need a "Ploidy" column.
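
The rule described here could be checked along the following lines. This is a sketch under assumptions: `ploidy_problems` is a hypothetical helper, rows are assumed to be dictionaries keyed by column name, and requiring a non-empty "Ploidy" value on every row of such a file is one interpretation of the comment.

```python
from typing import Dict, List


def ploidy_problems(header: List[str], rows: List[Dict[str, str]]) -> List[str]:
    """Report Ploidy problems for a PHAF file.

    Ploidy is only required when the file has an "Allele variant" column.
    """
    if "Allele variant" not in header:
        return []  # Ploidy stays optional for files without allele variants
    if "Ploidy" not in header:
        return ["file has an 'Allele variant' column but no 'Ploidy' column"]
    return [
        f"row {number}: empty Ploidy"
        for number, row in enumerate(rows, start=1)
        if not row.get("Ploidy", "").strip()
    ]
```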

@manulera

manulera commented Mar 9, 2023

I am looking at the PHAF headers and I think I see what you mean, but I don't think this problem is restricted to the allele variant column. The headers are clearly meant to describe either haploids or homozygous diploids, but it is not clear how you would describe a heterozygous diploid. Separating allele names, expression levels and all with a spacer? Two lines for the same phenotype, but then what would be the unique identifier of the pair?

You can call me on skype if you want to quickly discuss this.

@kimrutherford
Member Author

> but it is not clear how you would describe a heterozygous diploid.

There isn't a way to store these in the PHAF file. The "Ploidy" column is a hack to allow loading the commonest diploids ("homozygous diploid"). That format doesn't support anything more complicated.

We have plans for a new PHAF format to handle this problem (see #496) but we haven't implemented it yet. Stable allele identifiers will make this problem easier.

@kimrutherford
Member Author

Hi @manulera

Sorry to come back to this after 2 weeks.

In the new PHAF file (pombe-embl/external_data/phaf_files/chado_load/htp_phafs/PMID_34984977_phaf.tsv), some of the FYECO conditions have empty brackets, like: FYECO:0000211(). Does that need fixing?

Also should we include the unit ("C") after temperatures in conditions? Or is the unit obvious in that case?

Here is an example line that illustrates both questions:

  • Gene systematic ID: SPNCRNA.1130
  • FYPO ID: FYPO:0000725
  • Allele description: deletion
  • Expression: null
  • Parental strain: 972 h-
  • Background strain name: (empty)
  • Background genotype description: (empty)
  • Gene name: (empty)
  • Allele name: (empty)
  • Allele synonym: (empty)
  • Allele type: deletion
  • Evidence: ECO:0001563
  • Condition: FYECO:0000138,FYECO:0000211(),FYECO:0000334,FYECO:0000005(32)
  • Penetrance: (empty)
  • Severity: fitness_log2(0.11)
  • Extension: (empty)
  • Reference: PMID:34984977
  • taxon: 4896
  • Date: 2023-02-23
  • Ploidy: haploid
  • Allele variant: III:g.319210_319876del

Thanks!
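
A small sketch of how the empty-bracket cases in the Condition column could be found automatically. `parse_conditions` and `empty_bracket_terms` are hypothetical helpers; the sketch assumes conditions are comma-separated, that the bracketed detail never contains a comma, and that FYECO IDs have seven digits.

```python
import re
from typing import List, Optional, Tuple

# "FYECO:0000005(32)" -> ("FYECO:0000005", "32")
# "FYECO:0000138"     -> ("FYECO:0000138", None)
# "FYECO:0000211()"   -> ("FYECO:0000211", "")
CONDITION_RE = re.compile(r"^(FYECO:\d{7})(?:\((.*)\))?$")


def parse_conditions(cell: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Condition cell into (term ID, bracketed detail) pairs."""
    pairs = []
    for part in cell.split(","):
        match = CONDITION_RE.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised condition: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def empty_bracket_terms(cell: str) -> List[str]:
    """Return the term IDs that are followed by empty brackets."""
    return [term for term, detail in parse_conditions(cell) if detail == ""]
```

Keeping "no brackets" (`None`) distinct from "empty brackets" (`""`) means only the second case is reported as needing a fix.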

@manulera

Hi @kimrutherford, fixed here and updated on revision 8824 of svn

manulera/phenomics_paper_HTP@ae5baa3

kimrutherford added a commit that referenced this issue Mar 23, 2023
The cases like: FYECO:0000410(23.2ug/ml) and FYECO:0000005(32 C)

Refs #1066
@kimrutherford
Member Author

kimrutherford commented Mar 23, 2023

> Hi @kimrutherford, fixed here and updated on revision 8824 of svn

Thanks! I've finished the changes needed to load the conditions with temperatures and quantities into Chado. The new PHAF file loads perfectly now.
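
For illustration, the bracketed details mentioned in the commit message, such as `23.2ug/ml` and `32 C`, could be split into a number and an optional unit as sketched below. `split_quantity` and its regular expression are hypothetical and are not the loader's actual code; the sketch assumes the number always comes first and that the unit never starts with a digit or a dot.

```python
import re
from typing import Optional, Tuple

# A leading number, optional whitespace, then an optional unit.
QUANTITY_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([^\d\s.].*?)?\s*$")


def split_quantity(detail: str) -> Tuple[float, Optional[str]]:
    """Split a condition detail such as "32 C" into (value, unit)."""
    match = QUANTITY_RE.match(detail)
    if match is None:
        raise ValueError(f"unrecognised quantity: {detail!r}")
    # group(2) is None when no unit was given, e.g. "32"
    return float(match.group(1)), match.group(2)
```

Returning `None` for a missing unit lets the caller decide whether a default, such as degrees C for temperatures, should apply.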

I think this is done now.

We now need to think about how to show this new information:
