Sometimes we see a GFF3 that does not explicitly state where the exons are, e.g.
ctgA example gene 1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase
ctgA example mRNA 1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA example mRNA 1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Note=Eden splice form 2;Index=1
ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.1
ctgA example five_prime_UTR 1050 1200 . + . Parent=EDEN.2
ctgA example CDS 1201 1500 . + 0 Parent=EDEN.1
ctgA example CDS 1201 1500 . + 0 Parent=EDEN.2
ctgA example mRNA 1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Note=Eden splice form 3;Index=1
ctgA example five_prime_UTR 1300 1500 . + . Parent=EDEN.3
ctgA example CDS 3000 3902 . + 0 Parent=EDEN.1
ctgA example five_prime_UTR 3000 3300 . + . Parent=EDEN.3
ctgA example CDS 3301 3902 . + 0 Parent=EDEN.3
ctgA example CDS 5000 5500 . + 0 Parent=EDEN.1
ctgA example CDS 5000 5500 . + 0 Parent=EDEN.2
ctgA example CDS 5000 5500 . + 1 Parent=EDEN.3
ctgA example CDS 7000 7600 . + 1 Parent=EDEN.3
ctgA example CDS 7000 7608 . + 0 Parent=EDEN.1
ctgA example CDS 7000 7608 . + 0 Parent=EDEN.2
ctgA example three_prime_UTR 7601 9000 . + . Parent=EDEN.3
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.1
ctgA example three_prime_UTR 7609 9000 . + . Parent=EDEN.2
In this case we need to synthesize the exons for our internal representations.
We can use the five_prime_UTR, three_prime_UTR, and CDS lines to figure out where the exons are. If a UTR and a CDS are adjacent, they should be combined into a single exon. Otherwise, each unique CDS location should get an exon with the same location.
This needs to be handled in packages/apollo-shared/src/GFF3/gff3ToAnnotationFeature.ts. We'll probably want to check, after processedCDS are determined in that file, whether there are any exons, and synthesize them at that point if there are none.
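The merging rule described above can be sketched roughly as follows. This is only an illustration, not the code in gff3ToAnnotationFeature.ts: the `Interval` type and the `synthesizeExons` function are hypothetical names, the input is assumed to be the UTR and CDS locations of a single transcript in GFF3 coordinates (1-based, inclusive), and a UTR that is not adjacent to any CDS is assumed to get its own exon as well.

```typescript
// Hypothetical shape for illustration; not the types used in apollo-shared.
interface Interval {
  start: number // 1-based, inclusive (GFF3 convention)
  end: number // 1-based, inclusive
}

// Merge the UTR and CDS intervals of ONE transcript into exon intervals.
// Intervals that touch (next.start === prev.end + 1) or overlap are combined;
// everything else becomes its own exon with the same location.
function synthesizeExons(parts: Interval[]): Interval[] {
  const sorted = [...parts].sort((a, b) => a.start - b.start || a.end - b.end)
  const exons: Interval[] = []
  for (const part of sorted) {
    const last = exons[exons.length - 1]
    if (last && part.start <= last.end + 1) {
      // Adjacent or overlapping: extend the current exon.
      last.end = Math.max(last.end, part.end)
    } else {
      // Gap before this part: start a new exon.
      exons.push({ start: part.start, end: part.end })
    }
  }
  return exons
}

// UTR and CDS locations of EDEN.1 from the example GFF3.
const eden1: Interval[] = [
  { start: 1050, end: 1200 }, // five_prime_UTR
  { start: 1201, end: 1500 }, // CDS
  { start: 3000, end: 3902 }, // CDS
  { start: 5000, end: 5500 }, // CDS
  { start: 7000, end: 7608 }, // CDS
  { start: 7609, end: 9000 }, // three_prime_UTR
]

// UTR and CDS locations of EDEN.3 from the example GFF3.
const eden3: Interval[] = [
  { start: 1300, end: 1500 }, // five_prime_UTR (not adjacent to a CDS)
  { start: 3000, end: 3300 }, // five_prime_UTR
  { start: 3301, end: 3902 }, // CDS
  { start: 5000, end: 5500 }, // CDS
  { start: 7000, end: 7600 }, // CDS
  { start: 7601, end: 9000 }, // three_prime_UTR
]

console.log(synthesizeExons(eden1))
console.log(synthesizeExons(eden3))
```

For EDEN.1 this yields exons at 1050-1500, 3000-3902, 5000-5500, and 7000-9000. For EDEN.3 it yields 1300-1500, 3000-3902, 5000-5500, and 7000-9000, where the first exon consists of UTR only.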
The code may be inefficient, but the slowdown doesn't seem noticeable. Below are two replicate runs, before and after the change, loading a small GFF file with 283 mRNAs:
# Before (just before this branch started):
git checkout 143354232ff73b042aa0f55996b6b94068eeb748
time yarn dev feature import --profile testAdmin ~/Downloads/TGGT1_chrII.gff -d -a ToxoDB-67_TgondiiGT1_Genome.fasta.gz
progress [========================================] 100% | ETA: 0s | 605768/605768
real 0m20.265s
user 0m3.263s
sys 0m0.403s
time yarn dev feature import --profile testAdmin ~/Downloads/TGGT1_chrII.gff -d -a ToxoDB-67_TgondiiGT1_Genome.fasta.gz
progress [========================================] 100% | ETA: 0s | 605768/605768
real 0m18.901s
user 0m2.900s
sys 0m0.371s
Current:
git switch -
Previous HEAD position was 14335423 Make feature type ontology configurable (#472)
Switched to branch 'import_gff3_wo_exons_issue491'
Your branch is ahead of 'origin/import_gff3_wo_exons_issue491' by 2 commits.
(use "git push" to publish your local commits)
time yarn dev feature import --profile testAdmin ~/Downloads/TGGT1_chrII.gff -d -a ToxoDB-67_TgondiiGT1_Genome.fasta.gz
progress [========================================] 100% | ETA: 0s | 605768/605768
real 0m19.276s
user 0m2.675s
sys 0m0.345s
time yarn dev feature import --profile testAdmin ~/Downloads/TGGT1_chrII.gff -d -a ToxoDB-67_TgondiiGT1_Genome.fasta.gz
progress [========================================] 100% | ETA: 0s | 605768/605768
real 0m19.263s
user 0m2.885s
sys 0m0.347s