Fusion-making .py script does not take into account gene orientation #26
Can you please check and fix this issue? |
Hi, just to add to this, I encountered one more bug in the fusion list making script: it picks up the wrong gene if the gene name is contained in some other gene name (like RET and RETREG1) and the relevant transcript is not supplied. Example:
and if I supply RET to current script, I'll get this:
I suspect this may be an issue for transcripts as well. I came up with a quick fix, but someone more fluent in regular expressions could simply check whether gene[0] or gene[1] followed by a tab character is in the line:
|
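For illustration, the tab-delimited check suggested above could look roughly like the following. This is only a sketch, not the code that went into the repository: the helper name `mentions_gene` is hypothetical, the sample lines are made up, and it assumes the gene name occupies one whole tab-separated field of the annotation line.

```python
def mentions_gene(line, gene):
    """Return True if `gene` equals one whole tab-separated field of `line`.

    Comparing whole fields avoids the substring trap where "RET"
    also matches a line that describes "RETREG1".
    """
    return gene in line.rstrip("\n").split("\t")


# Made-up annotation lines, for illustration only.
ret_line = "RET\tTX_A\tchr10\t+\n"
retreg1_line = "RETREG1\tTX_B\tchr5\t-\n"

print("RET" in retreg1_line)               # substring test: wrongly matches
print(mentions_gene(retreg1_line, "RET"))  # whole-field test: no match
print(mentions_gene(ret_line, "RET"))      # whole-field test: match
```

Splitting on tabs and comparing fields avoids regular expressions entirely and works the same way for transcript IDs.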
Thank you very much for reporting bugs and providing solutions! I have fixed the two bugs you mentioned in the way you suggested. |
Just found out that I did not fully fix the strand issue: I should have added strand to the identification of the longest transcript as well:
Now should be fully correct! Happy to help! |
You should submit another PR? |
@nvolkovaGEL You are perfectly right, I should have tested more before committing. |
Merged. |
Sorry, should have just made a pull request, but couldn't get my locked local machine to synchronize properly with github. Thanks, please feel free to close this ticket! |
Thanks for creating this tool! I have noticed that scripts/make_fusion_genes.py reports the wrong order of exons for genes in '-' orientation, which results in some fusions being considered untranscribed and in the wrong fusion structure being shown. E.g. NTRK3 in cancer.hg38.csv:
Versus NTRK3 from the fusions.csv file generated by the script:
Here is a quick fix I introduced to make_fusion_genes.py:
Line 33:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
Line 44:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
And insert these after line 50:
if strand == '-':
    exons = exons[::-1]
This returns the exons in the right order.
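Put together, the change amounts to something like the sketch below. It assumes an 11-column, tab-separated annotation line with comma-separated exon start and end lists, as implied by the split above. The function name `parse_exons`, the way the exon list is built, and the sample line are illustrative and not taken from the script.

```python
def parse_exons(line):
    """Parse one annotation line and return exons in transcription order."""
    (_, transcript, chrom, strand, start, end,
     _, _, _, exonstart, exonend) = line.rstrip("\n").split("\t")

    # Exon coordinates are assumed to be comma-separated, possibly with
    # a trailing comma, and listed in ascending genomic order.
    starts = [int(x) for x in exonstart.split(",") if x]
    ends = [int(x) for x in exonend.split(",") if x]
    exons = list(zip(starts, ends))

    # On the '-' strand the first transcribed exon is the one with the
    # highest genomic coordinate, so the list has to be reversed.
    if strand == '-':
        exons = exons[::-1]
    return transcript, chrom, strand, exons


# Made-up line, for illustration only.
sample = "0\tTX1\tchr15\t-\t100\t350\t100\t350\t3\t100,200,300,\t150,250,350,\n"
print(parse_exons(sample)[3])
```

For a '+' strand transcript the list is left in ascending genomic order.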