'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1
Comments
Hey Haley,
Could you try to trim down the headers in the protein fasta file so they
don’t have all that extra prodigal info? Then recompute the lengths file.
Then proceed to the orthology step.
So e.g. the first protein’s ID should be only k127_510701||0_partial_1 and
so on. fasta36 already trims that part, if you look at the tabular
output, but since it’s still there in your lengths file, joincol can’t match up
the columns.
Thanks for pointing this out. I’ll try to fix it, but in the meantime the
above hack should do the trick.
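A minimal sketch of the workaround described above, assuming the protein file is a plain-text FASTA produced by prodigal and that the lengths file is simply "ID, tab, sequence length". The helper names `trim_fasta_headers` and `fasta_lengths` and the file names in the usage comment are hypothetical, not part of the repository, and the way the repository actually computes lengths may differ (this sketch counts every non-whitespace character, including a trailing `*`).

```shell
# Hypothetical helper: keep only the first whitespace-delimited token of each
# FASTA header, so ">k127_510701||0_partial_1 # 1 # 1107 ..." becomes
# ">k127_510701||0_partial_1". Sequence lines are passed through unchanged.
trim_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Hypothetical helper: print "ID<TAB>length" for every record.
# Assumption: the length is the number of non-whitespace characters.
fasta_lengths() {
    awk '
        function emit() { if (id != "") print id "\t" len }
        /^>/ { emit(); id = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { emit() }
    ' "$1"
}

# Example usage (file names are assumptions):
#   trim_fasta_headers vOTUs.faa > vOTUs.trimmed.faa
#   fasta_lengths vOTUs.trimmed.faa > vOTUs.faa.lengths
```

After this, every ID in the lengths file is a single token without spaces, which is what a column join on the fasta36 tabular output needs.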
On Sun, 2 Jul 2023 at 04.52, haleyhallowell ***@***.***> wrote:
Hello! I am currently trialling your code for clustering viral species.
However, I am repeatedly hitting an error during the orthology-filter step.
I have isolated out the portion of the command that is throwing the error
here:
cat vOTUs.fasta36 | ./joincol vOTUs.faa.lengths | ./joincol vOTUs.faa.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test.tsv
And get the following error:
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted
I think my .faa.lengths file looks the way that it should (example snippet
here):
k127_510701||0_partial_1 # 1 # 1107 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-10bp;gc_cont=0.563 369
k127_510701||0_partial_2 # 1107 # 2582 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-10bp;gc_cont=0.539 492
k127_510701||0_partial_3 # 2735 # 3949 # -1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.477 405
k127_510701||0_partial_4 # 4254 # 4934 # -1 # ID=1_4;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.492 227
k127_510701||0_partial_5 # 4931 # 5338 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.525 136
k127_510701||0_partial_6 # 5354 # 5650 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.384 99
k127_510701||0_partial_7 # 5647 # 5937 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.450 97
Any insight on how to fix this? Thanks!!
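One way to narrow down an error like this is to look at the rows where the two appended length columns (fields 13 and 14 in the command above) are missing, non-numeric, or zero, since those are the values the awk expression divides by. The sketch below is only a diagnostic aid; `find_bad_lengths` is a hypothetical name, and it assumes the joined stream has the same whitespace-separated layout that the awk step in the command expects.

```shell
# Hypothetical helper: read the joined table on stdin and print the row number
# plus the two protein IDs for every row whose 13th or 14th field does not
# evaluate to a positive number (empty, "#", 0, ...).
find_bad_lengths() {
    awk '($13 + 0) <= 0 || ($14 + 0) <= 0 { print FNR ": " $1 "\t" $2 }'
}

# Example usage, mirroring the first part of the failing pipeline:
#   cat vOTUs.fasta36 | ./joincol vOTUs.faa.lengths \
#       | ./joincol vOTUs.faa.lengths 2 | find_bad_lengths | head
```

If the very first row is reported, as `FNR=1` in the error message suggests, the join itself is probably not lining up the IDs, rather than a single protein having zero length.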
--
Shiraz A. Shah, MSc, PhD
Senior Researcher
Copenhagen Prospective Studies on Asthma in Childhood
Herlev and Gentofte Hospital, University of Copenhagen
www.copsac.com
|
Hi! Thanks for your quick reply! I edited the original .faa file to look like this before running fasta36:
I then reran fasta36, along with the other scripts, and it still threw the same error: |
Hey Haley, |
Sure! it looks like this:
|
It looks like it should. Could you maybe send me the fasta file so I can try to reproduce the error? How big is it? |
Sure! Where can I send it? It’s about ~40MB |
Alright, so too large for email. Do you have any suggestions? Have you ever used Google Drive or WeTransfer or DropBox? If you already have a google account, Google Drive is probably the easiest. My google account is [email protected], if you want to share the file with me privately via Google Drive. |
Ah, it's actually about 14 MB. I'll go ahead and email it to the address above! Thanks! |
Hey Haley, The reason fasta36 is even able to produce an alignment result for that sequence is that it automatically takes the next sequence "k127_1459403||full_28" and treats it as if it were "k127_1459403||full_27". |
Hello! I found a couple of fasta entries without sequences. I removed those entries and checked the lengths file to make sure there were no more 0s, then re-ran fasta36 as well as the steps below, and it still gave me the same error. I then went back and found a few fasta entries with just a '*' as their sequence and removed those as well, to see if that was the issue. I just finished that run and got the same error again. I double-checked that the files were correct, etc. |
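For reference, the kind of check described above (entries with no sequence at all, or with only a `*`) can be sketched with a short awk scan. `find_empty_records` is a hypothetical helper name, and treating a lone `*` as "empty" is an assumption based on the cases mentioned in this thread.

```shell
# Hypothetical helper: print the ID of every FASTA record whose sequence is
# empty once whitespace and "*" stop symbols are removed.
find_empty_records() {
    awk '
        function check() { if (id != "" && len == 0) print id }
        /^>/ { check(); id = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r*]/, ""); len += length($0) }
        END { check() }
    ' "$1"
}

# Example usage (file name is an assumption):
#   find_empty_records vOTUs.faa
```

Any ID it prints is a candidate for removal from the FASTA file before rerunning fasta36 and recomputing the lengths file.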
Could you send me the corrected fasta file? I’d like to try again. |
Sure! Sending to you now. |
Hey Haley,
|
got it working! Don't know why it wasn't running from my end. Thanks so much for your time and help! |
No problem. Hope it's useful! |