Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

Open
haleyhallowell opened this issue Jul 2, 2023 · 15 comments

Comments

@haleyhallowell
Copy link

Hello! I am currently trialling your code for clustering viral species. However, I am repeatedly hitting an error during the orthology-filter step. I have isolated out the portion of the command that is throwing the error here:

cat vOTUs.fasta36 | ./joincol vOTUs.faa.lengths | ./joincol vOTUs.faa.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test.tsv

And get the following error:

awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

I think my .faa.lengths file looks the way that it should (example snippet here):

k127_510701||0_partial_1 # 1 # 1107 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-
10bp;gc_cont=0.563 369
k127_510701||0_partial_2 # 1107 # 2582 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-10bp;gc_cont=0.539 492
k127_510701||0_partial_3 # 2735 # 3949 # -1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.477 405
k127_510701||0_partial_4 # 4254 # 4934 # -1 # ID=1_4;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.492 227
k127_510701||0_partial_5 # 4931 # 5338 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.525 136
k127_510701||0_partial_6 # 5354 # 5650 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.384 99
k127_510701||0_partial_7 # 5647 # 5937 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.450 97

Any insight on how to fix this? Thanks!!

@shiraz-shah
Copy link
Owner

shiraz-shah commented Jul 2, 2023 via email

@haleyhallowell
Copy link
Author

haleyhallowell commented Jul 2, 2023

HI! Thanks for your quick reply!

I edited the original .faa file to look like this before fasta36


>k127_510701||0_partial_1
FLGKKMPEGTAVEWCRTLLGPLFGNLVLLGLGAKLFCDGALELRIFSEIIRRTMLPATPV
WVLTLVVLLIAGALAAQGMECRGRTAEILFFLVSVPLLVVLLAVAFSAEYGRVLPLEPPA
AEGWKDAFASLGVLFQGLAFLYFIIPDLKKPQKMQRAVLAAGVHAALLFATLTFLSLAVY
GEVMLKQKLLPTLQMLERVSFTGIFLTRQDVLLLWFWIASVSVFLSGTLFFSSLMGVRMA
GQTEAKRKNWLYAALAVLFLLSFLPEDMAQAHEFRLRASLWLNGFYLLVLPALLLFLAAV
RKKGGKAV*
>k127_510701||0_partial_2
MARCEKQGTNGILQSLMEDVIAIGEMGKITTLQEVYDAVLLGDTVLLMQGDDFALQASTK
GFPSRGVNKAETEVVVQGPKDAFTELMSINVVLTRRRIRDTKLKLKRKKVGTRSKTDVAL


And reran fasta36, along with the other scripts and still threw the same error:
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

@shiraz-shah
Copy link
Owner

Hey Haley,
Can I see what your lengths file looks like?

@haleyhallowell
Copy link
Author

Sure! it looks like this:

k127_510701||0_partial_1        309
k127_510701||0_partial_2        432
k127_510701||0_partial_3        345
k127_510701||0_partial_4        167
k127_510701||0_partial_5        136
k127_510701||0_partial_6        99
k127_510701||0_partial_7        37
k127_510701||0_partial_8        94
k127_510701||0_partial_9        3390
k127_510701||0_partial_10       413
k127_510701||0_partial_11       463
k127_510701||0_partial_12       497
k127_510701||0_partial_13       127
k127_510701||0_partial_14       75
k127_510701||0_partial_15       38
k127_510701||0_partial_16       463
k127_510701||0_partial_17       209
k127_281361||0_partial_1        39
k127_281361||0_partial_2        304
k127_281361||0_partial_3        50
k127_281361||0_partial_4        343
k127_281361||0_partial_5        173
k127_715127||0_partial_1        95

@shiraz-shah
Copy link
Owner

It looks like it should. Could you maybe send me the fasta file so I can try to reproduce the error? How big is it?

@haleyhallowell
Copy link
Author

Sure! Where can I send it? It’s about ~40MB

@shiraz-shah
Copy link
Owner

Alright, so too large for email. Do you have any suggestions? Have you ever used Google Drive or WeTransfer or DropBox? If you already have a google account, Google Drive is probably the easiest. My google account is [email protected], if you want to share the file with me privately via Google Drive.

@haleyhallowell
Copy link
Author

ah, its actually about 14. ill go ahead and email it to the email above! Thanks!

@shiraz-shah
Copy link
Owner

Hey Haley,
It's because the fasta file has a sequence of length 0 called "k127_1459403||full_27". If you remove that sequence, and redo the fasta step, then I think it should work.

The reason why fasta36 is even able to produce an alignment result for that sequence, is because it automatically takes the next sequence "k127_1459403||full_28" and treats it as if it was "k127_1459403||full_27".

@haleyhallowell
Copy link
Author

Hello! I found a couple fasta entries without sequences. I went through and removed those sequences and checked the lengths file to make sure there were no more 0s, then re-ran fasta36 as well as the steps below, and it still gave me the same error. I then went back and found a few fasta sequences with a '*' as its sequence entry and removed those as well to see if that was the issue. I just finished that run and got the same error as well. I double checked that the files were correct, etc

@shiraz-shah
Copy link
Owner

Could you send me the corrected fasta file? I’d like to try again.

@haleyhallowell
Copy link
Author

Sure! Sending to you now.

@shiraz-shah
Copy link
Owner

Hey Haley,
I ran everything with the new file, and it seems to work fine, actually. Here's what I did:
fasta36 test_2removed.faa test_2removed.faa -m 8 > test_2removed.fastab
cat test_2removed.faa | ./f2s | ./seqlengths > test_2removed.lengths
cat test_2removed.fastab | ./joincol test_2removed.lengths | ./joincol test_2removed.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test_2removed.tsv

$ head test_2removed.tsv
k127_510701||0_partial_1	k127_510701||0_partial_1
k127_510701||0_partial_1	k127_655921||full_23
k127_510701||0_partial_1	k127_423438||full_103
k127_510701||0_partial_2	k127_510701||0_partial_2
k127_510701||0_partial_2	k127_156558||full_89
k127_510701||0_partial_2	k127_423438||full_104
k127_510701||0_partial_2	k127_655921||full_22
k127_510701||0_partial_2	k127_214787||0_partial_88
k127_510701||0_partial_2	k127_362384||full_76
k127_510701||0_partial_2	k127_259829||0_partial_130

@haleyhallowell
Copy link
Author

got it working! Don't know why it wasn't running from my end. Thanks so much for your time and help!

@shiraz-shah
Copy link
Owner

No problem. Hope it's useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants