'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

haleyhallowell · 2023-07-02T02:52:05Z

Hello! I am currently trialling your code for clustering viral species. However, I am repeatedly hitting an error during the orthology-filter step. I have isolated out the portion of the command that is throwing the error here:

cat vOTUs.fasta36 | ./joincol vOTUs.faa.lengths | ./joincol vOTUs.faa.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test.tsv

And get the following error:

awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

I think my .faa.lengths file looks the way that it should (example snippet here):

k127_510701||0_partial_1 # 1 # 1107 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-
10bp;gc_cont=0.563 369
k127_510701||0_partial_2 # 1107 # 2582 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-10bp;gc_cont=0.539 492
k127_510701||0_partial_3 # 2735 # 3949 # -1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.477 405
k127_510701||0_partial_4 # 4254 # 4934 # -1 # ID=1_4;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.492 227
k127_510701||0_partial_5 # 4931 # 5338 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.525 136
k127_510701||0_partial_6 # 5354 # 5650 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.384 99
k127_510701||0_partial_7 # 5647 # 5937 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.450 97

Any insight on how to fix this? Thanks!!

The text was updated successfully, but these errors were encountered:

shiraz-shah · 2023-07-02T04:00:34Z

Hey Haley, Could you try to trim down the headers in the protein fasta file so they don’t have all that extra prodigal info? Then recompute the lengths file. Then proceed to the orthology step. So e.g. the first protein’d ID should be only k127_510701||0_partial_1 and so on. Fasta good already trained that part of you look at the tabular output, but since it’s there in your lengths file, joincol can’t match up the columns. Thanks for pointing this out. I’ll try to fix it, but in the meantime the above hack should do the trick.

On Sun, 2 Jul 2023 at 04.52, haleyhallowell ***@***.***> wrote: Hello! I am currently trialling your code for clustering viral species. However, I am repeatedly hitting an error during the orthology-filter step. I have isolated out the portion of the command that is throwing the error here: cat vOTUs.fasta36 | ./joincol vOTUs.faa.lengths | ./joincol vOTUs.faa.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test.tsv And get the following error: awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted I think my .faa.lengths file looks the way that it should (example snippet here): k127_510701||0_partial_1 # 1 # 1107 # -1 # ID=1_1;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5- 10bp;gc_cont=0.563 369 k127_510701||0_partial_2 # 1107 # 2582 # -1 # ID=1_2;partial=00;start_type=ATG;rbs_motif=GGxGG;rbs_spacer=5-10bp;gc_cont=0.539 492 k127_510701||0_partial_3 # 2735 # 3949 # -1 # ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.477 405 k127_510701||0_partial_4 # 4254 # 4934 # -1 # ID=1_4;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.492 227 k127_510701||0_partial_5 # 4931 # 5338 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.525 136 k127_510701||0_partial_6 # 5354 # 5650 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.384 99 k127_510701||0_partial_7 # 5647 # 5937 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.450 97 Any insight on how to fix this? Thanks!! — Reply to this email directly, view it on GitHub <#1>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AOJORBBSE5R2VRTRP3FRGGLXODO6BANCNFSM6AAAAAAZ3FDRLQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

-- Shiraz A. Shah, MSc, PhD Senior Researcher C‌openhagen Prospective Studies on Asthma in Childhood H‌erlev and G‌entofte H‌ospital, U‌niversity of C‌openhagen www.copsac.com

haleyhallowell · 2023-07-02T16:45:59Z

HI! Thanks for your quick reply!

I edited the original .faa file to look like this before fasta36


>k127_510701||0_partial_1
FLGKKMPEGTAVEWCRTLLGPLFGNLVLLGLGAKLFCDGALELRIFSEIIRRTMLPATPV
WVLTLVVLLIAGALAAQGMECRGRTAEILFFLVSVPLLVVLLAVAFSAEYGRVLPLEPPA
AEGWKDAFASLGVLFQGLAFLYFIIPDLKKPQKMQRAVLAAGVHAALLFATLTFLSLAVY
GEVMLKQKLLPTLQMLERVSFTGIFLTRQDVLLLWFWIASVSVFLSGTLFFSSLMGVRMA
GQTEAKRKNWLYAALAVLFLLSFLPEDMAQAHEFRLRASLWLNGFYLLVLPALLLFLAAV
RKKGGKAV*
>k127_510701||0_partial_2
MARCEKQGTNGILQSLMEDVIAIGEMGKITTLQEVYDAVLLGDTVLLMQGDDFALQASTK
GFPSRGVNKAETEVVVQGPKDAFTELMSINVVLTRRRIRDTKLKLKRKKVGTRSKTDVAL

And reran fasta36, along with the other scripts and still threw the same error:
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted

shiraz-shah · 2023-07-02T17:41:58Z

Hey Haley,
Can I see what your lengths file looks like?

haleyhallowell · 2023-07-02T17:55:25Z

Sure! it looks like this:

k127_510701||0_partial_1        309
k127_510701||0_partial_2        432
k127_510701||0_partial_3        345
k127_510701||0_partial_4        167
k127_510701||0_partial_5        136
k127_510701||0_partial_6        99
k127_510701||0_partial_7        37
k127_510701||0_partial_8        94
k127_510701||0_partial_9        3390
k127_510701||0_partial_10       413
k127_510701||0_partial_11       463
k127_510701||0_partial_12       497
k127_510701||0_partial_13       127
k127_510701||0_partial_14       75
k127_510701||0_partial_15       38
k127_510701||0_partial_16       463
k127_510701||0_partial_17       209
k127_281361||0_partial_1        39
k127_281361||0_partial_2        304
k127_281361||0_partial_3        50
k127_281361||0_partial_4        343
k127_281361||0_partial_5        173
k127_715127||0_partial_1        95

shiraz-shah · 2023-07-03T09:08:21Z

It looks like it should. Could you maybe send me the fasta file so I can try to reproduce the error? How big is it?

haleyhallowell · 2023-07-03T11:36:50Z

Sure! Where can I send it? It’s about ~40MB

shiraz-shah · 2023-07-03T12:01:02Z

Alright, so too large for email. Do you have any suggestions? Have you ever used Google Drive or WeTransfer or DropBox? If you already have a google account, Google Drive is probably the easiest. My google account is [email protected], if you want to share the file with me privately via Google Drive.

haleyhallowell · 2023-07-03T12:15:52Z

ah, its actually about 14. ill go ahead and email it to the email above! Thanks!

shiraz-shah · 2023-07-04T08:26:16Z

Hey Haley,
It's because the fasta file has a sequence of length 0 called "k127_1459403||full_27". If you remove that sequence, and redo the fasta step, then I think it should work.

The reason why fasta36 is even able to produce an alignment result for that sequence, is because it automatically takes the next sequence "k127_1459403||full_28" and treats it as if it was "k127_1459403||full_27".

haleyhallowell · 2023-07-04T17:21:10Z

Hello! I found a couple fasta entries without sequences. I went through and removed those sequences and checked the lengths file to make sure there were no more 0s, then re-ran fasta36 as well as the steps below, and it still gave me the same error. I then went back and found a few fasta sequences with a '*' as its sequence entry and removed those as well to see if that was the issue. I just finished that run and got the same error as well. I double checked that the files were correct, etc

shiraz-shah · 2023-07-05T05:40:24Z

Could you send me the corrected fasta file? I’d like to try again.

haleyhallowell · 2023-07-05T12:40:57Z

Sure! Sending to you now.

shiraz-shah · 2023-07-06T12:01:11Z

Hey Haley,
I ran everything with the new file, and it seems to work fine, actually. Here's what I did:
fasta36 test_2removed.faa test_2removed.faa -m 8 > test_2removed.fastab
cat test_2removed.faa | ./f2s | ./seqlengths > test_2removed.lengths
cat test_2removed.fastab | ./joincol test_2removed.lengths | ./joincol test_2removed.lengths 2 | awk '{print $1 "\t" $2 "\t" $11 "\t" $13/$14 "\t" ($8-$7)/(2*$13)+($10-$9)/(2*$14) "\t" ($7+$8-$9-$10)/($13+$14)}' | awk '{if ($3 <= 0.05) print}' | awk '{if ($5 >= 0.4) print}' | awk '{if (sqrt(($4-1)^2) - (sqrt(sqrt($5))-.8) + sqrt($6^2) <= 0.1) print $1 "\t" $2}' > test_2removed.tsv

$ head test_2removed.tsv
k127_510701||0_partial_1	k127_510701||0_partial_1
k127_510701||0_partial_1	k127_655921||full_23
k127_510701||0_partial_1	k127_423438||full_103
k127_510701||0_partial_2	k127_510701||0_partial_2
k127_510701||0_partial_2	k127_156558||full_89
k127_510701||0_partial_2	k127_423438||full_104
k127_510701||0_partial_2	k127_655921||full_22
k127_510701||0_partial_2	k127_214787||0_partial_88
k127_510701||0_partial_2	k127_362384||full_76
k127_510701||0_partial_2	k127_259829||0_partial_130

haleyhallowell · 2023-07-06T16:12:14Z

got it working! Don't know why it wasn't running from my end. Thanks so much for your time and help!

shiraz-shah · 2023-07-10T08:26:42Z

No problem. Hope it's useful!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

haleyhallowell commented Jul 2, 2023

shiraz-shah commented Jul 2, 2023 via email

haleyhallowell commented Jul 2, 2023 •

edited

Loading

shiraz-shah commented Jul 2, 2023

haleyhallowell commented Jul 2, 2023

shiraz-shah commented Jul 3, 2023

haleyhallowell commented Jul 3, 2023

shiraz-shah commented Jul 3, 2023

haleyhallowell commented Jul 3, 2023

shiraz-shah commented Jul 4, 2023

haleyhallowell commented Jul 4, 2023

shiraz-shah commented Jul 5, 2023

haleyhallowell commented Jul 5, 2023

shiraz-shah commented Jul 6, 2023

haleyhallowell commented Jul 6, 2023

shiraz-shah commented Jul 10, 2023

'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

'awk: cmd. line:1: (FILENAME=- FNR=1) fatal: division by zero attempted' During orthology-filter step #1

Comments

haleyhallowell commented Jul 2, 2023

shiraz-shah commented Jul 2, 2023 via email

haleyhallowell commented Jul 2, 2023 • edited Loading

shiraz-shah commented Jul 2, 2023

haleyhallowell commented Jul 2, 2023

shiraz-shah commented Jul 3, 2023

haleyhallowell commented Jul 3, 2023

shiraz-shah commented Jul 3, 2023

haleyhallowell commented Jul 3, 2023

shiraz-shah commented Jul 4, 2023

haleyhallowell commented Jul 4, 2023

shiraz-shah commented Jul 5, 2023

haleyhallowell commented Jul 5, 2023

shiraz-shah commented Jul 6, 2023

haleyhallowell commented Jul 6, 2023

shiraz-shah commented Jul 10, 2023

haleyhallowell commented Jul 2, 2023 •

edited

Loading