completed permanentFail #168
Comments
After setting "ncbi_fcs_adaptor_skip" and "ncbi_fcs_gx_skip" to true, the pipeline worked okay.
Yes, because you skipped the tool which was failing? I'll try to reproduce this issue and possibly fix it by sanitising the
I got the same error and when I checked the
>h1tg000112l_1
WARNING: Too many Ns in sequence: 17557 out of 17557 = 100.0%
Then, when I checked that sequence in the genome assembly, it was actually 100% Ns:
>h1tg000112l_1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN.....
I used hifiasm and purge_dups to produce this genome assembly, so I am guessing one of those tools isn't working properly.
Thank you for the post. That's very useful. I'll soon start work on this issue and will include a test case to check if I have fixed it. In the case of 100%
Thanks @GallVp, I actually found this error useful because I wasn't aware that those contigs were just Ns. Maybe if the pipeline excludes them from the adaptor check, it could make a note that they were there?
@SarahBailey1998
Yeah I think so, it definitely makes no sense
Thanks. I will track this objective under #173
Contigs composed entirely of Ns should not be created by the assembler in the first place; why it did that should be investigated. Introducing a check of N count vs contig length is easy to do, but removing contigs with more than x% Ns or more than x% unpolished bases (likely for contigs that receive some form of polishing during assembly) would also remove them.
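To illustrate the N count vs contig length check suggested above, here is a minimal Python sketch. It is not the pipeline's implementation: the function names (`read_fasta`, `n_fraction`, `split_by_n_content`) and the 50% default threshold are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(seq: str) -> float:
    """Fraction of the sequence made up of N (case-insensitive)."""
    if not seq:
        # Assumption: an empty record is treated as fully uninformative.
        return 1.0
    return seq.upper().count("N") / len(seq)


def split_by_n_content(
    records: Iterable[Tuple[str, str]],
    max_n_fraction: float = 0.5,  # hypothetical threshold, the "x%" above
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, float]]]:
    """Separate records into kept ones and dropped ones (with their N fraction)."""
    kept, dropped = [], []
    for header, seq in records:
        frac = n_fraction(seq)
        if frac > max_n_fraction:
            dropped.append((header, frac))
        else:
            kept.append((header, seq))
    return kept, dropped


if __name__ == "__main__":
    example = [">h1tg000112l_1", "NNNNNNNN", ">contig2", "ACGTNACG", "TTGA"]
    kept, dropped = split_by_n_content(read_fasta(example))
    for header, frac in dropped:
        # Report the dropped contigs so the user knows they were there.
        print(f"Dropped {header}: {frac:.1%} Ns")
```

Returning the dropped contigs alongside the kept ones means they can be reported to the user rather than silently removed, which is the behaviour asked for earlier in the thread.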
Thanks @rosscrowhurst, I was surprised to discover these problem contigs and am investigating the cause.
Hi @SarahBailey1998, in my case the inputs were transcript sequences, with not many Ns present. All the seqID lines carry additional information (e.g. ">g43.t1 type=CDS; aalen=194,100%,complete"). I suspect that this caused the NCBI FCS tools to fail.
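If the extra text on the seqID lines is indeed the cause (this is only a suspicion at this point), one workaround is to strip the description before running the NCBI FCS steps. The sketch below is an assumption-laden illustration, not the pipeline's code: `sanitise_header` and `sanitise_fasta` are hypothetical names, and the rule "keep everything up to the first whitespace" is the usual FASTA convention for the sequence ID.

```python
from typing import Iterable, Iterator


def sanitise_header(line: str) -> str:
    """Reduce a FASTA header to '>' plus the sequence ID.

    The ID is taken as the text between '>' and the first whitespace.
    Non-header lines are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    parts = line[1:].split(None, 1)
    seq_id = parts[0] if parts else ""
    return ">" + seq_id


def sanitise_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with descriptions removed from every header."""
    for line in lines:
        yield sanitise_header(line.rstrip("\n"))


if __name__ == "__main__":
    example = [">g43.t1 type=CDS; aalen=194,100%,complete\n", "ATGGCC\n"]
    for out_line in sanitise_fasta(example):
        print(out_line)
```

Note that dropping descriptions can produce duplicate IDs if two records share the same first token, so a real fix would probably need to check for that as well.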
Description of the bug
Hi @GallVp ,
The assemblyQC pipeline failed on a set of transcriptome assemblies. The .nextflow.log ended with DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye, while the slurm log showed:
This issue is potentially related to a SeqID issue in NCBI FCS. Headers of the form '>SeqID with_notes' are common in FASTA files.
Thank you.
Command used and terminal output
Relevant files
No response
System information
No response