completed permanentFail #168

CeciliaDeng · 2024-10-29T20:17:39Z

Description of the bug

The assemblyQC pipeline failed on a set of transcriptome assemblies. The .nextflow.log ended with DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye, while the slurm log showed:

[job seqtransform_step] completed permanentFail
[step seqtransform_step] completed permanentFail
[workflow GenerateCleanedFasta] completed permanentFail
[step GenerateCleanedFasta] completed permanentFail
[workflow ] completed permanentFail

This issue is potentially related to the SeqID issue in NCBI FCS. Headers of the form '>SeqID with_notes' are common in fasta files.

Thank you.

Command used and terminal output

cd $path/to/assemblyqc
sbatch pfr_assemblyqc_Resume

Relevant files

No response

System information

No response

CeciliaDeng · 2024-10-30T01:32:04Z

Setting "ncbi_fcs_adaptor_skip" and "ncbi_fcs_gx_skip" to true, the pipeline worked okay.

GallVp · 2024-10-30T01:59:22Z

> Setting "ncbi_fcs_adaptor_skip" and "ncbi_fcs_gx_skip" to true, the pipeline worked okay.

Yes, because you skipped the tool which was failing?

I'll try to reproduce this issue and possibly fix it by sanitising the fasta header.
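For illustration only, header sanitising of the kind mentioned above could look like the following minimal sketch. It assumes the fix is simply to keep the first whitespace-delimited token of each header; the function name `sanitise_fasta_headers` is hypothetical and this is not the pipeline's actual implementation. Whether this alone resolves the NCBI FCS failure is not confirmed in this thread.

```python
import io


def sanitise_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to its first token.

    Hypothetical helper: assumes everything after the first whitespace
    in a header line is free-text notes that can be dropped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            yield ">" + seq_id
        else:
            # Sequence lines pass through unchanged.
            yield line


# Example input modelled on the header style reported in this issue.
example = io.StringIO(
    ">SeqID with_notes\nACGTACGT\n>g43.t1 type=CDS; aalen=194\nATGCCC\n"
)
cleaned = list(sanitise_fasta_headers(example))
print("\n".join(cleaned))
```

Note that truncating headers can create duplicate IDs if two records share the same first token, so a real fix would probably need to check for that as well.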

SarahBailey1998 · 2024-11-01T03:46:45Z

Hi @CeciliaDeng @GallVp

I got the same error and when I checked the .command.log in the working directory I saw:

>h1tg000112l_1
        WARNING: Too many Ns in sequence: 17557 out of 17557 = 100.0%

Then when I checked that sequence in the genome assembly, I found it is actually 100% Ns:

>h1tg000112l_1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNN.....

I used hifiasm and purge_dups to produce this genome assembly, so I am guessing one of those tools isn't working properly.

GallVp · 2024-11-03T19:47:13Z

Hi @SarahBailey1998

Thank you for the post. That's very useful.

I'll soon start work on this issue and will include a test case to check if I have fixed it.

In the case of 100% NNNN..., a simple solution is that the pipeline removes them before passing the fasta to fcs adaptor check because these sequences clearly don't have any contamination.

SarahBailey1998 · 2024-11-03T20:10:38Z

Thanks @GallVp

I actually found this error useful because I wasn't aware that those contigs were just N's. Maybe if the pipeline doesn't include them in the adaptor check a note is made that they were there?

GallVp · 2024-11-03T20:13:27Z

@SarahBailey1998
Do you think it should be a validation failure then? Because it does not make sense to have 100% NNNs in a sequence.

SarahBailey1998 · 2024-11-03T20:15:09Z

Yeah I think so, it definitely makes no sense

GallVp · 2024-11-03T20:22:23Z

> Yeah I think so, it definitely makes no sense

Thanks. I will track this objective under #173

rosscrowhurst · 2024-11-03T21:15:47Z

Contigs entirely composed of Ns should not be created in the first place by the assembler; why it did that should be investigated. Introducing a check of N count vs contig length is easy to do, but removing contigs with greater than x% Ns, or greater than x% unpolished bases (likely for contigs that receive some form of polishing during assembly), would also remove them.
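As a rough sketch of the N count vs contig length check described above, the Python below tallies Ns per contig and flags those above a threshold. The function names and the default threshold of 0.5 are assumptions made for illustration, not anything the pipeline defines.

```python
def n_fraction_by_contig(lines):
    """Return {contig_id: (n_count, length)} for FASTA lines."""
    stats = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            current = fields[0] if fields else ""
            stats[current] = (0, 0)
        elif current is not None:
            n_count, length = stats[current]
            # Count both upper- and lower-case N as ambiguous bases.
            stats[current] = (
                n_count + line.upper().count("N"),
                length + len(line),
            )
    return stats


def flag_contigs(stats, max_n_fraction=0.5):
    """Return IDs whose N fraction exceeds max_n_fraction (assumed threshold)."""
    return [
        contig_id
        for contig_id, (n_count, length) in stats.items()
        if length > 0 and n_count / length > max_n_fraction
    ]


# Toy input: one all-N contig (ID taken from this thread) and one normal contig.
fasta = [">h1tg000112l_1", "NNNNNNNNNN", "NNNNN", ">contig_ok", "ACGTNACGTA"]
stats = n_fraction_by_contig(fasta)
flagged = flag_contigs(stats)
print(stats)
print(flagged)
```

The flagged list could either be reported as a validation failure or used to drop contigs before the adaptor check; which behaviour is preferable is the open question in this thread.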

SarahBailey1998 · 2024-11-03T21:46:40Z

Thanks @rosscrowhurst

I was surprised to discover these problem contigs and am investigating the cause

CeciliaDeng · 2024-11-04T23:51:32Z

Hi @SarahBailey1998, in my case the inputs were transcript sequences and there were not many Ns present. All the SeqID lines have additional information (e.g. ">g43.t1 type=CDS; aalen=194,100%,complete"). I suspect that caused the failure of the NCBI FCS tools.

CeciliaDeng added the "bug" (Something isn't working) label Oct 29, 2024

CeciliaDeng assigned GallVp Oct 29, 2024

GallVp added this to the 2.2.0 milestone Oct 30, 2024

GallVp modified the milestones: 2.2.0, 2.3.0 Nov 4, 2024

GallVp removed their assignment Nov 21, 2024
