Run time for metagenome #11

Open
microEcology opened this issue Oct 3, 2022 · 5 comments

Comments

@microEcology

We're testing MuCHSALSA, this implementation of LazyB, on a metagenome with both Illumina and PacBio reads. We're evaluating these hybrid assemblies for soil metagenomes.
The first sample, with 3.9 Gb of short reads and 5.3 Gb of long reads, is still assembling after more than a month. Is this expected? The same sample was successfully assembled with Unicycler.

This is the command called:
sh pipeline.sh 50 90 Sample7 ../reads_/Sample7_R1.fq ../reads_/Sample7_R2.fq ../reads_/Sample7_99.fasta Sample7_muchsalsa

This is the last ABySS log output (the file was last updated on 22-08):

Loaded 1853341012 k-mer. At least 104 GB of RAM is required.
Minimum k-mer coverage is 56
Using a coverage threshold of 1...
The median k-mer coverage is 1
The reconstruction is 1853341012
The k-mer coverage threshold is 1
Setting parameter e (erode) to 2
Setting parameter E (erodeStrand) to 0
Setting parameter c (coverage) to 2
Finding adjacent k-mer...
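
For anyone comparing runs, the figures in a log like this can be pulled out with a few lines of Python. This is only a sketch: the regular expressions are written against the exact wording of the lines above, the `summarize` helper is a made-up name, and the bytes-per-k-mer figure assumes "GB" means 10**9 bytes, which the log does not state.

```python
import re

# Lines copied from the log above.
LOG = """\
Loaded 1853341012 k-mer. At least 104 GB of RAM is required.
The median k-mer coverage is 1
"""

def summarize(log_text):
    """Extract the k-mer count, RAM estimate, and median coverage, if present."""
    loaded = re.search(r"Loaded (\d+) k-mer\. At least (\d+) GB of RAM", log_text)
    median = re.search(r"median k-mer coverage is (\d+)", log_text)
    summary = {}
    if loaded:
        summary["kmers"] = int(loaded.group(1))
        summary["ram_gb"] = int(loaded.group(2))
        # Assumption: GB = 10**9 bytes; the log does not say which unit is meant.
        summary["bytes_per_kmer"] = summary["ram_gb"] * 1e9 / summary["kmers"]
    if median:
        summary["median_coverage"] = int(median.group(1))
    return summary

info = summarize(LOG)
print(info)
```

For this log that works out to roughly 56 bytes per loaded k-mer. A median coverage of 1 suggests that at least half of the distinct k-mers were seen only once.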

Is this extremely long running time expected? Would this assembler not work for metagenomes?

@TGatter
Collaborator

TGatter commented Oct 3, 2022

Thank you for your interest in our software!

Our pipeline was not developed with metagenomes in mind.
The initial k-mer trimming steps (based on Jellyfish) are designed to trim the k-mer peaks of single genomes.
I would therefore presume that such trimming can lead to undesired results in metagenomic data.

This behaviour is nevertheless not expected. If I understood your question correctly, the problem you encountered appears to be within ABySS. Have you attempted ABySS assemblies without the pre-filter? Did you find any other errors? Is ABySS still running or on standby?
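
A toy illustration of that concern, as a sketch only: this is not the pipeline's Jellyfish step. It assumes a single fixed abundance cutoff, counts k-mers without canonicalizing reverse complements, and uses made-up sequences and helper names. Two "genomes" are sampled at very different depths, and a cutoff that suits the abundant one removes every k-mer of the rare one.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (no reverse complements)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def abundance_histogram(counts):
    """Map abundance -> number of distinct k-mers with that abundance."""
    return Counter(counts.values())

def trim(counts, cutoff):
    """Keep only k-mers seen at least `cutoff` times."""
    return {kmer for kmer, c in counts.items() if c >= cutoff}

# Hypothetical sequences: one abundant organism (20x) and one rare one (2x).
abundant = "ACGTACGGTCA"
rare = "TTGACCATGGA"
reads = [abundant] * 20 + [rare] * 2

k = 5
counts = count_kmers(reads, k)
hist = abundance_histogram(counts)  # two peaks: one at 20, one at 2
kept = trim(counts, cutoff=5)       # cutoff chosen with the 20x peak in mind

rare_kmers = set(count_kmers([rare], k))
abundant_kmers = set(count_kmers([abundant], k))
print(sorted(hist.items()))
print(len(kept & rare_kmers), "rare k-mers survive trimming")
```

In a single genome the low-abundance k-mers are mostly sequencing errors, so dropping them is reasonable. In a mixed sample they may belong to real, low-coverage organisms.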

@microEcology
Author

Thank you for replying!

There isn't any error message, and it is still running; it's just taking too long...

We haven't tried ABySS alone yet. Do you think it is still worth waiting, or would it be better to move on to ABySS without the pre-filter?
Or maybe using some other values for k-mer trimming would solve this?

@0x002A
Owner

0x002A commented Jan 14, 2023

Any news on this issue? @TGatter, @microEcology?

@microEcology
Author

I tested ABySS standalone, with no Jellyfish or k-mer filtering, and ran into the same problem. The assembly ran for a couple of weeks with no progress and no errors, stuck on the same step, so I gave up on it. I'm still not sure what the issue was, though.

@TGatter
Collaborator

TGatter commented Jan 16, 2023

There are several components of the tool that might cause problems if the assembly graphs contain large, near-complete subgraphs, which may well be the case for metagenomes. As it stands, the tool is designed for sparse assembly data. We plan to start a project on an artificial metagenome later this year. I will report here if we find something that might be of help to you.
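
To make "sparse" versus "near-complete" concrete, here is a small sketch that measures edge density on an undirected graph given as an adjacency dictionary. It is purely illustrative: the `edge_density` helper and the example graphs are made up, and this is not how the tool represents its assembly graphs.

```python
def edge_density(adjacency):
    """Fraction of possible undirected edges that are present (0.0 to 1.0).

    `adjacency` maps each node to the set of its neighbours, with every
    edge listed in both directions.
    """
    n = len(adjacency)
    if n < 2:
        return 0.0
    edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)

# A simple path of 5 nodes: the kind of sparse structure a clean assembly has.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

# A complete graph on 5 nodes: every node overlaps every other node.
complete = {i: {j for j in range(5) if j != i} for i in range(5)}

path_density = edge_density(path)
complete_density = edge_density(complete)
print(path_density, complete_density)
```

A path on n nodes has n - 1 edges, while a complete subgraph has n(n - 1)/2, so any step that examines edges or paths through such a region has far more work to do as n grows.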

3 participants