Best parameters for a very fast, rough assembly #11

Open
tseemann opened this issue Oct 10, 2018 · 7 comments

@tseemann

For some use cases, we want a very rough assembly but we need it very quickly.

What options would you suggest to reduce the runtime?

I'm guessing that using fewer k-mers would speed things up.
Would be grateful for any ideas you have.

@souvorov
Collaborator

Yes, reducing the number of k-mers is the way to go. There are two k-mer loops in SKESA. The first one deals with k-mers that are shorter than the read length; the number of these iterations is controlled by --steps. The second loop uses long k-mers up to the insert length. It always runs 3 iterations for paired-end reads and is skipped for single-end reads. My suggestion for a bare-bones assembly of the example from releases:

skesa --fasta SRR5449060_1.fasta --fasta SRR5449060_2.fasta --steps 1 --kmer 51 --vector_percent 1 --cores 10 > rslt.fa

On my computer this reduced the wall-clock time from 100s to 16s. The assembly will not be worse in terms of bad bases but will be more fragmented. The choice of --kmer is critical for fragmentation. My bet is 50%-80% of the read length but you'll have to experiment.

You may also try --hash_count. It will definitely save some memory, and for high-coverage samples it might save some time.

Please let me know the results of your experiments.
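
As a rough sketch of the experiment suggested above, the following Python helper lists k-mer sizes between 50% and 80% of the read length and builds the corresponding bare-bones command lines. It only uses the flags quoted in this thread (--fasta, --steps, --kmer, --vector_percent, --cores). The function names, the read length of 100, the step of 10, the output file names, and the choice to keep k odd (mirroring the odd sizes tested later in the thread) are assumptions made for illustration, not SKESA recommendations.

```python
import shlex


def candidate_kmers(read_length, low=0.5, high=0.8, step=10):
    """Return k-mer sizes between low*read_length and high*read_length.

    Assumption: k is kept odd, mirroring the odd sizes tried in this thread.
    The step must be even so that every value in the range stays odd.
    """
    start = int(read_length * low)
    stop = int(read_length * high)
    if start % 2 == 0:
        start += 1
    return list(range(start, stop + 1, step))


def skesa_command(reads1, reads2, kmer, cores=10, clip_adapters=False):
    """Build the bare-bones command from the thread for one k-mer size."""
    cmd = ["skesa", "--fasta", reads1, "--fasta", reads2,
           "--steps", "1", "--kmer", str(kmer)]
    if not clip_adapters:
        # Per the thread, --vector_percent 1 disables adapter/vector clipping.
        # Only sensible if the reads are known to be adapter-free.
        cmd += ["--vector_percent", "1"]
    cmd += ["--cores", str(cores)]
    return cmd


if __name__ == "__main__":
    # Hypothetical read length of 100; adjust to the actual data.
    for k in candidate_kmers(100):
        cmd = skesa_command("SRR5449060_1.fasta", "SRR5449060_2.fasta", k)
        # Print rather than run, so the commands can be inspected first.
        print(shlex.join(cmd), ">", f"skesa-{k}.fa")
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; they can be piped to a shell or a job scheduler once they look right.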

@tseemann
Author

tseemann commented Oct 13, 2018

Thanks so much for the response. I will do some experimenting!

@tseemann
Author

tseemann commented Oct 13, 2018

My first test was very promising: a Listeria isolate, with no problems genotyping:

Name          no   bp       min  max
skesa-31.fa   91   2885169  200  355669
skesa-41.fa   79   2888278  203  387381
skesa-51.fa   60   2895598  208  571292
skesa-61.fa   75   2894908  203  299882
skesa-71.fa   85   2892681  240  237220
skesa-81.fa   109  2890956  289  141351
skesa-91.fa   168  2878715  279  119073

skesa-31.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-41.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-51.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-61.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-71.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-81.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-91.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
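
To make the comparison above easier to repeat, here is a minimal Python sketch that parses the two listings, picks the least fragmented assembly, and checks that the genotyping calls agree across k-mer sizes. It assumes that `no` is the contig count and `max` is the longest contig length; the function names are hypothetical, and only two of the identical typing rows are included to keep the example short.

```python
STATS = """\
Name          no   bp       min  max
skesa-31.fa   91   2885169  200  355669
skesa-41.fa   79   2888278  203  387381
skesa-51.fa   60   2895598  208  571292
skesa-61.fa   75   2894908  203  299882
skesa-71.fa   85   2892681  240  237220
skesa-81.fa   109  2890956  289  141351
skesa-91.fa   168  2878715  279  119073
"""

TYPING = """\
skesa-31.fa lmonocytogenes 1 abcZ(3) bglA(1) cat(1) dapE(1) dat(3) ldh(1) lhkA(3)
skesa-91.fa lmonocytogenes 1 abcZ(3) bglA(1) cat(1) dapE(1) dat(3) ldh(1) lhkA(3)
"""


def parse_stats(text):
    """Parse the whitespace-separated stats table into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        name, *numbers = line.split()
        row = {"Name": name}
        row.update(zip(header[1:], (int(n) for n in numbers)))
        rows.append(row)
    return rows


def least_fragmented(rows):
    """Return the row with the fewest contigs (assumes 'no' = contig count)."""
    return min(rows, key=lambda r: r["no"])


def typing_is_consistent(text):
    """True if every line reports the same fields after the file name."""
    calls = {tuple(line.split()[1:]) for line in text.strip().splitlines()}
    return len(calls) == 1


if __name__ == "__main__":
    best = least_fragmented(parse_stats(STATS))
    print(best["Name"], best["no"], best["max"])  # skesa-51.fa 60 571292
    print(typing_is_consistent(TYPING))           # True
```

On this isolate, k=51 gives both the fewest contigs and the longest contig, while the typing result is unchanged across the range; other samples may behave differently.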

@lskatz

lskatz commented Jan 3, 2019

Does changing vector_percent help speed up the assembly?

@souvorov
Collaborator

souvorov commented Jan 3, 2019

By default, SKESA clips adapters/vectors from reads before assembling. --vector_percent 1 disables this step, which makes the whole process somewhat faster. It is recommended only if you know that the reads don't have adapters.

@lskatz

lskatz commented Jan 3, 2019

@tseemann did you notice any pros/cons of letting SKESA or some other tool trim adapters for you?

@tseemann
Author

@lskatz our Illumina software is set up to remove all adapters, by putting the Nextera transposase in the SampleSheet.csv
