Best parameters for a very fast, rough assembly #11

tseemann · 2018-10-10T23:28:32Z

For some use cases, we want a very rough assembly but we need it very quickly.

What options would you suggest to reduce the runtime?

I'm guessing that using fewer k-mers would speed things up.
Would be grateful for any ideas you have.

souvorov · 2018-10-11T15:48:13Z

Yes, reducing the number of k-mers is the way to go. There are two k-mer loops in SKESA. The first one deals with k-mers that are shorter than the read length; the number of its iterations is controlled by --steps. The second loop uses long k-mers, up to the insert length; it always runs 3 iterations for paired-end reads and none for single-end reads. My suggestion for a bare-bones assembly of the example from the releases:

skesa --fasta SRR5449060_1.fasta --fasta SRR5449060_2.fasta --steps 1 --kmer 51 --vector_percent 1 --cores 10 > rslt.fa

On my computer this reduced the wall-clock time from 100 s to 16 s. The assembly will not be worse in terms of incorrect bases, but it will be more fragmented. The choice of --kmer is critical for fragmentation. My bet is 50%-80% of the read length, but you'll have to experiment.
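The two speed knobs above reduce to simple arithmetic, sketched below. This is an illustration only: `total_iterations` and `kmer_bounds` are hypothetical helpers, and the iteration count assumes exactly the behaviour described in this comment (--steps short-k-mer iterations, plus 3 long-k-mer iterations for paired-end reads and none for single-end), not anything read from SKESA's source.

```bash
#!/usr/bin/env bash
# Rough arithmetic for the two speed knobs described above.
# Assumption: iteration counts follow this thread's description
# (--steps short-k-mer iterations, plus 3 long-k-mer iterations for
# paired-end reads, none for single-end); not taken from SKESA's source.

# total_iterations STEPS PAIRED   (PAIRED is "yes" or "no"; hypothetical helper)
total_iterations() {
  local steps=$1 paired=$2 long=0
  if [[ $paired == yes ]]; then
    long=3
  fi
  echo $(( steps + long ))
}

# kmer_bounds READ_LEN   (hypothetical helper)
# Prints the 50%-80% window suggested above as a starting point for --kmer.
# Bash integer division rounds down.
kmer_bounds() {
  local read_len=$1
  echo "$(( read_len * 50 / 100 )) $(( read_len * 80 / 100 ))"
}
```

For example, `total_iterations 1 yes` prints 4 for the bare-bones paired-end command, and `kmer_bounds 150` prints `75 120` for 150 bp reads. The window is only a place to start experimenting, not a rule.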

You may also try --hash_count. It will definitely save some memory, and for high-coverage samples it might save some time.
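A small wrapper makes it easy to toggle --hash_count while keeping the rest of the bare-bones command unchanged. This is a sketch: the function name `fast_skesa_cmd`, its argument order, the optional `hash` argument, and the `DRY_RUN` variable are hypothetical conveniences; only the SKESA flags themselves come from this thread.

```bash
#!/usr/bin/env bash
# fast_skesa_cmd R1 R2 KMER CORES [hash]   (hypothetical wrapper)
# Builds the bare-bones SKESA command from this thread as a bash array.
# With DRY_RUN=1 it only prints the command; otherwise it runs SKESA,
# which writes contigs to stdout (redirect to e.g. rslt.fa).
fast_skesa_cmd() {
  local r1=$1 r2=$2 kmer=$3 cores=$4 use_hash=${5:-}
  local cmd=(skesa --fasta "$r1" --fasta "$r2"
             --steps 1 --kmer "$kmer" --vector_percent 1 --cores "$cores")
  # --hash_count: saves memory, and may save time on high-coverage samples
  if [[ $use_hash == hash ]]; then
    cmd+=(--hash_count)
  fi
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `fast_skesa_cmd SRR5449060_1.fasta SRR5449060_2.fasta 51 10 hash > rslt.fa`. Keeping the command in an array avoids word-splitting problems with file names that contain spaces.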

Please let me know the results of your experiments.

tseemann · 2018-10-13T00:26:42Z

Thanks so much for the response. I will do some experimenting!

tseemann · 2018-10-13T00:47:01Z

My first test, on a Listeria isolate, was very promising, and genotyping worked without problems:

Name          contigs  total_bp  min_bp  max_bp
skesa-31.fa   91       2885169   200     355669
skesa-41.fa   79       2888278   203     387381
skesa-51.fa   60       2895598   208     571292
skesa-61.fa   75       2894908   203     299882
skesa-71.fa   85       2892681   240     237220
skesa-81.fa   109      2890956   289     141351
skesa-91.fa   168      2878715   279     119073

Name            scheme          ST      alleles
skesa-31.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-41.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-51.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-61.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-71.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-81.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
skesa-91.fa     lmonocytogenes  1       abcZ(3) bglA(1) cat(1)  dapE(1) dat(3)  ldh(1)  lhkA(3)
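A sweep like the one above can be scripted. This is a sketch under assumptions not stated in the thread: that each skesa-K.fa was built with --kmer K and otherwise the bare-bones options, and that "fewest contigs" is an acceptable crude fragmentation score. `sweep`, `fewest_contigs`, and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
# sweep R1 R2   (hypothetical helper)
# One fast assembly per k-mer value; assumes skesa-K.fa means --kmer K.
# Set DRY_RUN=1 to print the commands instead of running them.
sweep() {
  local r1=$1 r2=$2 k
  for k in 31 41 51 61 71 81 91; do
    local cmd=(skesa --fasta "$r1" --fasta "$r2" --steps 1 --kmer "$k"
               --vector_percent 1 --cores 10)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "${cmd[*]} > skesa-$k.fa"
    else
      "${cmd[@]}" > "skesa-$k.fa"
    fi
  done
}

# fewest_contigs   (hypothetical helper)
# Reads "name contigs ..." rows (no header line) on stdin and prints the
# name with the smallest contig count; ties keep the first row seen.
fewest_contigs() {
  awk 'NF >= 2 && (!seen || $2 + 0 < best) { best = $2 + 0; name = $1; seen = 1 }
       END { print name }'
}
```

Fed the rows of the contig table above, `fewest_contigs` picks skesa-51.fa (60 contigs). Contig count ignores correctness, so a check such as the MLST calls above is still worthwhile.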

lskatz · 2019-01-03T18:46:43Z

Does changing vector_percent help speed up the assembly?

souvorov · 2019-01-03T19:43:54Z

By default, SKESA clips adapters/vectors from reads before assembling. --vector_percent 1 disables this step, which makes the whole process somewhat faster. It is recommended only if you know that the reads don't contain adapters.
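The advice above amounts to a conditional flag. A minimal sketch, assuming you know from elsewhere (for example, your sequencing pipeline) whether adapters were already removed; `vector_opts` and its yes/no argument are hypothetical, and SKESA itself knows nothing about them.

```bash
#!/usr/bin/env bash
# vector_opts TRIMMED   (hypothetical helper; TRIMMED is "yes" or "no")
# Emits --vector_percent 1 (skip SKESA's adapter/vector clipping) only when
# the reads are known to be adapter-free; otherwise emits nothing, so
# SKESA's default clipping stays on.
vector_opts() {
  if [[ ${1:-no} == yes ]]; then
    printf '%s\n' "--vector_percent 1"
  fi
}
```

Usage: `skesa --fasta r1.fa --fasta r2.fa $(vector_opts yes) > rslt.fa`. The substitution is deliberately unquoted so the flag and its value split into two words, and an empty result adds nothing.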

lskatz · 2019-01-03T20:39:38Z

@tseemann, did you notice any pros/cons of letting SKESA or some other tool trim adapters for you?

tseemann · 2020-02-17T20:51:32Z

@lskatz, our Illumina software is set up to remove all adapters, by putting the Nextera transposase sequence in SampleSheet.csv.

Phylloxera mentioned this issue on Jun 21, 2019, in "WARNING: iterations are disabled" #15 (closed).
