Dear rnaSPAdes developers,

I have a question about the coverage of hybrid transcriptome assemblies. We used rnaSPAdes (v3.14.0; I assume that is the same algorithm described in Prjibelski et al., 2020?) with the option --ss-rf to assemble a hybrid transcriptome from Illumina total RNA-seq short reads (2 x 75 bp, stranded) and PacBio IsoSeq full-length cDNA reads.

When uploading the hybrid assembly to the DDBJ/EMBL/GenBank Transcriptome Shotgun Assembly Sequence Database (TSA), there is an optional "coverage" field. I noticed that coverage values are given in the "NODE*" FASTA headers produced by rnaSPAdes.

How is the coverage value in the FASTA header calculated? Do you think I can take the mean of all coverage values in the headers to report an overall coverage for the hybrid transcriptome assembly? Or should I rather re-map the Illumina and PacBio reads to the hybrid transcriptome assembly to assess the coverage?

I would highly appreciate any thoughts about this.

Keep up the good work,
Martin
Replies: 2 comments
Hello,

Please refer to the SPAdes manual for a description of the SPAdes output: https://cab.spbu.ru/files/release3.15.0/manual.html#spadesoutsec

In short, the coverage reported in the header is the k-mer coverage for the last k-mer size used.

A single coverage value for a transcriptome dataset does not make much sense to me, so you'd better ask the TSA maintainers about this :)
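For anyone who wants to look at the header values before deciding what to report, here is a minimal Python sketch that pulls the length and k-mer coverage out of each header and summarizes them. It assumes headers of the form `NODE_<id>_length_<len>_cov_<cov>` with an optional `_g<n>_i<n>` suffix; the function names `parse_header` and `summarize_coverage` are made up for this illustration and are not part of SPAdes. Keep in mind that these are k-mer coverages, not per-base read coverages, and that a single summary number may not be meaningful for a transcriptome where expression levels differ widely.

```python
import re
from statistics import mean, median

# Assumed header shape (not taken from the SPAdes source code):
#   >NODE_<id>_length_<len>_cov_<kmer_cov>[_g<gene>_i<isoform>]
HEADER_RE = re.compile(r">?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (length, kmer_coverage) for one header, or None if it does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return int(match.group(2)), float(match.group(3))


def summarize_coverage(headers):
    """Summarize the k-mer coverage values found in an iterable of FASTA header lines."""
    parsed = [p for p in (parse_header(h) for h in headers) if p is not None]
    if not parsed:
        raise ValueError("no headers matched the assumed NODE_..._cov_... pattern")
    lengths = [length for length, _ in parsed]
    covs = [cov for _, cov in parsed]
    return {
        "n": len(parsed),
        "mean": mean(covs),
        "median": median(covs),
        # Weighting by length keeps many short transcripts from dominating the mean.
        "length_weighted_mean": sum(l * c for l, c in parsed) / sum(lengths),
    }


if __name__ == "__main__":
    example = [
        ">NODE_1_length_1000_cov_10.0_g0_i0",
        ">NODE_2_length_3000_cov_30.0_g1_i0",
    ]
    print(summarize_coverage(example))
```

To run it on a real assembly, pass only the lines that start with `>` from the transcripts FASTA file. Re-mapping the reads to the assembly, as suggested in the question, is a separate approach and would give per-base read coverage rather than k-mer coverage.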
Hi,