You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a BAM file downloaded from the GDC and the corresponding reference fasta (GRCh38.d1.vd1.fa). I have the following assertion which is triggered: Assertion failed: _impl->header_contig_map.count(chr) != 0
After adding log messages, at least one case where it fails (I am not sure how multithreading might change results here) is with the sequence 'HCV-1'. I get the log line Thread 140029280118528 estimating depth for HCV-1
but if I add a warning in BamReader::estimateDepth I see that what is being worked on is 'HCV'. I believe that the problem resides in the separators used in StringUtil::parsePos at line 159.
I should note that HCV-1 and HCV-2 were added to reference_chromosomes map in estimateDepths and to both_chromosomes in the same function.
I do not know what the correct path for correction is, as I expect you had your reasons to include '-' as a separator. I expect that if I modify the fasta and prepare a new header for my Bam in order to substitute the '-' with '_' this would correct the situation. Or would you plan an update to address this?
If I may also take the opportunity to hint at the warning: BAM header only has a subset of the reference chromosomes -- please make sure they match!. With my input, many contigs are not included in the both_chromosomes map. For example, among others, chr1_KI270706v1_random, chr1_KI270707v1_random, HTLV-1, or HPV100. Clearly, disregarding is on purpose by testing include_alt_contig, but then the warning is misleading. (And I am not sure if later on, missing information in the both_chromosomes, might trigger more errors).
Regards,
The text was updated successfully, but these errors were encountered:
Thanks for pointing this out! You're right this is because '-' is a separator. See the function at StringUtil.hh line 156.
To make your pipeline more robust, I would recommend using underscore in chromosome names instead of dash. I am not if other tools have similar issues...
hi,
I have a BAM file downloaded from the GDC and the corresponding reference fasta (GRCh38.d1.vd1.fa). I have the following assertion which is triggered:
Assertion failed: _impl->header_contig_map.count(chr) != 0
After adding log messages, at least one case where it fails (I am not sure how multithreading might change results here) is with the sequence 'HCV-1'. I get the log line
Thread 140029280118528 estimating depth for HCV-1
but if I add a warning in
BamReader::estimateDepth
I see that what is being worked on is 'HCV'. I believe that the problem resides in the separators used inStringUtil::parsePos
at line 159.I should note that HCV-1 and HCV-2 were added to
reference_chromosomes
map in estimateDepths and toboth_chromosomes
in the same function.I do not know what the correct path for correction is, as I expect you had your reasons to include '-' as a separator. I expect that if I modify the fasta and prepare a new header for my Bam in order to substitute the '-' with '_' this would correct the situation. Or would you plan an update to address this?
If I may also take the opportunity to hint at the warning:
BAM header only has a subset of the reference chromosomes -- please make sure they match!
. With my input, many contigs are not included in theboth_chromosomes
map. For example, among others,chr1_KI270706v1_random
,chr1_KI270707v1_random
,HTLV-1
, orHPV100
. Clearly, disregarding is on purpose by testinginclude_alt_contig
, but then the warning is misleading. (And I am not sure if later on, missing information in the both_chromosomes, might trigger more errors).Regards,
The text was updated successfully, but these errors were encountered: