Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only counting multiple aligned reads for one chromosome with "-l *.bed" and "-q 0" parameters #72

Open
Adglink opened this issue May 16, 2021 · 5 comments

Comments

@Adglink
Copy link

Adglink commented May 16, 2021

I have a small test dataset I've created to measure allelic expression of RNA-seq aligned reads. In one of my test examples, I have two very small chromosomes that are identical and test reads that align to both chromosomes. However, when I use bam-readcount with the parameters "-l *.bed" and "-q 0" to count reads that align to a SNP on both chromosomes, bam-readcount only reports reads for one chromosome.

I've performed the same analysis using "bedtools coverage" and the output reported the same coverage for both chromosomes, as I would expect. However, I love how bam-readcount reports reads for each nucleotide at each position and would prefer to keep using bam-readcount. Please let me know if you have any suggestions or if you would like a copy of my test files.

@Adglink Adglink changed the title Only counting multiple aligned reads for one chromosome with -l and -q 0 parameters Only counting multiple aligned reads for one chromosome with "-l *.bed" and "-q 0" parameters May 16, 2021
@chrisamiller
Copy link
Collaborator

I'm not entirely clear about the details here - could you share a small example bam (even if it's only a handful of reads?) and a command that allows us to reproduce the problem? Thanks!

@Adglink
Copy link
Author

Adglink commented May 17, 2021

Hey Chris, thanks for getting back to me so fast. I really appreciate it.

I went ahead and attached all the files needed to run bam-readcount. The command I use is in the command.txt file.

You'll notice in the .sam.txt file that the paired end reads labeled as "Gene2_WT_1-3" align to both Gene2 and Gene2_1 chromosomes. However, the results found in the bam-readcount_output.txt file only report aligned reads for Gene2_1.

Let me know if you have any other questions or concerns for me.

Bam-readcount_Test.zip

@Adglink
Copy link
Author

Adglink commented May 19, 2021

Hey Chris, I'm sure you have a million other things going on, but I was wondering if I could get an update from your end. I'm I misinterpreting my results somehow or is there an issue in counting multiple aligned reads that needs to be fixed? If there is a bug, how long do you think it would take to fix it?

@chrisamiller
Copy link
Collaborator

Took a quick look at this tonight. Here's a paired end read and it's sam flags:

Gene2_WT_1      99      Gene2_1 191
Gene2_WT_1      147     Gene2_1 191
Gene2_WT_1      355     Gene2   191
Gene2_WT_1      403     Gene2   191

I assume the issue is caused by the fact that the second pair of entries for that read, it is labelled as a non-primary alignment. (see https://broadinstitute.github.io/picard/explain-flags.html)

See this issue, which specifies that they're excluded intentionally

bam-readcount excludes secondary, qc failed and duplicate reads from the resulting counts.

#46 (comment)

There is also an issue dating back several years requesting the ability to use -F syntax to specify flags to include/exclude, but it's not a priority to implement, and if your workflow requires counting at secondary alignments, then bam-readcount might not be the right tool for you right now. Best of luck!

@Adglink
Copy link
Author

Adglink commented May 20, 2021

I didn't realize that bam-readcount intentionally excluded secondary alignments. Thanks for helping me out with this Chris and best of luck to you too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants