Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reason for low coverage output for deepseq data #85

Open
YashaNazirB opened this issue Sep 25, 2021 · 5 comments
Open

Reason for low coverage output for deepseq data #85

YashaNazirB opened this issue Sep 25, 2021 · 5 comments

Comments

@YashaNazirB
Copy link

Hi

I am running the following code for my deepseq data analysis to extract BAM readcounts. Although my expected coverage is somewhat between 100,000 to 350,000 however, after I run the following code, my output shows very low coverage for deepseq genomic positions which is around 9000 max. Why am I getting this low coverage? I even defined the -d to be 1000000000.

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stim.txt

@chrisamiller
Copy link
Collaborator

According to this issue, specifying a region (and reference) is a workaround for this problem: #3 (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea)

What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr)

@YashaNazirB
Copy link
Author

YashaNazirB commented Sep 26, 2021 via email

@chrisamiller
Copy link
Collaborator

Do you have a small example bam file that you can post, along with a command that reproduces the problem?

@YashaNazirB
Copy link
Author

Commands that I tried are as follows:

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stimtest.txt

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 > BRC_UNGKO_stimtest1.txt

Following is the link to the bam file:
https://drive.google.com/file/d/1X4qbqChdAIXDXtcKnbUOHJ44sGmBdePP/view?usp=sharing

I am using mouse mm9 genome as the reference and the coordinates of interest for deepseq analysis are chr12:114663740-114595637.

Best regards,
Yasha

@chrisamiller
Copy link
Collaborator

Okay, I can reproduce this error. Thanks for the bug report - we'll look into it.

$ samtools index aligned_sorted_UNGKO_stim.bam

$ docker run -v /tmp:/tmp -v /Users/cmiller/annotations:/annotations --entrypoint /bin/bash -it mgibio/bam-readcount

$ bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr1:3603112-3603112
Minimum mapping quality is set to 0
chr1	3603112	c	10	=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	C:9:0.00:35.33:0.00:9:0:0.00:0.01:40.67:9:0.99:150.00:0.99	G:1:0.00:32.00:0.00:1:0:0.00:0.02:106.00:1:0.99:150.00:0.99	T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

# bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637  >out.rct
Minimum mapping quality is set to 0
[E::sam_itr_next] Null iterator

@apldx - can you take a look and see what's up? FWIW, the tmp.fa in there is just mm9 with "chr" prefixes added. I dropped bam and fasta to use on the local filesystem here for debugging purposes: /storage1/fs1/timley/Active/aml_ppg/analysis/tmp/bam_readcount/issue_85/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants