Reason for low coverage output for deepseq data #85

YashaNazirB · 2021-09-25T00:50:50Z

Hi

I am running the following code for my deepseq data analysis to extract BAM readcounts. Although my expected coverage is somewhat between 100,000 to 350,000 however, after I run the following code, my output shows very low coverage for deepseq genomic positions which is around 9000 max. Why am I getting this low coverage? I even defined the -d to be 1000000000.

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stim.txt

chrisamiller · 2021-09-25T04:44:45Z

According to this issue, specifying a region (and reference) is a workaround for this problem: #3 (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea)

What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr)

YashaNazirB · 2021-09-26T22:22:23Z

Hey Chris, I tried selecting a defined region but it did not generate any output. Output was 0 bytes. How should I proceed?

…

-Yasha

________________________________ From: Chris Miller ***@***.***> Sent: Saturday, September 25, 2021 12:44 AM To: genome/bam-readcount ***@***.***> Cc: Yasha Nazir Butt ***@***.***>; Author ***@***.***> Subject: Re: [genome/bam-readcount] Reason for low coverage output for deepseq data (#85) [EXTERNAL] According to this issue, specifying a region (and reference) is a workaround for this problem: #3<#3> (Counting every base in the entire reference is going to be somewhat slow anyway, so splitting may sometimes be a good idea) What happens if you run bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr1:1-999999999 (or actually substitute the length of the chr) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub<#85 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AM3AU2XQP2OXQZ5WH5KROI3UDVHUPANCNFSM5EW7UH7A>. Triage notifications on the go with GitHub Mobile for iOS<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

chrisamiller · 2021-09-26T22:24:17Z

Do you have a small example bam file that you can post, along with a command that reproduces the problem?

YashaNazirB · 2021-09-26T23:28:33Z

Commands that I tried are as follows:

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 |awk -F ":|\t|=" 'BEGIN {OFS = "\t"}; {print $1, $2, $3 , $4, $21 , $35, $49 , $63}' > BRC_UNGKO_stimtest.txt

bam-readcount -w0 -f mm9.fa -d1000000000 aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637 > BRC_UNGKO_stimtest1.txt

Following is the link to the bam file:
https://drive.google.com/file/d/1X4qbqChdAIXDXtcKnbUOHJ44sGmBdePP/view?usp=sharing

I am using mouse mm9 genome as the reference and the coordinates of interest for deepseq analysis are chr12:114663740-114595637.

Best regards,
Yasha

chrisamiller · 2021-09-27T02:59:48Z

Okay, I can reproduce this error. Thanks for the bug report - we'll look into it.

$ samtools index aligned_sorted_UNGKO_stim.bam

$ docker run -v /tmp:/tmp -v /Users/cmiller/annotations:/annotations --entrypoint /bin/bash -it mgibio/bam-readcount

$ bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr1:3603112-3603112
Minimum mapping quality is set to 0
chr1	3603112	c	10	=:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	C:9:0.00:35.33:0.00:9:0:0.00:0.01:40.67:9:0.99:150.00:0.99	G:1:0.00:32.00:0.00:1:0:0.00:0.02:106.00:1:0.99:150.00:0.99	T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00	N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

# bam-readcount -w 0 -f /annotations/tmp.fa aligned_sorted_UNGKO_stim.bam chr12:114663740-114595637  >out.rct
Minimum mapping quality is set to 0
[E::sam_itr_next] Null iterator

@apldx - can you take a look and see what's up? FWIW, the tmp.fa in there is just mm9 with "chr" prefixes added. I dropped bam and fasta to use on the local filesystem here for debugging purposes: /storage1/fs1/timley/Active/aml_ppg/analysis/tmp/bam_readcount/issue_85/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reason for low coverage output for deepseq data #85

Reason for low coverage output for deepseq data #85

YashaNazirB commented Sep 25, 2021

chrisamiller commented Sep 25, 2021

YashaNazirB commented Sep 26, 2021 via email

chrisamiller commented Sep 26, 2021

YashaNazirB commented Sep 26, 2021

chrisamiller commented Sep 27, 2021

Reason for low coverage output for deepseq data #85

Reason for low coverage output for deepseq data #85

Comments

YashaNazirB commented Sep 25, 2021

chrisamiller commented Sep 25, 2021

YashaNazirB commented Sep 26, 2021 via email

chrisamiller commented Sep 26, 2021

YashaNazirB commented Sep 26, 2021

chrisamiller commented Sep 27, 2021