-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Input file shifts inconsistent (Arabidopsis) #197
Comments
Can you generate the PWM on the plus and minus strands separately? |
Dear @panushri25, I have generated the logos for both strands separately. Positive strand Code used to generate logo: samtools view -b /potter/CONTROL.mRp.clN.sorted.bam Chr1 > /out/out.bam
samtools view -b -@8 /out/out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '$6=="+"' | bedtools genomecov -bg -5 -i stdin -g /potter/ath.chrom.sizes.txt | bedtools sort -i stdin > /out/pos.bedGraph
bedGraphToBigWig /out/pos.bedGraph /potter/ath.chrom.sizes.txt /out/pos.bw
python build_pwm_from_bigwig.py -i /out/pos.bw -g /potter/ath.fasta -op /out/atac_no_shift_pos -cr Chr1 -c /potter/ath.chrom.sizes.txt Negative strand Code used to generate logo: samtools view -b /potter/CONTROL.mRp.clN.sorted.bam Chr1 > /out/out.bam
samtools view -b -@8 /out/out.bam | bedtools bamtobed -i stdin | awk -v OFS="\t" '$6=="-"' | bedtools genomecov -bg -5 -i stdin -g /potter/ath.chrom.sizes.txt | bedtools sort -i stdin > /out/neg.bedGraph
bedGraphToBigWig /out/neg.bedGraph /potter/ath.chrom.sizes.txt /out/neg.bw
python /scratch/chrombpnet/chrombpnet/helpers/preprocessing/analysis/build_pwm_from_bigwig.py -i /out/neg.bw -g /potter/ath.fasta -op /out/atac_no_shift_neg -cr Chr1 -c /potter/ath.chrom.sizes.txt Does this give you any hint on what might be the issue? |
The strands have some sort of unconventional shifts on them, the ATAC-seq processing pipeline you are using might be introducing these shifts can you verify this and re-run the pipeline so that it does not perform any shifts on the positive and negative reads |
We are using nf-core ATAC-seq to process the raw fastq files. I didn't find any information about shifts in the documentation, so I assumed no shifts were applied. Do you have experience with nf-core ATAC-seq? |
I am trying to train ChromBPNet on Arabidopsis ATAC-seq data, but run into the error
"Input file shifts inconsistent"
as in #153, #174 and #176. Having read through these other issues, I was able to generate the PWM for my BAM file:The raw fastq data was processed using the nf-core ATAC-seq pipeline with Bowtie2 as the chosen aligner. To train ChromBPNet I took the merged BAM file containing the alignments of all replicates (found under
bowtie2/merged_replicate/
among the output of the Nextflow pipeline).My command used for training (I am running on a computing cluster with Apptainer):
apptainer exec --nv -e \ --no-mount /scratch \ --bind data:/data \ --bind bias_model:/bias_model \ --bind potter-kc:/potter \ --bind out:/output \ chrombpnet.sif \ chrombpnet pipeline \ -ibam /potter/CONTROL.mRp.clN.sorted.bam \ -d ATAC \ -g /potter/ath.fasta \ -c /potter/ath.chrom.sizes.txt \ -p /data/peaks_no_blacklist.bed \ -n /data/background_negatives.bed \ -fl /data/train1-3_val4_test5.json \ -b /bias_model/ENCSR868FGK_bias_fold_0.h5 \ -o /output
This worked fine when using the data from the tutorial, but yields the inconsistent shift error on my own data. To my knowledge, the nf-core ATAC-seq pipeline does not apply any shifts, at least I did not find it in the documentation.
The commands I used to generate the above PWM from my BAM file:
Do you have any suggestions on how to proceed? I don't have sufficient background on Tn5 bias to judge if the PWM looks as expected. Are there any references you suggest to learn more about Tn5 bias? I am wondering if the issue could be due to specific bias in Arabidopsis or plants in general, compared to the human genome.
The text was updated successfully, but these errors were encountered: