Skip to content

Mapping strategy by Haopu YANG

HYoung1698 edited this page May 12, 2018 · 7 revisions

UPDATE May 12: Found big prob. The counting after collapse does not match with uncollapsed ones. Very disappointing and not able to present the result this week. Current result can be found here.


Different strategy might be adopt when mapping, considering the inflating affect by collapse, and other (potential) factors such as mapping order etc. Now I shall try to test whether different strategies have significant differences and compare the result with some publications. The script is uploaded here. Notation: 'I' means collapsed prior to mapping, 'II' for uncollapsed; A for mapping in order given in [this page]; B for mapping miRNA first, mid-range RNAs(incl srpRNA, tRNA, Y RNA, piRNA) second and long RNA(incl mRNA, lncRNA and TUCP) last; C means ... And I plan to do a cross test to check which one has better feature.


#Current pwd ~/projects/comparison
#qc first time
for i in `ls 00.rawdata/ | egrep -v 'zip|html'`;
do echo $i;
fastqc -o 00.rawdata/ 00.rawdata/$i ;
done
mkdir 01.preprocess
mkdir 02.mapping
mkdir log
for i in `ls 00.rawdata/ | egrep -v 'zip|html'`; 
do j=${i%.*}; 
echo $j; 
cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 16 --trim-n --too-short-output=01.preprocess/$j.tooShort.fastq -o 01.preprocess/$j.cutadapt.fastq 00.rawdata/$j.fastq >> log/$j.cutadapt.log 2>> log/$j.cutadapt.err ; fastqc -o 01.preprocess/ 01.preprocess/$j.cutadapt.fastq ; 
done
###

for i in `ls 00.rawdata | egrep -v 'zip|html'`;
> do j=${i%.*};
> echo $j;
> mkdir 02.mapping/$j 02.mapping/$j/no_rRNA;
> bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/no_rRNA.fastq -x src/rRNA_exon 01.preprocess/$j.cutadapt.fastq -S 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > log/$j.rm_rRNA.log 2> log/$j.rm_rRNA.err ; 
samtools view -b -f 16 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam;
samtools fastq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq;
cat 02.mapping/$j/no_rRNA/$j.no_rRNA.fastq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq > 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq;
> done


###make midSize mapping

for j in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3; do echo $j; samtools view -b -F 16 02.mappingnew/$j/midSizeRNA.sam > 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.clean.bam; done
for j in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3; do echo $j;  rsem-tbam2gbam src/midSizeRNA 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.clean.bam  02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.rsem.clean.bam ; echo "complete" $i; done
for j in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3; do echo $j;  samtools fastq 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.reverseMap.bam > 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.reverseMap.fastq; cat 02.mappingnew/$j/midSizeRNA/$j.unAligned.fastq 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.reverseMap.fastq > 02.mappingnew/$j/midSizeRNA/$j.midSizeRNA.unmapped.fastq; echo "complete" $j; done
##make largeSize mapping is similar. Just replace mid with large. 

###Collapsing

for j in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3; do echo $j;
for l in 16 18 20 36;
do echo $l;
cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m $l --trim-n --too-short-output=01.preprocess/$j.L${l}.tooShort.fastq -o 01.preprocess//$j.cutadapt.L${l}.fastq 00.rawdata/$j.fastq;
echo "complete" $j;
fastx_collapser -i 01.preprocess/$j.cutadapt.L${l}.fastq -o 01.preprocess/$j.cutadapt.L${l}.collapsed.fastq;
echo "Collapsed" $j; 
done;
done

##map to miRNA after collapsing
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
for l in 16 18 20 36;

do echo $l;

echo "start map to miRNA" $j $l;
bowtie2 -p 4 --sensitive-local --no-unal --un 02.mappingnew/$j/miRNA/$j.cutadapt.L${l}.unAligned.fastq -x src/miRNA 02.mappingnew/$j/no_rRNA/$j.cutadapt.L${l}.fastq -S 02.mappingnew/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.sam; 
samtools view -b -f 16 02.mappingnew/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exonsam > 02.mappingnew/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.reverseMap.bam;
samtools fastq 02.mappingnew/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.reverseMap.bam > 02.mapping/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.reverseMap.fastq;
cat 02.mapping/$j/no_rRNA/$j.cutadapt.L${l}.no_rRNA.fastq 02.mapping/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.reverseMap.fastq > 02.mapping/$j/no_rRNA/$j.cutadapt.L${l}.rRNA_exon.unmapped.fastq;
done;
done

##After collapse map to large/midSizeRNA
for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

for l in 16 18 20 36;

do echo $l;


echo "start map to largeSizeRNA" $j $l;

bowtie2 -p 4 --sensitive-local --un 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.unAligned.fastq -x src/largeSizeRNA 02.mappingnew/$j/miRNA/$j.L${l}.unAligned.fq -S 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.sam;
samtools view -b -f 16 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.sam > 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.reverseMap.bam;
samtools view -b -F 16 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.sam > 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.clean.bam;
rsem-tbam2gbam src/largeSizeRNA 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.clean.bam 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.rsem.clean.bam;
samtools fastq 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.reverseMap.bam > 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.reverseMap.fastq;
cat 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.unAligned.fastq 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.reverseMap.fastq > 02.mappingnew/$j/largeSizeRNA/$j.cutadapt.L${l}.largeSizeRNA.unmapped.fastq;
done;
done

Then do the counting.

I x A

The result has been uploaded.

II x A

I x B

II x B

I x C

II x C

Conclusion