-
Notifications
You must be signed in to change notification settings - Fork 11
Benchmark for 10% sample and mapping by Haopu YANG
In order to save time, we can subtract a part of the .fastq file in order to test whether the whole pipeline work. Now I am going to substract the first
#!/bin/bash
#Current working directory should be ~/projects/exRNA/hcc_examples
mkdir subset
#Get file
for i in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3;
do
echo "Sub ${i}.fastq"
totallines=$(wc -l /home/yanghaopu/projects/exRNA/hcc_example/00.rawdata/${i}.fastq | awk '{print $1}')
sublines=$(expr $totallines - $totallines % 40)
sublines=$(expr $sublines / 10)
echo "Before cut LINES = $totallines , after cut LINES = $sublines"
time head -n $sublines /home/yanghaopu/projects/exRNA/hcc_example/00.rawdata/${i}.fastq > /home/yanghaopu/projects/SmallSubset/${i}.fastq
done
I did this by modifying this instruction and my bash code, which is quite bulky and easy, will be uploaded later. Codes are mostly like this
#cutadapt fisrt.
mkdir ~/projects/01.preprocess
for i in `ls /home/yanghaopu/projects/00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 16 --trim-n --too-short-output=/home/yanghaopu/projects/01.preprocess/${j}.tooShort.fastq -o ~/projects/01.preprocess/${j}.cutadapt.fastq ~/projects/00.rawdata/${j}.fastq >> ~/projects/log/${j}.cutadapt.log 2>>~/projects/log/${j}.cutadapt.err;
done
#remove rRNA
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
mkdir 02.mapping/$j 02.mapping/$j/no_rRNA
bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/no_rRNA/$j.no_rRNA.fq -x src/rRNA_exon 01.preprocess/$j.cutadapt.fastq -S 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > log/$j.rm_rRNA.log 2> log/$j.rm_rRNA.err
samtools view -b -f 16 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.clean.bam
samtools fastq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq
cat 02.mapping/$j/no_rRNA/$j.no_rRNA.fq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq > 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq
done;
#map to human genome
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
mkdir 02.mapping/$j/hg38
bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/hg38/$j.unAligned.fq -x /BioII/lulab_b/shared/genomes/human_hg38/index/bowtie2_hg38_index/GRCh38_p10 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq -S 02.mapping/$j/hg38/$j.hg38.sam > log/$j.map2hg38.log 2> log/$j.map2hg38.err
samtools view -S -b 02.mapping/$j/hg38/$j.hg38.sam > 02.mapping/$j/hg38/$j.hg38.clean.bam
ln -s 02.mapping/$j/hg38/$j.unAligned.fq 02.mapping/$j/hg38/$j.hg38.unmapped.fastq
done;
## sequential map to each RNA type
## rRNA_exon miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
# map to miRNA
mkdir 02.mapping/$j/miRNA
echo "start map to miRNA" >> log/$j.miRNA.log 2>> log/$j.miRNA.err
bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/miRNA/$j.unAligned.fq -x src/miRNA 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq -S 02.mapping/$j/miRNA/$j.miRNA.sam > log/$j.miRNA.log 2> log/$j.miRNA.err
samtools view -b -f 16 02.mapping/$j/miRNA/$j.miRNA.sam > 02.mapping/$j/miRNA/$j.miRNA.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/miRNA/$j.miRNA.sam > 02.mapping/$j/miRNA/$j.miRNA.clean.bam
rsem-tbam2gbam src/miRNA 02.mapping/$j/miRNA/$j.miRNA.clean.bam 02.mapping/$j/miRNA/$j.miRNA.rsem.clean.bam > log/$j.miRNA.tbam2gbam.log
samtools fastq 02.mapping/$j/miRNA/$j.miRNA.reverseMap.bam > 02.mapping/$j/miRNA/$j.miRNA.reverseMap.fastq
cat 02.mapping/$j/miRNA/$j.unAligned.fq 02.mapping/$j/miRNA/$j.miRNA.reverseMap.fastq > 02.mapping/$j/miRNA/$j.miRNA.unmapped.fastq
# map to piRNA
mkdir 02.mapping/$j/piRNA
echo "start map to piRNA" >> log/$j.piRNA.log 2>> log/$j.piRNA.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/piRNA/$j.unAligned.fq -x src/piRNA 02.mapping/$j/miRNA/$j.miRNA.unmapped.fastq -S 02.mapping/$j/piRNA/$j.piRNA.sam > log/$j.piRNA.log 2> log/$j.piRNA.err
samtools view -b -f 16 02.mapping/$j/piRNA/$j.piRNA.sam > 02.mapping/$j/piRNA/$j.piRNA.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/piRNA/$j.piRNA.sam > 02.mapping/$j/piRNA/$j.piRNA.clean.bam
rsem-tbam2gbam src/piRNA 02.mapping/$j/piRNA/$j.piRNA.clean.bam 02.mapping/$j/piRNA/$j.piRNA.rsem.clean.bam > log/$j.piRNA.tbam2gbam.log
samtools fastq 02.mapping/$j/piRNA/$j.piRNA.reverseMap.bam > 02.mapping/$j/piRNA/$j.piRNA.reverseMap.fastq
cat 02.mapping/$j/piRNA/$j.unAligned.fq 02.mapping/$j/piRNA/$j.piRNA.reverseMap.fastq > 02.mapping/$j/piRNA/$j.piRNA.unmapped.fastq
# map to Y_RNA
mkdir 02.mapping/$j/Y_RNA_exon
echo "start map to Y_RNA_exon" >> log/$j.Y_RNA_exon.log 2>> log/$j.Y_RNA_exon.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/Y_RNA_exon/$j.unAligned.fq -x src/Y_RNA_exon 02.mapping/$j/piRNA/$j.piRNA.unmapped.fastq -S 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > log/$j.Y_RNA_exon.log 2> log/$j.Y_RNA_exon.err
samtools view -b -f 16 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.clean.bam
rsem-tbam2gbam src/Y_RNA_exon 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.clean.bam 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.rsem.clean.bam > log/$j.Y_RNA_exon.tbam2gbam.log
samtools fastq 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.bam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.fastq
cat 02.mapping/$j/Y_RNA_exon/$j.unAligned.fq 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.fastq > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.unmapped.fastq
# map to snRNA
mkdir 02.mapping/$j/snRNA
echo "start map to snRNA" >> log/$j.snRNA.log 2>> log/$j.snRNA.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/snRNA/$j.unAligned.fq -x src/snRNA 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.unmapped.fastq -S 02.mapping/$j/snRNA/$j.snRNA.sam > log/$j.snRNA.log 2> log/$j.snRNA.err
samtools view -b -f 16 02.mapping/$j/snRNA/$j.snRNA.sam > 02.mapping/$j/snRNA/$j.snRNA.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/snRNA/$j.snRNA.sam > 02.mapping/$j/snRNA/$j.snRNA.clean.bam
rsem-tbam2gbam src/snRNA 02.mapping/$j/snRNA/$j.snRNA.clean.bam 02.mapping/$j/snRNA/$j.snRNA.rsem.clean.bam > log/$j.snRNA.tbam2gbam.log
samtools fastq 02.mapping/$j/snRNA/$j.snRNA.reverseMap.bam > 02.mapping/$j/snRNA/$j.snRNA.reverseMap.fastq
cat 02.mapping/$j/snRNA/$j.unAligned.fq 02.mapping/$j/snRNA/$j.snRNA.reverseMap.fastq > 02.mapping/$j/snRNA/$j.snRNA.unmapped.fastq
# map to srpRNA
mkdir 02.mapping/$j/srpRNA_exon
echo "start map to srpRNA_exon" >> log/$j.srpRNA_exon.log 2>> log/$j.srpRNA_exon.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/srpRNA_exon/$j.unAligned.fq -x src/srpRNA_exon 02.mapping/$j/snRNA/$j.snRNA.unmapped.fastq -S 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > log/$j.srpRNA_exon.log 2> log/$j.srpRNA_exon.err
samtools view -b -f 16 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.clean.bam
rsem-tbam2gbam src/srpRNA_exon 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.clean.bam 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.rsem.clean.bam > log/$j.srpRNA_exon.tbam2gbam.log
samtools fastq 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.bam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.fastq
cat 02.mapping/$j/srpRNA_exon/$j.unAligned.fq 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.fastq > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.unmapped.fastq
# map to tRNA
mkdir 02.mapping/$j/tRNA
echo "start map to tRNA" >> log/$j.tRNA.log 2>> log/$j.tRNA.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/tRNA/$j.unAligned.fq -x src/tRNA 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.unmapped.fastq -S 02.mapping/$j/tRNA/$j.tRNA.sam > log/$j.tRNA.log 2> log/$j.tRNA.err
samtools view -b -f 16 02.mapping/$j/tRNA/$j.tRNA.sam > 02.mapping/$j/tRNA/$j.tRNA.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/tRNA/$j.tRNA.sam > 02.mapping/$j/tRNA/$j.tRNA.clean.bam
rsem-tbam2gbam src/tRNA 02.mapping/$j/tRNA/$j.tRNA.clean.bam 02.mapping/$j/tRNA/$j.tRNA.rsem.clean.bam > log/$j.tRNA.tbam2gbam.log
samtools fastq 02.mapping/$j/tRNA/$j.tRNA.reverseMap.bam > 02.mapping/$j/tRNA/$j.tRNA.reverseMap.fastq
cat 02.mapping/$j/tRNA/$j.unAligned.fq 02.mapping/$j/tRNA/$j.tRNA.reverseMap.fastq > 02.mapping/$j/tRNA/$j.tRNA.unmapped.fastq
# map to lncRNA
mkdir 02.mapping/$j/lncRNA_exon
echo "start map to lncRNA_exon" > log/$j.lncRNA_exon.log 2> log/$j.lncRNA_exon.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/lncRNA_exon/$j.unAligned.fq -x src/lncRNA_exon 02.mapping/$j/tRNA/$j.tRNA.unmapped.fastq -S 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > log/$j.lncRNA_exon.log 2> log/$j.lncRNA_exon.err
samtools view -b -f 16 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.clean.bam
rsem-tbam2gbam src/lncRNA_exon 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.clean.bam 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.rsem.clean.bam > log/$j.lncRNA_exon.tbam2gbam.log
samtools fastq 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.bam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.fastq
cat 02.mapping/$j/lncRNA_exon/$j.unAligned.fq 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.fastq > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.unmapped.fastq
# map to mRNA
mkdir 02.mapping/$j/mRNA_exon
echo "start map to mRNA_exon" >> log/$j.mRNA_exon.log 2>> log/$j.mRNA_exon.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/mRNA_exon/$j.unAligned.fq -x src/mRNA_exon 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.unmapped.fastq -S 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > log/$j.mRNA_exon.log 2> log/$j.mRNA_exon.err
samtools view -b -f 16 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.bam
samtools view -b -F 16 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.clean.bam
rsem-tbam2gbam src/mRNA_exon 02.mapping/$j/mRNA_exon/$j.mRNA_exon.clean.bam 02.mapping/$j/mRNA_exon/$j.mRNA_exon.rsem.clean.bam > log/$j.mRNA_exon.tbam2gbam.log
samtools fastq 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.bam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.fastq
cat 02.mapping/$j/mRNA_exon/$j.unAligned.fq 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.fastq > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.unmapped.fastq
# map to genome hg38other region
mkdir 02.mapping/$j/hg38other
echo "start map to hg38other" >> log/$j.hg38other.log 2>> log/$j.hg38other.err
bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/hg38other/$j.unAligned.fq -x /BioII/lulab_b/shared/genomes/human_hg38/index/bowtie2_hg38_index/GRCh38_p10 02.mapping/$j/mRNA_exon/$j.mRNA_exon.unmapped.fastq -S 02.mapping/$j/hg38other/$j.hg38other.sam > log/$j.hg38other.log 2> log/$j.hg38other.err
samtools view -S -b 02.mapping/$j/hg38other/$j.hg38other.sam > 02.mapping/$j/hg38other/$j.hg38other.clean.bam
ln -s 02.mapping/$j/hg38other/$j.unAligned.fq 02.mapping/$j/hg38other/$j.hg38other.unmapped.fastq
done;
## summary the results and statistics
mkdir stat
## number of mapped readsfor each RNA types
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
# total library size
libSizeN=`cat log/$j.cutadapt.log | grep 'Total reads processed' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g'`
echo -e "$j\tpreprocess\tlibSizeN\t$libSizeN" >> stat/$j.readsN.stat.tsv
# too short
tooShortN=`cat log/$j.cutadapt.log | grep 'Reads that were too short' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g' | cut -d ' ' -f 1`
echo -e "$j\tpreprocess\ttooShortN\t$tooShortN" >> stat/$j.readsN.stat.tsv
# clean reads
cleanN=`cat log/$j.cutadapt.log | grep 'Reads written' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g' | cut -d ' ' -f 1`
echo -e "$j\tpreprocess\tcleanN\t$cleanN" >> stat/$j.readsN.stat.tsv
# rRNA mapped reads
rRNA_N=`samtools flagstat 02.mapping/$j/no_rRNA/$j.rRNA_exon.clean.bam | awk 'NR==5{print $1}'`
echo -e "$j\tpreprocess\trRNA_N\t$rRNA_N" >> stat/$j.readsN.stat.tsv
# kept reads
keptN=`grep '@' 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq | wc -l`
echo -e "$j\tpreprocess\tkeptN\t$keptN" >> stat/$j.readsN.stat.tsv
# map to hg38
hg38_N=`samtools flagstat 02.mapping/$j/hg38/$j.hg38.clean.bam | awk 'NR==5{print $1}'`
echo -e "$j\tmap2hg38\thg38\t$hg38_N" >> stat/$j.readsN.stat.tsv
# map to different RNA types
for k in miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon;
do echo $k;
RNAtypes_N=`samtools flagstat 02.mapping/$j/$k/$j.$k.rsem.clean.bam | awk 'NR==5{print $1}'`
echo -e "$j\tsequentialMap\t$k\t$RNAtypes_N" >> stat/$j.readsN.stat.tsv
done;
# map to hg38 other region
hg38other_N=`samtools flagstat 02.mapping/$j/hg38other/$j.hg38other.clean.bam | awk 'NR==5{print $1}'`
echo -e "$j\tmap2hg38other\thg38other\t$hg38other_N" >> stat/$j.readsN.stat.tsv
# non-human
nonHuman_N=`grep '@' 02.mapping/$j/hg38other/$j.hg38other.unmapped.fastq | wc -l`
echo -e "$j\tmap2hg38other\tnonHuman_N\t$nonHuman_N" >> stat/$j.readsN.stat.tsv
done;
## length of mapped reads for each RNA types
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
for k in miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon;
do echo $k;
samtools view 02.mapping/$j/$k/$j.$k.rsem.clean.bam | awk 'BEGIN{FS=OFS="\t"}{print length($10)}' | sort -n | uniq -c | awk 'BEGIN{FS=" "; OFS="\t"}{print $2,$1}' | sort -nk1,1 | sed -e "s/^/$j\t$k\t/g" >> stat/$j.lengthN.stat.tsv
done;
sed -i -e "1i sample\ttype\tlen\tnum" stat/$j.lengthN.stat.tsv
done;
If you want to save time , you can skip the collapse step, which is quite time-consuming. It works fine, and the expression matrix is built on un-collapsed data.
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j
for l in 16 18 20 36;
do echo $l;
cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m $l --trim-n --too-short-output=01.preprocess/$j.L${l}.tooShort.fastq -o 01.preprocess/$j.cutadapt.L${l}.fastq 00.rawdata/$j.fastq >> log/$j.cutadapt.L${l}.log 2>> log/$j.cutadapt.L${l}.err;
fastx_collapser -i 01.preprocess/$j.cutadapt.L${l}.fastq -o 01.preprocess/$j.cutadapt.L${l}.collapsed.fastq
done;
done;
# stat number
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j
for l in 16 18 20 36;
do echo $l;
readsN=`grep '@' 01.preprocess/$j.cutadapt.L${l}.fastq | wc -l`;
collapsedReadsN=`grep '>' 01.preprocess/$j.cutadapt.L${l}.collapsed.fastq | wc -l`;
echo -e "$j\t$l\t$readsN\t$collapsedReadsN" >> stat/readsL.collapse.num.sta.txt
done;
done;
# map to miRNA transcriptome
# cheak the mapping ratio of exactly and >1 mapped reads
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
for l in 16 18 20 36;
do echo $l;
# map to miRNA
mkdir 02.mapping/$j/miRNA
echo "start map to miRNA" >> log/$j.miRNA.log 2>> log/$j.L${l}.miRNA.err
bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/miRNA/$j.L${l}.unAligned.fq -x src/miRNA 01.preprocess/$j.cutadapt.L${l}.fastq -S 02.mapping/$j/miRNA/$j.L${l}.miRNA.sam >> log/$j.L${l}.miRNA.log 2>> log/$j.L${l}.miRNA.err
done;
done;
for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
for l in 16 18 20 36;
do echo $l;
exactlyN=`cat log/$j.L${l}.miRNA.err | grep 'exactly' | awk 'BEGIN{FS=OFS=" "}{print $1}'`
multiN=`cat log/$j.L${l}.miRNA.err | grep '>1' | awk 'BEGIN{FS=OFS=" "}{print $1}'`
RNAtypes_N=`echo $(($exactlyN + $multiN))`
echo -e "$j\tsequentialMap\t$l\t$exactlyN\t$multiN\t$RNAtypes_N" >> stat/readsN.readsL.sta.tsv
done;
done;
The result for mapping is listed here.
Thus we can check the distribution of each type of RNAs by taking weighted average. A pie chart is shown below; a boxplot can also show the distribution but not as clear.
avg <- c(0.381592 , 0.041112 , 0.336402 ,0.001645 ,0.000812 ,0.006523 ,0.041891 ,0.017065 )
Category <- c('miRNA', 'piRNA', 'y_RNA', 'snRNA', 'srpRNA', 'lncRNA', 'mRNA', 'Others')
library(ggplot2)
df <- data.frame(avg, Category)
bp <- ggplot(df, aes(x='', y=avg, fill=Category)) +geom_bar(width = 1, stat = 'identity')
pie <- bp + coord_polar("y", start = 0)
pie
Also I did time counting for such mapping.
Take out first 1/10 | mapping to hg38(ms) | miRNA(ms) | piRNA(ms) | YRNA(ms) | snRNA(ms) | srpRNA(ms) | tRNA(ms) | lncRNA(ms) | mRNA(ms) | |
---|---|---|---|---|---|---|---|---|---|---|
BeforeSurgery_1 | 34810 | 38989 | 7712 | 10785 | 8963 | 4444 | 4707 | 4734 | 13949 | 15700 |
BeforeSurgery_2 | 34217 | 56311 | 9817 | 9540 | 8774 | 4979 | 4610 | 4868 | 14926 | 19275 |
BeforeSurgery_3 | 36298 | 38847 | 11839 | 10325 | 9763 | 4510 | 4285 | 4733 | 13714 | 17562 |
AfterSurgery_1 | 42810 | 7863 | 11648 | 12527 | 4733 | 5301 | 5012 | 15272 | 19218 | |
AfterSurgery_2 | 1106(??) | 45145 | 8808 | 9208 | 9702 | 4423 | 4573 | 4558 | 15256 | 18711 |
AfterSurgery_3 | 19927 | 41328 | 7362 | 11191 | 13066 | 4779 | 5605 | 5330 | 14657 | 19787 |
NC_1 | 35511 | 42368 | 8887 | 11798 | 16301 | 4976 | 5157 | 4738 | 13896 | 20799 |
NC_2 | 35033 | 43577 | 7615 | 9876 | 17203 | 4952 | 5031 | 5506 | 14128 | 20036 |
NC_3 | 34953 | 46508 | 7920 | 9908 | 15518 | 5093 | 5152 | 6004 | 13713 | 20917 |
Avg | 40653.667 | 8647 | 10475.444 | 12424.111 | 4765.444 | 4935.667 | 5053.667 | 14390.111 | 19111.667 |
Mapping for such file with total lines of