Skip to content

Benchmark for 10% sample and mapping by Haopu YANG

HYoung1698 edited this page May 10, 2018 · 8 revisions

In order to save time, we can subtract a part of the .fastq file in order to test whether the whole pipeline work. Now I am going to substract the first $1/10$ from 9 given examples and go through the whole preprocessing, cutadapting, mapping and summary.

Subtract a piece from each example

#!/bin/bash
#Current working directory should be ~/projects/exRNA/hcc_examples
mkdir subset
#Get file
for i in AfterSurgery_1 AfterSurgery_2 AfterSurgery_3 BeforeSurgery_1 BeforeSurgery_2 BeforeSurgery_3 NC_1 NC_2 NC_3;
do
 echo "Sub ${i}.fastq"
 totallines=$(wc -l /home/yanghaopu/projects/exRNA/hcc_example/00.rawdata/${i}.fastq | awk '{print $1}')
 sublines=$(expr $totallines  - $totallines % 40)
 sublines=$(expr $sublines / 10)
echo "Before cut LINES = $totallines , after cut LINES = $sublines"
time head -n $sublines /home/yanghaopu/projects/exRNA/hcc_example/00.rawdata/${i}.fastq > /home/yanghaopu/projects/SmallSubset/${i}.fastq
done

Preprocessing and Mapping

I did this by modifying this instruction and my bash code, which is quite bulky and easy, will be uploaded later. Codes are mostly like this

#cutadapt fisrt. 
mkdir ~/projects/01.preprocess
for i in `ls /home/yanghaopu/projects/00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};
echo $j;
cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 16 --trim-n --too-short-output=/home/yanghaopu/projects/01.preprocess/${j}.tooShort.fastq -o ~/projects/01.preprocess/${j}.cutadapt.fastq ~/projects/00.rawdata/${j}.fastq >> ~/projects/log/${j}.cutadapt.log 2>>~/projects/log/${j}.cutadapt.err;
done

#remove rRNA
for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

mkdir 02.mapping/$j 02.mapping/$j/no_rRNA

bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/no_rRNA/$j.no_rRNA.fq -x src/rRNA_exon 01.preprocess/$j.cutadapt.fastq -S 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > log/$j.rm_rRNA.log 2> log/$j.rm_rRNA.err

samtools view -b -f 16 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/no_rRNA/$j.rRNA_exon.sam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.clean.bam

samtools fastq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.bam > 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq

cat 02.mapping/$j/no_rRNA/$j.no_rRNA.fq 02.mapping/$j/no_rRNA/$j.rRNA_exon.reverseMap.fastq > 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq

done;

#map to human genome

for i in `ls 00.rawdata | egrep -v 'zip|html'`;
do j=${i%.*};

echo $j;

mkdir 02.mapping/$j/hg38

bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/hg38/$j.unAligned.fq -x /BioII/lulab_b/shared/genomes/human_hg38/index/bowtie2_hg38_index/GRCh38_p10 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq -S 02.mapping/$j/hg38/$j.hg38.sam > log/$j.map2hg38.log 2> log/$j.map2hg38.err

samtools view -S -b  02.mapping/$j/hg38/$j.hg38.sam > 02.mapping/$j/hg38/$j.hg38.clean.bam 

ln -s 02.mapping/$j/hg38/$j.unAligned.fq 02.mapping/$j/hg38/$j.hg38.unmapped.fastq

done;



## sequential map to each RNA type 

## rRNA_exon miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

# map to miRNA

mkdir 02.mapping/$j/miRNA

echo "start map to miRNA" >> log/$j.miRNA.log 2>> log/$j.miRNA.err

bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/miRNA/$j.unAligned.fq -x src/miRNA 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq -S 02.mapping/$j/miRNA/$j.miRNA.sam > log/$j.miRNA.log 2> log/$j.miRNA.err

samtools view -b -f 16 02.mapping/$j/miRNA/$j.miRNA.sam > 02.mapping/$j/miRNA/$j.miRNA.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/miRNA/$j.miRNA.sam > 02.mapping/$j/miRNA/$j.miRNA.clean.bam

rsem-tbam2gbam src/miRNA 02.mapping/$j/miRNA/$j.miRNA.clean.bam 02.mapping/$j/miRNA/$j.miRNA.rsem.clean.bam > log/$j.miRNA.tbam2gbam.log

samtools fastq 02.mapping/$j/miRNA/$j.miRNA.reverseMap.bam > 02.mapping/$j/miRNA/$j.miRNA.reverseMap.fastq

cat 02.mapping/$j/miRNA/$j.unAligned.fq 02.mapping/$j/miRNA/$j.miRNA.reverseMap.fastq > 02.mapping/$j/miRNA/$j.miRNA.unmapped.fastq



# map to piRNA

mkdir 02.mapping/$j/piRNA

echo "start map to piRNA" >> log/$j.piRNA.log 2>> log/$j.piRNA.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/piRNA/$j.unAligned.fq -x src/piRNA 02.mapping/$j/miRNA/$j.miRNA.unmapped.fastq -S 02.mapping/$j/piRNA/$j.piRNA.sam > log/$j.piRNA.log 2> log/$j.piRNA.err

samtools view -b -f 16 02.mapping/$j/piRNA/$j.piRNA.sam > 02.mapping/$j/piRNA/$j.piRNA.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/piRNA/$j.piRNA.sam > 02.mapping/$j/piRNA/$j.piRNA.clean.bam

rsem-tbam2gbam src/piRNA 02.mapping/$j/piRNA/$j.piRNA.clean.bam 02.mapping/$j/piRNA/$j.piRNA.rsem.clean.bam > log/$j.piRNA.tbam2gbam.log

samtools fastq 02.mapping/$j/piRNA/$j.piRNA.reverseMap.bam > 02.mapping/$j/piRNA/$j.piRNA.reverseMap.fastq

cat 02.mapping/$j/piRNA/$j.unAligned.fq 02.mapping/$j/piRNA/$j.piRNA.reverseMap.fastq > 02.mapping/$j/piRNA/$j.piRNA.unmapped.fastq



# map to Y_RNA

mkdir 02.mapping/$j/Y_RNA_exon

echo "start map to Y_RNA_exon" >> log/$j.Y_RNA_exon.log 2>> log/$j.Y_RNA_exon.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/Y_RNA_exon/$j.unAligned.fq -x src/Y_RNA_exon 02.mapping/$j/piRNA/$j.piRNA.unmapped.fastq -S 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > log/$j.Y_RNA_exon.log 2> log/$j.Y_RNA_exon.err

samtools view -b -f 16 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.sam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.clean.bam

rsem-tbam2gbam src/Y_RNA_exon 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.clean.bam 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.rsem.clean.bam > log/$j.Y_RNA_exon.tbam2gbam.log

samtools fastq 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.bam > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.fastq

cat 02.mapping/$j/Y_RNA_exon/$j.unAligned.fq 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.reverseMap.fastq > 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.unmapped.fastq



# map to snRNA

mkdir 02.mapping/$j/snRNA

echo "start map to snRNA" >> log/$j.snRNA.log 2>> log/$j.snRNA.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/snRNA/$j.unAligned.fq -x src/snRNA 02.mapping/$j/Y_RNA_exon/$j.Y_RNA_exon.unmapped.fastq -S 02.mapping/$j/snRNA/$j.snRNA.sam > log/$j.snRNA.log 2> log/$j.snRNA.err

samtools view -b -f 16 02.mapping/$j/snRNA/$j.snRNA.sam > 02.mapping/$j/snRNA/$j.snRNA.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/snRNA/$j.snRNA.sam > 02.mapping/$j/snRNA/$j.snRNA.clean.bam

rsem-tbam2gbam src/snRNA 02.mapping/$j/snRNA/$j.snRNA.clean.bam 02.mapping/$j/snRNA/$j.snRNA.rsem.clean.bam > log/$j.snRNA.tbam2gbam.log

samtools fastq 02.mapping/$j/snRNA/$j.snRNA.reverseMap.bam > 02.mapping/$j/snRNA/$j.snRNA.reverseMap.fastq

cat 02.mapping/$j/snRNA/$j.unAligned.fq 02.mapping/$j/snRNA/$j.snRNA.reverseMap.fastq > 02.mapping/$j/snRNA/$j.snRNA.unmapped.fastq



# map to srpRNA

mkdir 02.mapping/$j/srpRNA_exon

echo "start map to srpRNA_exon" >> log/$j.srpRNA_exon.log 2>> log/$j.srpRNA_exon.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/srpRNA_exon/$j.unAligned.fq -x src/srpRNA_exon 02.mapping/$j/snRNA/$j.snRNA.unmapped.fastq -S 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > log/$j.srpRNA_exon.log 2> log/$j.srpRNA_exon.err

samtools view -b -f 16 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.sam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.clean.bam

rsem-tbam2gbam src/srpRNA_exon 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.clean.bam 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.rsem.clean.bam > log/$j.srpRNA_exon.tbam2gbam.log

samtools fastq 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.bam > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.fastq

cat 02.mapping/$j/srpRNA_exon/$j.unAligned.fq 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.reverseMap.fastq > 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.unmapped.fastq



# map to tRNA

mkdir 02.mapping/$j/tRNA

echo "start map to tRNA" >> log/$j.tRNA.log 2>> log/$j.tRNA.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/tRNA/$j.unAligned.fq -x src/tRNA 02.mapping/$j/srpRNA_exon/$j.srpRNA_exon.unmapped.fastq -S 02.mapping/$j/tRNA/$j.tRNA.sam > log/$j.tRNA.log 2> log/$j.tRNA.err

samtools view -b -f 16 02.mapping/$j/tRNA/$j.tRNA.sam > 02.mapping/$j/tRNA/$j.tRNA.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/tRNA/$j.tRNA.sam > 02.mapping/$j/tRNA/$j.tRNA.clean.bam

rsem-tbam2gbam src/tRNA 02.mapping/$j/tRNA/$j.tRNA.clean.bam 02.mapping/$j/tRNA/$j.tRNA.rsem.clean.bam > log/$j.tRNA.tbam2gbam.log

samtools fastq 02.mapping/$j/tRNA/$j.tRNA.reverseMap.bam > 02.mapping/$j/tRNA/$j.tRNA.reverseMap.fastq

cat 02.mapping/$j/tRNA/$j.unAligned.fq 02.mapping/$j/tRNA/$j.tRNA.reverseMap.fastq > 02.mapping/$j/tRNA/$j.tRNA.unmapped.fastq



# map to lncRNA

mkdir 02.mapping/$j/lncRNA_exon

echo "start map to lncRNA_exon" > log/$j.lncRNA_exon.log 2> log/$j.lncRNA_exon.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/lncRNA_exon/$j.unAligned.fq -x src/lncRNA_exon 02.mapping/$j/tRNA/$j.tRNA.unmapped.fastq -S 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > log/$j.lncRNA_exon.log 2> log/$j.lncRNA_exon.err

samtools view -b -f 16 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.sam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.clean.bam

rsem-tbam2gbam src/lncRNA_exon 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.clean.bam 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.rsem.clean.bam > log/$j.lncRNA_exon.tbam2gbam.log

samtools fastq 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.bam > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.fastq

cat 02.mapping/$j/lncRNA_exon/$j.unAligned.fq 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.reverseMap.fastq > 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.unmapped.fastq



# map to mRNA

mkdir 02.mapping/$j/mRNA_exon

echo "start map to mRNA_exon" >> log/$j.mRNA_exon.log 2>> log/$j.mRNA_exon.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/mRNA_exon/$j.unAligned.fq -x src/mRNA_exon 02.mapping/$j/lncRNA_exon/$j.lncRNA_exon.unmapped.fastq -S 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > log/$j.mRNA_exon.log 2> log/$j.mRNA_exon.err

samtools view -b -f 16 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.bam

samtools view -b -F 16 02.mapping/$j/mRNA_exon/$j.mRNA_exon.sam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.clean.bam

rsem-tbam2gbam src/mRNA_exon 02.mapping/$j/mRNA_exon/$j.mRNA_exon.clean.bam 02.mapping/$j/mRNA_exon/$j.mRNA_exon.rsem.clean.bam > log/$j.mRNA_exon.tbam2gbam.log

samtools fastq 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.bam > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.fastq

cat 02.mapping/$j/mRNA_exon/$j.unAligned.fq 02.mapping/$j/mRNA_exon/$j.mRNA_exon.reverseMap.fastq > 02.mapping/$j/mRNA_exon/$j.mRNA_exon.unmapped.fastq



# map to genome hg38other region

mkdir 02.mapping/$j/hg38other

echo "start map to hg38other" >> log/$j.hg38other.log 2>> log/$j.hg38other.err

bowtie2 -p 4 --sensitive-local --un 02.mapping/$j/hg38other/$j.unAligned.fq -x /BioII/lulab_b/shared/genomes/human_hg38/index/bowtie2_hg38_index/GRCh38_p10 02.mapping/$j/mRNA_exon/$j.mRNA_exon.unmapped.fastq -S 02.mapping/$j/hg38other/$j.hg38other.sam > log/$j.hg38other.log 2> log/$j.hg38other.err

samtools view -S -b 02.mapping/$j/hg38other/$j.hg38other.sam > 02.mapping/$j/hg38other/$j.hg38other.clean.bam

ln -s 02.mapping/$j/hg38other/$j.unAligned.fq 02.mapping/$j/hg38other/$j.hg38other.unmapped.fastq

done;

## summary the results and statistics

mkdir stat

## number of mapped readsfor each RNA types

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

# total library size

libSizeN=`cat log/$j.cutadapt.log | grep 'Total reads processed' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g'`

echo -e "$j\tpreprocess\tlibSizeN\t$libSizeN" >> stat/$j.readsN.stat.tsv

# too short

tooShortN=`cat log/$j.cutadapt.log | grep 'Reads that were too short' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g' | cut -d ' ' -f 1`

echo -e "$j\tpreprocess\ttooShortN\t$tooShortN" >> stat/$j.readsN.stat.tsv

# clean reads

cleanN=`cat log/$j.cutadapt.log | grep 'Reads written' | awk 'BEGIN{FS=OFS=":"}{print $2}' | sed 's/^ *//g' | sed 's/,//g' | cut -d ' ' -f 1`

echo -e "$j\tpreprocess\tcleanN\t$cleanN" >> stat/$j.readsN.stat.tsv

# rRNA mapped reads

rRNA_N=`samtools flagstat 02.mapping/$j/no_rRNA/$j.rRNA_exon.clean.bam | awk 'NR==5{print $1}'`

echo -e "$j\tpreprocess\trRNA_N\t$rRNA_N" >> stat/$j.readsN.stat.tsv

# kept reads

keptN=`grep '@' 02.mapping/$j/no_rRNA/$j.rRNA_exon.unmapped.fastq | wc -l`

echo -e "$j\tpreprocess\tkeptN\t$keptN" >> stat/$j.readsN.stat.tsv

# map to hg38

hg38_N=`samtools flagstat 02.mapping/$j/hg38/$j.hg38.clean.bam | awk 'NR==5{print $1}'`

echo -e "$j\tmap2hg38\thg38\t$hg38_N" >> stat/$j.readsN.stat.tsv

# map to different RNA types

for k in miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon;

do echo $k;

RNAtypes_N=`samtools flagstat 02.mapping/$j/$k/$j.$k.rsem.clean.bam | awk 'NR==5{print $1}'`

echo -e "$j\tsequentialMap\t$k\t$RNAtypes_N" >> stat/$j.readsN.stat.tsv

done;

# map to hg38 other region

hg38other_N=`samtools flagstat 02.mapping/$j/hg38other/$j.hg38other.clean.bam | awk 'NR==5{print $1}'`

echo -e "$j\tmap2hg38other\thg38other\t$hg38other_N" >> stat/$j.readsN.stat.tsv

# non-human

nonHuman_N=`grep '@' 02.mapping/$j/hg38other/$j.hg38other.unmapped.fastq | wc -l`

echo -e "$j\tmap2hg38other\tnonHuman_N\t$nonHuman_N" >> stat/$j.readsN.stat.tsv

done;





## length of mapped reads for each RNA types

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

for k in miRNA piRNA Y_RNA_exon snRNA srpRNA_exon tRNA lncRNA_exon mRNA_exon;

do echo $k;

samtools view 02.mapping/$j/$k/$j.$k.rsem.clean.bam | awk 'BEGIN{FS=OFS="\t"}{print length($10)}' | sort -n | uniq -c | awk 'BEGIN{FS=" "; OFS="\t"}{print $2,$1}' | sort -nk1,1 | sed -e "s/^/$j\t$k\t/g" >> stat/$j.lengthN.stat.tsv

done;

sed -i -e "1i sample\ttype\tlen\tnum" stat/$j.lengthN.stat.tsv

done;

Collapse (Optional)

If you want to save time , you can skip the collapse step, which is quite time-consuming. It works fine, and the expression matrix is built on un-collapsed data.

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j

for l in 16 18 20 36;

do echo $l;

cutadapt -u -100 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m $l --trim-n --too-short-output=01.preprocess/$j.L${l}.tooShort.fastq -o 01.preprocess/$j.cutadapt.L${l}.fastq 00.rawdata/$j.fastq >> log/$j.cutadapt.L${l}.log 2>> log/$j.cutadapt.L${l}.err;

fastx_collapser -i 01.preprocess/$j.cutadapt.L${l}.fastq -o 01.preprocess/$j.cutadapt.L${l}.collapsed.fastq

done;

done;



# stat number

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j

for l in 16 18 20 36;

do echo $l;

readsN=`grep '@' 01.preprocess/$j.cutadapt.L${l}.fastq | wc -l`;

collapsedReadsN=`grep '>' 01.preprocess/$j.cutadapt.L${l}.collapsed.fastq | wc -l`;

echo -e "$j\t$l\t$readsN\t$collapsedReadsN" >> stat/readsL.collapse.num.sta.txt

done;

done;



# map to miRNA transcriptome

# cheak the mapping ratio of exactly and >1 mapped reads

for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

for l in 16 18 20 36;

do echo $l;

# map to miRNA

mkdir 02.mapping/$j/miRNA

echo "start map to miRNA" >> log/$j.miRNA.log 2>> log/$j.L${l}.miRNA.err

bowtie2 -p 4 --sensitive-local --no-unal --un 02.mapping/$j/miRNA/$j.L${l}.unAligned.fq -x src/miRNA 01.preprocess/$j.cutadapt.L${l}.fastq -S 02.mapping/$j/miRNA/$j.L${l}.miRNA.sam >> log/$j.L${l}.miRNA.log 2>> log/$j.L${l}.miRNA.err

done;

done;



for i in `ls 00.rawdata | egrep -v 'zip|html'`;

do j=${i%.*};

echo $j;

for l in 16 18 20 36;

do echo $l;

exactlyN=`cat log/$j.L${l}.miRNA.err | grep 'exactly' | awk 'BEGIN{FS=OFS=" "}{print $1}'`

multiN=`cat log/$j.L${l}.miRNA.err | grep '>1' | awk 'BEGIN{FS=OFS=" "}{print $1}'`

RNAtypes_N=`echo $(($exactlyN + $multiN))`

echo -e "$j\tsequentialMap\t$l\t$exactlyN\t$multiN\t$RNAtypes_N" >> stat/readsN.readsL.sta.tsv

done;

done;

Build Expression Matrices

Result

The result for mapping is listed here.

Thus we can check the distribution of each type of RNAs by taking weighted average. A pie chart is shown below; a boxplot can also show the distribution but not as clear.

avg <- c(0.381592 , 0.041112 , 0.336402 ,0.001645 ,0.000812 ,0.006523 ,0.041891 ,0.017065 )
Category <- c('miRNA', 'piRNA', 'y_RNA', 'snRNA', 'srpRNA', 'lncRNA', 'mRNA', 'Others')
library(ggplot2)
df <- data.frame(avg, Category)
bp <- ggplot(df, aes(x='', y=avg, fill=Category)) +geom_bar(width = 1, stat = 'identity')
pie <- bp + coord_polar("y", start = 0)
pie

Time cost

Also I did time counting for such mapping.

Take out first 1/10 mapping to hg38(ms) miRNA(ms) piRNA(ms) YRNA(ms) snRNA(ms) srpRNA(ms) tRNA(ms) lncRNA(ms) mRNA(ms)
BeforeSurgery_1 34810 38989 7712 10785 8963 4444 4707 4734 13949 15700
BeforeSurgery_2 34217 56311 9817 9540 8774 4979 4610 4868 14926 19275
BeforeSurgery_3 36298 38847 11839 10325 9763 4510 4285 4733 13714 17562
AfterSurgery_1 42810 7863 11648 12527 4733 5301 5012 15272 19218
AfterSurgery_2 1106(??) 45145 8808 9208 9702 4423 4573 4558 15256 18711
AfterSurgery_3 19927 41328 7362 11191 13066 4779 5605 5330 14657 19787
NC_1 35511 42368 8887 11798 16301 4976 5157 4738 13896 20799
NC_2 35033 43577 7615 9876 17203 4952 5031 5506 14128 20036
NC_3 34953 46508 7920 9908 15518 5093 5152 6004 13713 20917
Avg 40653.667 8647 10475.444 12424.111 4765.444 4935.667 5053.667 14390.111 19111.667

Mapping for such file with total lines of $10^6$ would take $10^2$s for each, which is pretty quick. But the wc was pretty slow, which takes $10^2$s for line counting each time and $10^2-10^3$s for subtracting.