This repository contains the miRNA Analysis Pipeline, developed to analyze publicly available NGS datasets (miRNA-seq) for various cancer types. The workflow is implemented using automated scripts and R packages for preprocessing, differential expression analysis, meta-analysis, and functional analysis.
- Preprocessing and Alignment
  - The script `mirna.sh` performs the following tasks:
    - Quality check of raw reads using FastQC.
    - Trimming of adapters and low-quality bases using fastp or similar tools.
    - Alignment of reads to the genome and read counting using miRDeep2.
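The preprocessing steps can be sketched as a dry run that only builds the command lines, one per tool. This is not the content of `mirna.sh`. The function name, file names, output layout, and the 18-nt minimum read length are assumptions made for illustration, and the flags should be checked against the installed versions of FastQC, fastp, and miRDeep2 (`mapper.pl`, `quantifier.pl`).

```python
def build_commands(fastq, genome_index, mature_fa, precursor_fa,
                   species="hsa", out_dir="results"):
    """Return one argument list per step for a single sample (nothing is executed)."""
    # Sample name is taken from the file name up to the first dot (assumption).
    sample = fastq.split("/")[-1].split(".")[0]
    trimmed = f"{out_dir}/{sample}.trimmed.fastq"
    collapsed = f"{out_dir}/{sample}.collapsed.fa"
    mappings = f"{out_dir}/{sample}.arf"
    return [
        # 1. Quality check of the raw reads.
        ["fastqc", fastq, "-o", out_dir],
        # 2. Adapter and quality trimming; -l drops reads shorter than 18 nt.
        ["fastp", "-i", fastq, "-o", trimmed, "-l", "18"],
        # 3. miRDeep2 mapper: FASTQ input (-e), convert to FASTA (-h),
        #    RNA to DNA alphabet (-i), drop non-canonical letters (-j),
        #    collapse identical reads (-m), map to a Bowtie index (-p).
        ["mapper.pl", trimmed, "-e", "-h", "-i", "-j", "-l", "18", "-m",
         "-p", genome_index, "-s", collapsed, "-t", mappings],
        # 4. miRDeep2 quantifier: read counts per known miRNA.
        ["quantifier.pl", "-p", precursor_fa, "-m", mature_fa,
         "-r", collapsed, "-t", species],
    ]


commands = build_commands("raw/S1.fastq", "index/genome", "mature.fa", "hairpin.fa")
for command in commands:
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` once the paths and flags match the local setup.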
- Differential Expression Analysis
  - Differential expression analysis is conducted on case vs. control datasets from six bioprojects.
  - The results include:
    - Log Fold Change (LFC) values.
    - Standard error estimates for each miRNA.
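As a rough illustration of how the two result columns are used downstream, the sketch below reads a results table and keeps the LFC and its standard error per miRNA. It assumes the DESeq2 results were exported to CSV with the standard `log2FoldChange` and `lfcSE` columns. The `miRNA` identifier column, the function name, and the rows are made up for the example and are not taken from the repository.

```python
import csv
import io


def read_effects(handle, id_column="miRNA"):
    """Map each miRNA to (LFC, SE), skipping rows where either value is NA."""
    effects = {}
    for row in csv.DictReader(handle):
        try:
            lfc = float(row["log2FoldChange"])
            se = float(row["lfcSE"])
        except ValueError:
            continue  # "NA" cannot be parsed as a number, so the row is dropped
        effects[row[id_column]] = (lfc, se)
    return effects


# Made-up rows with placeholder miRNA names, for illustration only.
example = io.StringIO(
    "miRNA,baseMean,log2FoldChange,lfcSE\n"
    "miR-A,1500.2,1.8,0.3\n"
    "miR-B,820.4,-1.2,0.4\n"
    "miR-C,0.5,NA,NA\n"
)
effects = read_effects(example)
for name, (lfc, se) in effects.items():
    # LFC divided by its standard error is the Wald statistic.
    print(name, lfc, se, round(lfc / se, 2))
```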
- Meta-Analysis
  - Meta-analysis of miRNAs is conducted using the metafor package in R.
  - Filtering criteria:
    - miRNAs must be detected in multiple studies.
    - Log Fold Change (LFC) and standard error ratio (SE) are used to refine results.
  - Outputs include forest plots for selected miRNAs.
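The idea behind the meta-analysis step can be sketched in a few lines: keep miRNAs reported by several studies, then pool their per-study LFCs with inverse-variance weights. The repository does this in R with metafor. The Python below only shows the arithmetic and uses the closed-form DerSimonian-Laird estimator of between-study variance, which is not necessarily the estimator the pipeline uses. The function names, the `min_studies=2` threshold, and the records are assumptions for illustration.

```python
import math
from collections import defaultdict


def group_by_mirna(records, min_studies=2):
    """Keep miRNAs reported by at least `min_studies` distinct studies."""
    by_mirna = defaultdict(dict)
    for study, mirna, lfc, se in records:
        by_mirna[mirna][study] = (lfc, se)
    return {m: list(v.values()) for m, v in by_mirna.items()
            if len(v) >= min_studies}


def pool_random_effects(effects):
    """Return (pooled LFC, its SE, tau^2) using the DerSimonian-Laird estimator."""
    lfcs = [lfc for lfc, _ in effects]
    variances = [se ** 2 for _, se in effects]
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, lfcs)) / sw
    # Cochran's Q measures how much the studies disagree.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, lfcs))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, lfcs)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Made-up records: (study, miRNA, LFC, SE), placeholder names only.
records = [
    ("study1", "miR-A", 0.0, 1.0),
    ("study2", "miR-A", 2.0, 1.0),
    ("study1", "miR-B", 1.5, 0.4),
]
kept = group_by_mirna(records)
for mirna, effects in kept.items():
    pooled, se, tau2 = pool_random_effects(effects)
    # The pooled estimate and its 95% interval are what a forest plot summarizes.
    print(mirna, pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2)
```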
- Functional Analysis
  - Target genes of significant miRNAs are identified using tools such as miRDB.
  - Gene ontology (GO) term enrichment is performed using clusterProfiler.
  - GO terms are visualized with ggplot2.
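GO over-representation analysis of this kind rests on a hypergeometric test: given a set of target genes, how surprising is its overlap with the genes annotated to one GO term? The sketch below computes that p-value for a single term with the Python standard library. It illustrates the statistic only and is not clusterProfiler's implementation; the function and parameter names and the counts in the usage line are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(n_universe, n_term, n_targets, n_overlap):
    """P(overlap >= n_overlap) when n_targets genes are drawn from n_universe,
    of which n_term are annotated to the GO term (hypergeometric upper tail)."""
    total = comb(n_universe, n_targets)
    upper = min(n_term, n_targets)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 in the term, 3 targets, 2 shared.
print(enrichment_p_value(10, 4, 3, 2))
```

In practice one p-value is computed per GO term and the set is then adjusted for multiple testing before plotting.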
- Shell Scripting: Automated preprocessing and alignment steps.
- R Packages:
  - `DESeq2` for differential expression analysis.
  - `metafor` for meta-analysis.
  - `clusterProfiler` for functional enrichment analysis.
  - `ggplot2` for data visualization.
- Preprocessing: Quality check, trimming, and alignment.
- Differential Expression Analysis: Case vs. control comparisons for individual datasets.
- Meta-Analysis: Integration of results across multiple studies.
- Functional Analysis: GO term enrichment and visualization.