RNA-seq Simulation Project

This project aims to simulate RNA-seq data with specific differential expression patterns using the polyester R package. The simulation process involves extracting gene sets from a JSON file, mapping these genes to their indices in a reference genome FASTA file, and then using these mappings to simulate RNA-seq data with defined fold changes.

Project Structure

.
├── data
│   ├── c2.cp.kegg_medicus.v2023.2.Hs.json
│   └── genome
│       └── gencode.v38.transcripts.fa
├── doc
│   └── environment.yml
├── indexed_gset.ipynb
├── License
├── output
│   └── gene_sets_with_indices.tsv
├── README.md
├── scripts
│   ├── rnaseq_simulation.R
│   ├── submit_simulation_EXAMPLE.sh
│   └── submit_simulation.sh
└── simulate_RNAseq.Rmd

Requirements

Python 3.x
Conda
R with the following packages:
- dplyr
- polyester
- Biostrings

Steps and Usage

Set Up Conda Environment

Create the Conda environment from the environment.yml file in the doc directory. From the repository root:

conda env create -f doc/environment.yml

Then, in an R session, install polyester from Bioconductor (requires the BiocManager package):

BiocManager::install("polyester")

Step 1: Extract Gene Sets with Indices (Python)

Use the Jupyter notebook indexed_gset.ipynb to:

- Define file paths and target gene sets.
- Extract gene names from the JSON file.
- Map gene names to their first occurrence index in the FASTA file.
- Output a TSV file with the gene set, target gene, and index.
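
The notebook steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes the JSON file follows the MSigDB layout (each gene set name maps to an object with a "geneSymbols" list) and that FASTA headers follow the GENCODE convention, with the gene symbol in the sixth pipe-delimited field. The function names, the TSV column names, the 1-based indexing, and the choice to skip genes missing from the FASTA are all hypothetical. Small in-memory inputs stand in for the real files so the sketch runs on its own.

```python
import csv
import io
import json


def first_occurrence_index(fasta_lines):
    """Map each gene symbol to the 1-based record number of its first transcript."""
    index = {}
    record_number = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        record_number += 1
        fields = line[1:].strip().split("|")
        if len(fields) > 5:
            # Assumption: GENCODE headers carry the gene symbol in field 6.
            index.setdefault(fields[5], record_number)
    return index


def gene_set_rows(gene_sets, target_sets, gene_index):
    """Yield (gene_set, gene, index) for every target gene found in the FASTA."""
    for set_name in target_sets:
        # Assumption: MSigDB-style JSON with a "geneSymbols" list per gene set.
        for gene in gene_sets[set_name]["geneSymbols"]:
            if gene in gene_index:  # genes absent from the FASTA are skipped
                yield set_name, gene, gene_index[gene]


# Toy inputs standing in for the FASTA and JSON files under data/.
fasta_text = """\
>ENST0001.1|ENSG0001.1|-|-|GENEA-201|GENEA|100|protein_coding|
ACGT
>ENST0002.1|ENSG0001.1|-|-|GENEA-202|GENEA|90|protein_coding|
ACGG
>ENST0003.1|ENSG0002.1|-|-|GENEB-201|GENEB|80|protein_coding|
TTGA
"""
gene_sets = json.loads('{"SET_X": {"geneSymbols": ["GENEB", "GENEA", "GENEC"]}}')

gene_index = first_occurrence_index(io.StringIO(fasta_text))
rows = list(gene_set_rows(gene_sets, ["SET_X"], gene_index))

# Write the rows as tab-separated text (to a string here, to a file in practice).
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["gene_set", "gene", "index"])
writer.writerows(rows)
print(out.getvalue())
```

With the real data, the two in-memory inputs would be replaced by open file handles for the JSON and FASTA files under data, and the output would be written to output/gene_sets_with_indices.tsv. Whether the stored index should be 0-based or 1-based depends on how the R step consumes it, so check that against the notebook.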

Step 2: Simulate RNA-seq Data (R)

Use the R Markdown notebook simulate_RNAseq.Rmd or the R script rnaseq_simulation.R in the scripts directory to:

- Read the TSV file with gene sets and indices.
- Define fold change values for each gene set.
- Create a table with fold change values for each gene.
- Load the FASTA sequences.
- Create a fold changes matrix.
- Simulate the RNA-seq experiment with specified replicates and fold changes.
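
The project builds the fold changes matrix in R, but its layout can be shown with a short sketch. The example below uses Python purely for illustration and is not the project's code. It assumes a two-group design in which polyester expects one row per transcript in the FASTA file and one column per group, that the indices from Step 1 are 1-based, and that every transcript not listed keeps a fold change of 1. The function name and the example values are hypothetical.

```python
def fold_change_matrix(n_transcripts, indexed_genes, set_fold_changes):
    """Build an n_transcripts x 2 matrix of fold changes.

    Column 0 is the baseline group (always 1.0); column 1 holds the
    fold change of the gene set each indexed transcript belongs to.
    """
    matrix = [[1.0, 1.0] for _ in range(n_transcripts)]
    for set_name, index in indexed_genes:
        # Assumption: index is 1-based, as R would expect.
        matrix[index - 1][1] = set_fold_changes[set_name]
    return matrix


# Hypothetical example: 4 transcripts, two gene sets with one gene each.
indexed_genes = [("SET_X", 3), ("SET_Y", 1)]
set_fold_changes = {"SET_X": 4.0, "SET_Y": 0.5}

matrix = fold_change_matrix(4, indexed_genes, set_fold_changes)
for row in matrix:
    print(row)
```

In the R workflow the equivalent matrix would be passed to polyester's simulation function together with the FASTA file and the number of replicates per group. A fold change above 1 in the second column raises expression in the second group, and a value below 1 lowers it.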

[OPTIONAL]: Run on Server

Use the submit_simulation.sh script in the scripts directory to submit the simulation as a batch job on the server. From the scripts directory, run:

sbatch submit_simulation.sh

fadichoucha/simulateRNA
