Ostreococcus tauri is a unicellular eukaryote with a small genome, so its genome and transcriptome indexes can be built quickly. This README outlines the steps to build both indexes with HISAT2 and to align a small RNA-seq sample against them.
- Files: Ostreococcus tauri genome file (.fna) and gene transfer format annotation file (.gtf)
- Software: HISAT2 (hisat2, hisat2-build and the hisat2_extract_*.py scripts) and the SRA Toolkit (fastq-dump)
- Run the following command to download the genomic fasta file.
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/214/015/GCF_000214015.3_version_140606/GCF_000214015.3_version_140606_genomic.fna.gz
- Run the following command to unzip the genomic fasta file.
gunzip GCF_000214015.3_version_140606_genomic.fna.gz
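
Before building an index, it can help to confirm that the FASTA file decompressed correctly. The sketch below counts the sequences in the file by counting header lines. `count_fasta_records` is a hypothetical helper name, not part of HISAT2.

```shell
# Hypothetical helper (not part of HISAT2): count the sequences in a FASTA
# file by counting header lines, which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}
```

For example, `count_fasta_records GCF_000214015.3_version_140606_genomic.fna` prints the number of sequences in the downloaded genome.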
- Run the following command to create a genome index folder.
mkdir GenomeIndex
- Run the following command to build the genome index (4 threads used). Take a look in the GenomeIndex folder after executing this command.
hisat2-build GCF_000214015.3_version_140606_genomic.fna GenomeIndex/genome -p 4
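
As a quick check of the result, the sketch below tests whether the index is complete. It assumes a small index, for which hisat2-build writes eight files named `<prefix>.1.ht2` to `<prefix>.8.ht2` (large indexes use the `.ht2l` extension instead). `check_ht2_index` is a hypothetical helper name.

```shell
# Hypothetical helper: verify that all eight HISAT2 index files exist for
# the given index prefix (assumes a small index with the .ht2 extension).
check_ht2_index() {
    ht2_prefix="$1"
    ht2_missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -f "${ht2_prefix}.${i}.ht2" ]; then
            echo "missing: ${ht2_prefix}.${i}.ht2" >&2
            ht2_missing=$((ht2_missing + 1))
        fi
    done
    # Succeed only if no file is missing.
    [ "$ht2_missing" -eq 0 ]
}
```

For example, `check_ht2_index GenomeIndex/genome && echo "genome index complete"`. The same helper works for any other index prefix.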
- Run the following command to download the GTF annotation file.
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/214/015/GCF_000214015.3_version_140606/GCF_000214015.3_version_140606_genomic.gtf.gz
- Run the following command to unzip the GTF annotation file.
gunzip GCF_000214015.3_version_140606_genomic.gtf.gz
- Run the following command to create a transcriptome index folder.
mkdir TranscriptomeIndex
- Run the following command to extract the splice sites of Ostreococcus tauri from the annotation (the .gtf file holds the splice site information). Take a look in the TranscriptomeIndex folder after executing this command.
hisat2_extract_splice_sites.py GCF_000214015.3_version_140606_genomic.gtf > TranscriptomeIndex/SpliceSites.ss
- Run the following command to extract the exons of Ostreococcus tauri from the annotation (the .gtf file holds the exon information). Take a look in the TranscriptomeIndex folder after executing this command.
hisat2_extract_exons.py GCF_000214015.3_version_140606_genomic.gtf > TranscriptomeIndex/Exons.exon
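
Because both extraction commands redirect their output into a file, the files exist even when a script wrote nothing, for example if the annotation has no usable exon records. The sketch below reports the line count of each file and fails if one is missing or empty. `report_nonempty` is a hypothetical helper name.

```shell
# Hypothetical helper: print the line count of each given file and return
# a non-zero status if any file is missing or empty.
report_nonempty() {
    rn_status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            printf '%s\t%s lines\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
        else
            echo "missing or empty: $f" >&2
            rn_status=1
        fi
    done
    return "$rn_status"
}
```

For example, `report_nonempty TranscriptomeIndex/SpliceSites.ss TranscriptomeIndex/Exons.exon`.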
- Run the following command to build the transcriptome index (4 threads used). Take a look in the TranscriptomeIndex folder after executing this command.
hisat2-build --ss TranscriptomeIndex/SpliceSites.ss --exon TranscriptomeIndex/Exons.exon GCF_000214015.3_version_140606_genomic.fna TranscriptomeIndex/transcriptome -p 4
- Run the following command to download the first 1,000 spots of an SRA run (SRR7121135) that is linked to Ostreococcus tauri.
fastq-dump --gzip --split-3 -X 1000 SRR7121135
- Run the following command to create a results folder.
mkdir Results
- Align the RNA-seq reads to the genome index:
hisat2 -x GenomeIndex/genome -U SRR7121135_1.fastq.gz -S Results/SRR7121135_1_genome.sam
- Align the RNA-seq reads to the transcriptome index:
hisat2 -x TranscriptomeIndex/transcriptome -U SRR7121135_1.fastq.gz -S Results/SRR7121135_1_transcriptome.sam
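
HISAT2 prints an alignment summary to the terminal, but the two SAM files can also be compared directly. The sketch below counts reads and mapped reads using only awk, reading the FLAG value from the second SAM column. `sam_mapping_summary` is a hypothetical helper name, and the counting rule (skip secondary and supplementary records) is an assumption about how the reads should be tallied.

```shell
# Hypothetical helper: summarize a SAM file without samtools.
# Header lines start with "@". FLAG is column 2:
#   bit 4 = read unmapped, bit 256 = secondary, bit 2048 = supplementary.
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        {
            flag = $2
            # count each read once: ignore secondary/supplementary records
            if (int(flag / 256) % 2 == 1 || int(flag / 2048) % 2 == 1) next
            total++
            if (int(flag / 4) % 2 == 0) mapped++
        }
        END { printf "reads=%d mapped=%d\n", total, mapped }
    ' "$1"
}
```

For example, run `sam_mapping_summary Results/SRR7121135_1_genome.sam` and `sam_mapping_summary Results/SRR7121135_1_transcriptome.sam` and compare the two lines.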