Predicting ORFs from contigs generated by metagenome assembly

Start by launching a VM, either as a droplet on DigitalOcean or locally via boot2docker, and ssh into it.
Download the Docker image bwawrik/bioinformatics:latest

docker pull bwawrik/bioinformatics:latest

Make a data directory

mkdir /data

Start the Docker container and mount /data

docker run -t -i -v /data:/data bwawrik/bioinformatics:latest

Change your directory to /data

cd /data

Download the sample data and unzip it. This file contains some of the contigs that were generated from a metagenome dataset. The complete assembly FASTA file is much larger; only a subsample of contigs is included here to illustrate the procedure.

wget https://github.com/bwawrik/MBIO5810/raw/master/sequence_data/pipeline_mg_contigs.fas.gz
gunzip *.gz
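Before predicting ORFs, it can help to confirm that the download unpacked correctly. The following is a minimal sketch, not part of the original pipeline; the function name count_fasta_records is made up for illustration. It counts FASTA header lines, which equals the number of contigs in the file.

```shell
# count_fasta_records FILE: print the number of FASTA records in FILE.
# Every record starts with a header line beginning with ">".
# (Hypothetical helper name; not part of the docker image.)
count_fasta_records() {
  grep -c '^>' "$1"
}
```

For example, `count_fasta_records pipeline_mg_contigs.fas` prints the number of contigs in the sample file. The same helper can be reused later on the predicted ORF files.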

Now make an output directory

mkdir /data/output

Prodigal

Predict ORFs as nucleotide (fna) and amino acid (faa) sequences

prodigal -d output/temp.orfs.fna -a output/temp.orfs.faa -i pipeline_mg_contigs.fas -m -o output/temp.txt -p meta -q
cut -f1 -d " " output/temp.orfs.fna > output/prodigal.orfs.fna
cut -f1 -d " " output/temp.orfs.faa > output/prodigal.orfs.faa
rm -f output/temp*

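To produce only one of the two outputs, pass just the '-d output/temp.orfs.fna' or the '-a output/temp.orfs.faa' flag. The cut commands keep only the text before the first space on each line, which shortens the FASTA headers to the ORF name. The last command removes the temporary files.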
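To see what the cut step does, here is a small illustration on a made-up record. The header below only imitates the general style of a Prodigal header (an ORF name followed by space-separated annotation); its values are invented, and the helper name example_record is hypothetical.

```shell
# Print one made-up FASTA record in the style of Prodigal's output.
# (The header values are illustrative, not real results.)
example_record() {
  printf '%s\n' \
    '>contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10;start_type=Edge;gc_cont=0.500' \
    'ATGAAACGCATTAGCACCACC'
}

# Keep only the text before the first space on every line.
# Sequence lines contain no spaces, so cut passes them through unchanged.
example_record | cut -f1 -d " "
```

The header is reduced to `>contig_1_1`, while the sequence line is left as it was. Short headers like this are easier to handle in downstream tools that treat the whole header line as the sequence name.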

FragGeneScan

First, copy the model files to the local directory. (This is a workaround; I'm not sure why it doesn't work without copying these files, sorry!)

mkdir Ftrain
cp /opt/local/software/FragGeneScan1.19/train/* Ftrain

Now let's predict the ORFs

FragGene_Scan -s pipeline_mg_contigs.fas -o output/VigP03RayK31.FragGeneScan -w 1 -t complete

Clean up

rm -rf Ftrain

Run the N50.pl script on both results (see the assembly tutorial).

Which one produces longer ORFs? Which produces more ORFs? Which is better? Why? What would be a better way to assess the quality of ORF calling?

wget https://github.com/bwawrik/MBIO5810/raw/master/perl_scripts/N50.pl
perl N50.pl output/VigP03RayK31.FragGeneScan.ffn
perl N50.pl output/prodigal.orfs.fna
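If N50.pl is not at hand, the same kind of summary can be approximated with standard tools. The sketch below is not the N50.pl script and may differ from it in details. It assumes the common definition of N50: the length L such that sequences of length L or longer contain at least half of all bases. The function name fasta_stats is made up for illustration.

```shell
# fasta_stats FILE: print sequence count, total length, longest sequence and N50.
# (Hypothetical helper; a sketch, not the N50.pl implementation.)
fasta_stats() {
  # Step 1: print one length per record, summing over wrapped sequence lines.
  awk '/^>/ { if (len > 0) print len; len = 0; next }
       { len += length($0) }
       END { if (len > 0) print len }' "$1" |
  # Step 2: sort the lengths from longest to shortest.
  sort -rn |
  # Step 3: walk down the sorted lengths until half of the total is covered.
  awk '{ lens[NR] = $1; total += $1 }
       END {
         if (NR == 0) { print "no sequences found"; exit 1 }
         running = 0
         for (i = 1; i <= NR; i++) {
           running += lens[i]
           if (running * 2 >= total) { n50 = lens[i]; break }
         }
         printf "sequences=%d total_bp=%d longest=%d N50=%d\n", NR, total, lens[1], n50
       }'
}
```

Running `fasta_stats output/prodigal.orfs.fna` and `fasta_stats output/VigP03RayK31.FragGeneScan.ffn` gives the ORF count and length summary for each tool side by side, which covers the first two questions above. Use the nucleotide files for both tools so the lengths are comparable.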

Retrieving your output

If you are using boot2docker or a local machine, there is no need for this step.
Otherwise, log out of your VM or droplet.
Then use secure copy (scp) to copy your files to your local drive. In this example, I used a droplet with the IP 45.55.160.193 and copied the files to the desktop of my MacBook. Make sure you replace this with the IP of your droplet.

scp root@45.55.160.193:/data/output/* ~/Desktop/

If you are using a PC, use an FTP program to retrieve your files.
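As an optional extra step, the output directory can be packed into a single archive on the droplet before logging out, so that only one file has to be transferred. This is a minimal sketch rather than part of the original workflow; the function name bundle_output and the archive name used below are assumptions.

```shell
# bundle_output DIR ARCHIVE: pack DIR into a gzip-compressed tar archive.
# (Hypothetical helper name; not part of the docker image.)
bundle_output() {
  dir=$1
  archive=$2
  # -C switches to the parent directory first, so the archive stores
  # paths relative to it (for example output/prodigal.orfs.fna).
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

For example, `bundle_output /data/output /data/output.tar.gz` creates one archive, which can then be retrieved with scp in the same way as the individual files and unpacked locally with `tar -xzf output.tar.gz`.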
