- Start by launching a VM, either as a droplet on DigitalOcean or locally via boot2docker, and ssh into it.
- Pull the bwawrik/bioinformatics:latest Docker image
docker pull bwawrik/bioinformatics:latest
- Make a data directory
mkdir /data
- Start the Docker container and mount /data
docker run -t -i -v /data:/data bwawrik/bioinformatics:latest
- Change your directory to /data
cd /data
- Download the sample data and unzip it. This file contains some of the contigs that were generated from a metagenome dataset. The complete assembly FASTA file is much larger; only a subsample of contigs is included here to illustrate the procedure.
wget https://github.com/bwawrik/MBIO5810/raw/master/sequence_data/pipeline_mg_contigs.fas.gz
gunzip *.gz
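Before predicting ORFs, it can help to confirm that the download unpacked into a usable FASTA file. The sketch below is one possible check, not part of the original workflow: `count_fasta_records` is a hypothetical helper name, and the demo file is made-up data rather than the tutorial contigs.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Demo on a tiny made-up FASTA file (not the tutorial data).
demo=$(mktemp)
printf '>contig_1\nATGCATGC\n>contig_2\nGGCC\nTTAA\n' > "$demo"
count_fasta_records "$demo"    # prints 2
rm -f "$demo"
```

Inside /data, the same helper could be pointed at the unzipped contig file; a count of zero would suggest the download or the gunzip step did not work.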
- Now make an output directory
mkdir /data/output
- Predict ORFs as nucleotide (fna) and amino acid (faa) sequences with Prodigal
prodigal -d output/temp.orfs.fna -a output/temp.orfs.faa -i pipeline_mg_contigs.fas -m -o output/temp.txt -p meta -q
cut -f1 -d " " output/temp.orfs.fna > output/prodigal.orfs.fna
cut -f1 -d " " output/temp.orfs.faa > output/prodigal.orfs.faa
rm -f output/temp*
To produce only one of the two outputs, pass just the '-d output/temp.orfs.fna' (nucleotide) or the '-a output/temp.orfs.faa' (amino acid) flag. The last command removes the temporary files.
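The two `cut` commands above shorten the FASTA headers: Prodigal appends coordinates, strand and attributes after the ORF name, separated by spaces, and `cut -f1 -d " "` keeps only the first field. The sketch below shows the effect on a single made-up record written in that header style; the contig name and attribute values are invented for illustration.

```shell
#!/bin/sh
# A made-up record in the style of a Prodigal header
# (name # start # end # strand # attributes).
sample=$(mktemp)
printf '>contig_7_1 # 3 # 452 # 1 # ID=7_1;partial=10;start_type=Edge\nATGAAACGT\n' > "$sample"

# Keep only the first space-delimited field of every line.
# Sequence lines contain no spaces, so cut passes them through unchanged.
trimmed=$(cut -f1 -d " " "$sample")
printf '%s\n' "$trimmed"
rm -f "$sample"
```

This prints `>contig_7_1` followed by the unchanged sequence line. Short headers tend to be easier to handle in downstream tools that treat the whole header line as the sequence ID.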
- For FragGeneScan, first copy the model files to the local directory. (This is a workaround; I'm not sure why it doesn't work without copying these files, sorry!)
mkdir Ftrain
cp /opt/local/software/FragGeneScan1.19/train/* Ftrain
- Now let's predict the ORFs
FragGene_Scan -s pipeline_mg_contigs.fas -o output/VigP03RayK31.FragGeneScan -w 1 -t complete
- Clean up
rm -rf Ftrain
- Run the N50.pl script on both results (see assembly tutorial).
Which one produces longer ORFs? Which produces more ORFs? Which is better, and why? What would be a better way to assess the quality of ORF calling?
wget https://github.com/bwawrik/MBIO5810/raw/master/perl_scripts/N50.pl
perl N50.pl output/VigP03RayK31.FragGeneScan.ffn
perl N50.pl output/prodigal.orfs.fna
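To compare the two ORF callers on counts and lengths directly, a few summary numbers per file can be computed with awk. This is a minimal sketch that complements N50.pl rather than reproducing its output: `fasta_stats` is a hypothetical helper name, and the demo records are made up.

```shell
#!/bin/sh
# Hypothetical helper: report record count, total length, longest record
# and mean record length for a FASTA file (multi-line sequences are summed).
fasta_stats() {
    awk '
        /^>/ {
            # Close the previous record before starting a new one.
            if (n) { total += len; if (len > max) max = len }
            n++; len = 0; next
        }
        { len += length($0) }
        END {
            # Close the final record.
            if (n) { total += len; if (len > max) max = len }
            printf "records=%d total_bp=%d longest_bp=%d mean_bp=%.1f\n",
                   n, total, max, (n ? total / n : 0)
        }
    ' "$1"
}

# Demo on two made-up ORFs of 9 and 12 bases.
demo=$(mktemp)
printf '>orf_1\nATGAAATAA\n>orf_2\nATGCCC\nGGGTAA\n' > "$demo"
fasta_stats "$demo"    # prints: records=2 total_bp=21 longest_bp=12 mean_bp=10.5
rm -f "$demo"
```

Running the helper on each of the two nucleotide ORF files in output/ gives the number of ORFs and their mean length side by side, which speaks to the "more ORFs" and "longer ORFs" questions. It says nothing about whether the calls are correct.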
- If you are using boot2docker or a local machine, you can skip this file-retrieval step.
- Log out of your VM or droplet.
- Then use secure copy (scp) to retrieve your files to your local drive. In this example, I used a droplet with the IP 45.55.160.193 and retrieved the files to the desktop of my MacBook. Make sure you replace this with the IP of your droplet.
scp "root@45.55.160.193:/data/output/*" ~/Desktop/
- If you are using a PC, use an SFTP-capable file transfer program to retrieve your files.