gmiclotte edited this page May 14, 2016 · 20 revisions

## OMSim: simulating optical map read data

OMSim is a simulation tool for optical map reads of the Irys platform (BioNano Genomics).

Important

OMSim is currently in development. Future versions are not guaranteed to be backwards compatible with configuration scripts of the current version.

This wiki refers to the latest version on the master branch. Any other versions may not be compatible with this wiki.

The command line examples in this wiki are based on bash. If you use another shell, other syntax (e.g. for accessing subdirectories) might be required.

Input

OMSim takes as input a genome file in FASTA format and an XML file specifying the nicking enzymes. Both of these, and all other (optional) parameters, can be specified in an XML file. Example XML files are provided (example.xml and minimal.xml).

The output is a BNX file (per chip) containing the reads. Additionally, a BED file is produced that specifies the origin of the simulated reads.
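As a rough illustration of how the BED output could be inspected, the sketch below reads the first three columns of a BED file (sequence name, start, end), which is the standard BED layout. It assumes that OMSim's BED file is tab-separated and follows that layout; any further columns are ignored. The function name `read_bed_intervals` and the example data are hypothetical and not part of OMSim.

```python
import io

def read_bed_intervals(handle):
    """Yield (name, start, end) tuples from a BED-formatted file handle."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2])

# Hypothetical two-record example, not real OMSim output.
example = io.StringIO(u"chr1\t100\t20100\nchr1\t5000\t250000\n")
lengths = [end - start for _, start, end in read_bed_intervals(example)]
print(lengths)
```

To use this on a real run, replace the in-memory example with an opened file such as `open("ecoli_output.bed")`.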

Dependencies

At the moment, OMSim requires Python 2.7 and NumPy. It has been tested on Linux.

Test run

A circular test data set has been provided in the test folder. Go to test/ecoli and run:

python ../../src/omsim.py ecoli.xml  

This will produce the following files:
ecoli_output.label_0.1.bnx containing reads with recognition site GCTCTTC
ecoli_output.label_1.1.bnx containing reads with recognition site CACGAG
ecoli_output.bed containing the start and end positions on the reference of all generated reads
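To make the notion of a recognition site concrete, here is a minimal sketch that locates a site such as GCTCTTC on both strands of a sequence by also searching for its reverse complement. This is an illustration only, not OMSim's actual indexing code; the function names and the toy sequence are hypothetical, and the exact position OMSim reports for each site may differ.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))

def find_sites(sequence, pattern):
    """Return sorted 0-based start positions of pattern on either strand."""
    sequence = sequence.upper()
    pattern = pattern.upper()
    # A set avoids double counting when the pattern is palindromic.
    patterns = {pattern, reverse_complement(pattern)}
    positions = set()
    for p in patterns:
        start = sequence.find(p)
        while start != -1:
            positions.add(start)
            start = sequence.find(p, start + 1)
    return sorted(positions)

# Toy sequence with one forward site and one reverse-strand site.
toy = "AAGCTCTTCAAGAAGAGCTT"
sites = find_sites(toy, "GCTCTTC")
print(sites)
```

On the toy sequence, the forward site starts at index 2 and the reverse-complement site (GAAGAGC) starts at index 11.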

Additionally, the following terminal output should be generated:
../../src/omsim.py example.xml
Version: 0.1
BNX version: 1.2
Circular genome.
Minimal molecule length: 20000 bp
Average molecule length: 200000.0 bp
Minimal coverage: 1x
Chimera rate: 1.0%
Random seed: 0

Indexing sequence: gi|49175990|ref|NC_000913.2|
Found 1490 knicks in 4639675bp.
Generating reads on 1 chip, estimated coverage: 10776x.
Finished processing E. coli.

Quick start

To jump right in with your own data, edit minimal.xml and replace "genome.fasta" with the location of your input FASTA file. Running OMSim (with the predefined BspQI enzyme) then simply becomes:

python src/omsim.py minimal.xml

The output will be written to files named omsim_output.label_0.xxx.bnx.
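For a quick sanity check of the generated reads, the following sketch collects molecule lengths from BNX-formatted text. It assumes the usual BNX layout, in which header lines start with `#` and each molecule begins with a tab-separated line whose first field is `0`, followed by the molecule ID and its length. The function name `molecule_lengths` and the sample records are hypothetical; check them against your own output before relying on the numbers.

```python
import io

def molecule_lengths(handle):
    """Return the length field of every molecule record in a BNX handle."""
    lengths = []
    for line in handle:
        # Assumed BNX layout: header lines start with '#', and each
        # molecule starts with a line whose first field is '0'.
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "0" and len(fields) > 2:
            lengths.append(float(fields[2]))
    return lengths

# Hypothetical miniature BNX content, not real OMSim output.
sample = io.StringIO(
    u"# BNX File Version:\t1.2\n"
    u"0\t1\t25000.0\t10.0\t12.0\t2\n"
    u"1\t5000.0\t18000.0\t25000.0\n"
    u"0\t2\t40000.0\t10.0\t12.0\t1\n"
    u"1\t12000.0\t40000.0\n"
)
lengths = molecule_lengths(sample)
print(len(lengths), sum(lengths))
```

Dividing the summed lengths by the genome size gives a rough coverage estimate. To process real output, pass an opened BNX file instead of the in-memory sample.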

Fine-tuning

Many parameters can be fine-tuned. True positive and false negative rates are enzyme dependent and have to be specified in the enzymes file (e.g. enzymes.xml). All other settings can be specified in the main XML file (see example.xml).
