## OMSim: simulating optical map read data
OMSim is a simulation tool for optical map reads of the Irys platform (BioNano Genomics).
OMSim is currently in development. Future versions are not guaranteed to be backwards compatible with configuration scripts of the current version.
This wiki refers to the latest version on the master branch. Any other versions may not be compatible with this wiki.
The command line examples in this wiki are based on bash. If you use another shell, other syntax (e.g. for accessing subdirectories) might be required.
OMSim takes as input a genome file in FASTA format and an XML file specifying the nicking enzymes. Both of these, along with all other (optional) parameters, can be specified in a main XML file. Example XML files are provided (example.xml and minimal.xml).
The output is one BNX file per chip containing the reads. Additionally, a BED file is produced that specifies the origin of each simulated read on the reference.
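Since the BED file records where each simulated read came from, it can be handy to load it for downstream checks. The following is a minimal sketch, assuming only that the file follows the standard BED layout (sequence name, start, end in the first three whitespace-separated columns). The function name `read_bed_intervals` and the sample lines are made up for illustration and are not part of OMSim.

```python
def read_bed_intervals(lines):
    """Yield (sequence_name, start, end) tuples from BED-formatted lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header/comment lines BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # Only the first three columns are used; extra columns are ignored.
        yield fields[0], int(fields[1]), int(fields[2])


# Made-up sample lines for illustration; not actual OMSim output.
sample = [
    "# hypothetical example",
    "seq1\t100\t25100",
    "seq1\t40000\t260000",
]
intervals = list(read_bed_intervals(sample))
print(intervals)
```

To read a real file, pass an open file handle instead of the sample list, for example `list(read_bed_intervals(open("ecoli_output.bed")))`.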
At the moment OMSim requires Python 2.7 and SciPy. It has been tested on Linux. Python 3.5 should also work; no other distributions have been tested.
A circular test data set has been provided in the test folder. Navigate to test/ecoli and run:
python ../../src/omsim/__main__.py example.xml
This produces the following files:
- ecoli_output.label_0.1.bnx, containing reads with recognition site GCTCTTC
- ecoli_output.label_1.1.bnx, containing reads with recognition site CACGAG
- ecoli_output.bed, containing the start and end positions on the reference of all generated reads
Additionally, the following terminal output should be generated:
../../src/omsim/__main__.py example.xml
Version: v0.2
BNX version: 1.2
Circular genome.
Minimal molecule length: 20000 bp
Average molecule length: 200000.0 bp
Minimal coverage: 1x
Chimera rate: 1.0%
Random seed: 0
Indexing sequence: gi|49175990|ref|NC_000913.2|
Found 1490 nicks in 4639675bp.
Generating reads on 1 chip, estimated coverage: 9698x.
Finished processing E. coli.
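The nick count in the output above depends on where the enzyme recognition sites occur in the genome. As a rough illustration of that idea (not OMSim's actual implementation), the sketch below locates a recognition sequence on both strands of a linear sequence. It assumes a plain A/C/G/T/N motif, ignores matches that span the wrap-around point of a circular genome, and does not model where within the site the nick falls. The function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T/N sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def find_sites(sequence, pattern):
    """Return sorted 0-based start positions of pattern on either strand."""
    sequence = sequence.upper()
    positions = set()
    # A site on the reverse strand appears as the reverse complement
    # of the pattern when reading the forward strand.
    for motif in {pattern, reverse_complement(pattern)}:
        start = sequence.find(motif)
        while start != -1:
            positions.add(start)
            start = sequence.find(motif, start + 1)
    return sorted(positions)


# Toy sequence, made up for illustration.
toy = "AAGCTCTTCAAGAAGAGCTT"
print(find_sites(toy, "GCTCTTC"))  # -> [2, 11]
```

In the toy sequence, position 2 is a forward-strand match and position 11 is a match to the reverse complement GAAGAGC.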
To jump right in with your own data, edit minimal.xml and replace "genome.fasta" with the location of your (non-circular) input data in fasta format. Running OMSim (with the predefined BspQI enzyme) then simply becomes:
python src/omsim/omsim.py minimal.xml
The output will be written to files named omsim_output.label_0.xxx.bnx.
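If you prefer to script the editing step described above rather than change minimal.xml by hand, a plain text substitution is enough. This is a minimal sketch that treats the configuration as text and only relies on the "genome.fasta" placeholder mentioned above. The function name, the stand-in XML snippet, and the example path are hypothetical and do not reflect the real structure of minimal.xml.

```python
def point_config_at_genome(config_text, genome_path, placeholder="genome.fasta"):
    """Return config_text with the placeholder file name replaced."""
    if placeholder not in config_text:
        raise ValueError("placeholder %r not found in configuration" % placeholder)
    return config_text.replace(placeholder, genome_path)


# Stand-in text; the real minimal.xml has its own elements, not shown here.
demo_text = "<hypothetical>genome.fasta</hypothetical>"
print(point_config_at_genome(demo_text, "/data/my_genome.fasta"))
```

In practice you would read the contents of minimal.xml, pass them through the function, and write the result to a new file so that the provided example stays unchanged.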
Many parameters can be fine-tuned. True positive and false negative rates are enzyme-dependent and have to be specified in the enzymes file (e.g. enzymes.xml). All other settings can be specified in the main XML file (see example.xml).