Niema Moshiri edited this page Feb 24, 2018 · 96 revisions

FAVITES (FrAmework for VIral Transmission and Evolution Simulation) is a robust modular framework for the simultaneous simulation of a transmission network and viral evolution, as well as simulation of sampling imperfections of the transmission network and of the sequencing process. The framework is robust in that the simulation process has been broken down into a series of interactions between abstract module classes, and the user can simply plug in each desired module implementation (or implement one from scratch) to customize any stage of the simulation process.

Installing

Assuming you have already set up all of the required dependencies, simply clone the FAVITES repository to the desired location on your machine.

git clone https://github.com/niemasd/FAVITES.git

For your convenience, we have also packaged FAVITES and all of its dependencies into a Docker image, which is automatically rebuilt as updates are pushed to the FAVITES repository. We also provide a standalone script that runs the Docker image; the only setup it requires is a working Docker installation. Simply download it anywhere (ideally somewhere in your PATH):

curl https://raw.githubusercontent.com/niemasd/FAVITES/master/run_favites_docker.py > /usr/local/bin/run_favites_docker.py
chmod a+x /usr/local/bin/run_favites_docker.py

Requirements

To run FAVITES, you must use Python 3. Because of the many dependencies, we recommend that you use our Docker image (via run_favites_docker.py), but if you are unable to use Docker, you will need to install each selected module's own dependencies before usage (see corresponding module pages for more information). If you are missing some dependencies, FAVITES can still run if you do not choose modules with missing dependencies. To guarantee that your setup works with all module implementations, here is a comprehensive list of dependencies across all current module implementations:

Usage

If you cloned the repository and set up the dependencies yourself, you can use run_favites.py:

run_favites.py [-h] -c CONFIG [-v]
  -h, --help                   show help message and exit
  -c CONFIG, --config CONFIG   Configuration file
  -v, --verbose                Print verbose messages to stderr (default: False)

For the Docker image, if you have already downloaded the script to somewhere in your PATH, simply run as follows:

run_favites_docker.py [-h] -c CONFIG [-v]
  -h, --help                   show help message and exit
  -c CONFIG, --config CONFIG   Configuration file
  -v, --verbose                Print verbose messages to stderr (default: False)

Note that, to use FAVITES, you must have a configuration file (CONFIG in the usage message) in the format we require. We have included an example configuration file as well as examples of other FAVITES files in the example folder. Refer to the File Formats section of the Wiki for more information.
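If you want to launch FAVITES from another Python program (e.g. to run several configuration files in a row), the invocation above can be wrapped roughly as follows. This is a minimal sketch, not part of FAVITES: the helper names `build_favites_command` and `run_favites` are hypothetical, and `script` is assumed to be the path to run_favites.py in your clone of the repository. Only the -c and -v flags from the usage message are used.

```python
import subprocess
import sys
from pathlib import Path


def build_favites_command(config, verbose=False, script="run_favites.py"):
    """Assemble the argument list for one FAVITES run (hypothetical helper).

    `script` is assumed to be the path to run_favites.py in a local clone.
    """
    # Run the script with the current interpreter, which must be Python 3.
    command = [sys.executable, str(script), "-c", str(config)]
    if verbose:
        command.append("-v")  # verbose messages are printed to stderr
    return command


def run_favites(config, verbose=False, script="run_favites.py"):
    """Run FAVITES once; raises CalledProcessError on a non-zero exit status."""
    if not Path(config).is_file():
        raise FileNotFoundError(f"Configuration file not found: {config}")
    return subprocess.run(build_favites_command(config, verbose, script), check=True)
```

For example, `run_favites("my_config.json", verbose=True, script="FAVITES/run_favites.py")` would be equivalent to running the script by hand with `-c my_config.json -v`, where both paths are placeholders for your own.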

Output Folder Structure

Some module implementations output extra files in the root of the output folder, but the following items can be found in most use cases:

  • CONFIG.json: A copy of the configuration file used in this execution of FAVITES (for reproducibility)
  • contact_network.txt: The contact network that was simulated by FAVITES
  • error_free_files: A directory containing all output of the error-free portion of the workflow
    • error_free_files/phylogenetic_trees: A directory containing all error-free phylogenetic trees (see File Formats for identifier information)
      • error_free_files/phylogenetic_trees/tree_#.time.tre: The error-free phylogenetic tree of the #-th transmission chain, in units of time
      • error_free_files/phylogenetic_trees/tree_#.tre: The error-free phylogenetic tree of the #-th transmission chain, in units of expected number of per-site mutations
      • error_free_files/phylogenetic_trees/merged_tree_#.time.tre: The #-th error-free merged phylogenetic tree, in units of time
      • error_free_files/phylogenetic_trees/merged_tree_#.tre: The #-th error-free merged phylogenetic tree, in units of expected number of per-site mutations
    • error_free_files/sequence_data.fasta: The complete set of all sampled viral sequences (see File Formats for identifier information)
    • error_free_files/transmission_network.gexf: The complete transmission network in the GEXF format
    • error_free_files/transmission_network.txt: The complete transmission network in the FAVITES edge list format
  • error_prone_files: A directory containing all output of the error-prone portion of the workflow
    • error_prone_files/sequence_data_subsampled_errorfree.fasta: The sequencing dataset with subsampling errors imposed but without sequencing errors, in the FASTA format
    • error_prone_files/sequence_data_subsampled_errorprone*.fastq: The sequencing dataset with both subsampling and sequencing errors imposed, in the FASTQ format
      • Some Sequencing module implementations will have multiple output FASTQ files, e.g. paired-end sequencing will have one FASTQ file for forward reads and another for reverse reads
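After a run, it can be handy to check which of the items listed above were actually produced. The following is a minimal sketch that uses only the relative paths from the list; the function names `missing_items` and `list_trees` are hypothetical and not part of FAVITES. Because the list describes what is found in most use cases, a missing item is not necessarily an error for your choice of modules.

```python
from pathlib import Path

# Relative paths taken from the output folder description above.
# The wildcard FASTQ outputs are omitted because their names vary by module.
EXPECTED_ITEMS = [
    "CONFIG.json",
    "contact_network.txt",
    "error_free_files",
    "error_free_files/phylogenetic_trees",
    "error_free_files/sequence_data.fasta",
    "error_free_files/transmission_network.gexf",
    "error_free_files/transmission_network.txt",
    "error_prone_files",
    "error_prone_files/sequence_data_subsampled_errorfree.fasta",
]


def missing_items(output_dir):
    """Return the expected items that do not exist under output_dir."""
    root = Path(output_dir)
    return [item for item in EXPECTED_ITEMS if not (root / item).exists()]


def list_trees(output_dir):
    """Return all *.tre files (per-chain and merged), sorted by name."""
    tree_dir = Path(output_dir) / "error_free_files" / "phylogenetic_trees"
    return sorted(tree_dir.glob("*.tre"))
```

Calling `missing_items("my_output_folder")` (a placeholder path) returns an empty list when every item above is present.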

Designing a Configuration File

The configuration file dictates the parameters of a given simulation experiment. As mentioned in the File Formats section of the Wiki, the configuration file is a JSON file in which keys are module or module parameter names, and each value is the desired setting for the given module or module parameter. Because there are so many options, it might be difficult to design the entire configuration file at once. Instead, we recommend the following process:

  1. Start with an empty JSON file (i.e., a file only containing {})
  2. Try running FAVITES, and it will complain about a specific missing key (either a module or a module parameter)
  3. Look at the Modules page, determine the appropriate choice for the missing key based on your experiment design, and update the configuration file to include your choice for the specified missing key
  4. Repeat steps 2 and 3 until FAVITES no longer complains (all keys are specified)
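The file edits in steps 1 and 3 can also be scripted. The following is a minimal sketch using only Python's standard json module; the helper names are hypothetical, and the key and value in the usage note are placeholders rather than real FAVITES module names (use whatever key FAVITES reports as missing, with a value chosen from the Modules page).

```python
import json
from pathlib import Path


def create_empty_config(path):
    """Step 1: write a configuration file containing only {}."""
    Path(path).write_text("{}\n")


def set_config_key(path, key, value):
    """Step 3: add (or overwrite) one key in the configuration file."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    config[key] = value
    # Rewrite the whole file so that it stays valid JSON.
    config_path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `create_empty_config("my_config.json")` followed by `set_config_key("my_config.json", "PlaceholderKey", "PlaceholderValue")` after each reported missing key reproduces the loop above, with FAVITES itself run in between.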

Helper Scripts and Post-Validation

In addition to the main FAVITES workflow, we have included numerous tools that may be helpful for running FAVITES and interpreting the results. We have a series of Helper Scripts that perform various basic miscellaneous tasks (e.g. converting between filetypes, computing interpretable statistics on files, etc.) as well as a series of Post-Validation tools that allow you to gauge how realistic your simulated output looks (e.g. comparing simulated phylogenetic trees against a realistic tree, comparing simulated sequence data against real sequences, etc.).

Network Visualization

FAVITES outputs the transmission network in two formats: the FAVITES format and the GEXF format. To visualize the transmission network, we recommend opening the GEXF transmission network in Gephi, a popular cross-platform open-source tool. The GEXF transmission network FAVITES outputs is a dynamic network, meaning you can use Gephi to visualize the growth of the transmission network over time.

In the GEXF file, each node has a single attribute, "infected," which is false at time 0 and becomes true upon infection. Each edge has a single attribute, "transmission," which is false at time 0 and becomes true upon a transmission event along that edge. To color the network by these attributes in Gephi:

  1. Load the GEXF file output by FAVITES
  2. Go to the "Appearance" box
  3. In the "Nodes" tab, set the desired colors for infected and uninfected nodes ("true" and "false" for the "infected" attribute, respectively)
  4. In the "Edges" tab, set the desired colors for transmission edges and normal contact network edges ("true" and "false" for the "transmission" attribute, respectively)
  5. Set "Enable auto transformation - applied continuously" to have the colors change automatically in the timeline view
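Before opening a large network in Gephi, a quick sanity check of the GEXF file can be useful. The following is a minimal sketch that counts the node and edge elements using Python's standard XML parser. It relies only on GEXF being XML with `node` and `edge` elements and assumes nothing about how FAVITES encodes the dynamic "infected" and "transmission" attributes; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{ns}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_nodes_and_edges(gexf_path):
    """Return (number of node elements, number of edge elements)."""
    nodes = edges = 0
    for element in ET.parse(gexf_path).iter():
        name = local_name(element.tag)
        if name == "node":
            nodes += 1
        elif name == "edge":
            edges += 1
    return nodes, edges
```

For a FAVITES run, this would be called on error_free_files/transmission_network.gexf inside the output folder.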