diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index b31106b..9a0fc86 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.3","generation_timestamp":"2024-05-16T12:53:11","documenter_version":"1.4.1"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-06-13T15:04:05","documenter_version":"1.4.1"}}
\ No newline at end of file
diff --git a/dev/IO/index.html b/dev/IO/index.html
index e649e64..43e3419 100644
--- a/dev/IO/index.html
+++ b/dev/IO/index.html
@@ -1,2 +1,2 @@
-Input/Output · MolecularEvolution.jl

Input/Output

MolecularEvolution.write_nexusFunction
write_nexus(fname::String,tree::FelNode)

Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.

source
MolecularEvolution.populate_tree!Function
populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)

Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (i.e. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if

  • tolerate_missing = 0, an error will be thrown
  • tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
  • tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
source
MolecularEvolution.write_fastaFunction
write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.

source
+Input/Output · MolecularEvolution.jl

Input/Output

MolecularEvolution.write_nexusFunction
write_nexus(fname::String,tree::FelNode)

Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.

source
MolecularEvolution.populate_tree!Function
populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)

Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (i.e. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if

  • tolerate_missing = 0, an error will be thrown
  • tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
  • tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
source
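The three tolerate_missing behaviours listed above can be sketched with a toy lookup. This is only an illustration under stated assumptions: toy_leaf_message, n_states, and the use of a vector of ones as the "uninformative message" are hypothetical stand-ins, not the library's implementation.

```julia
# Toy illustration of the three `tolerate_missing` behaviours.
# `toy_leaf_message` is a hypothetical helper, not part of MolecularEvolution.jl.
function toy_leaf_message(leaf_name, names, data; tolerate_missing = 1, n_states = 4)
    idx = findfirst(==(leaf_name), names)
    if idx !== nothing
        return data[idx]                      # matching name: use the observed data
    end
    if tolerate_missing == 0
        error("No data found for leaf $(leaf_name)")
    elseif tolerate_missing == 1
        @warn "No data found for leaf $(leaf_name); using an uninformative message"
    end
    # Assumed uninformative message: every state is equally compatible with "no observation".
    return ones(n_states)
end

leaf_names = ["A", "B"]
leaf_data = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
msg_known = toy_leaf_message("A", leaf_names, leaf_data)
msg_missing = toy_leaf_message("C", leaf_names, leaf_data; tolerate_missing = 2)
```

Here "C" has no matching name, so with tolerate_missing = 2 it silently receives the all-ones vector, while tolerate_missing = 0 would raise an error instead.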
MolecularEvolution.write_fastaFunction
write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.

source
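For readers unfamiliar with the FASTA layout, a minimal sketch of what such a writer produces follows. toy_write_fasta is a hypothetical helper that writes to any IO, and the S1, S2, ... fallback names are an assumption made for the example; the library's own defaults are not stated here.

```julia
# Hypothetical helper, not the library's implementation.
function toy_write_fasta(io::IO, sequences::Vector{String}; seq_names = nothing)
    # Assumption: fall back to S1, S2, ... when no names are given.
    labels = seq_names === nothing ? ["S$(i)" for i in eachindex(sequences)] : seq_names
    length(labels) == length(sequences) || error("need one name per sequence")
    for (label, seq) in zip(labels, sequences)
        println(io, ">", label)   # header line
        println(io, seq)          # sequence line
    end
end

buf = IOBuffer()
toy_write_fasta(buf, ["ACGT", "AC-T"]; seq_names = ["seq1", "seq2"])
fasta_text = String(take!(buf))
```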
diff --git a/dev/ancestors/index.html b/dev/ancestors/index.html
index 1388224..15f47b1 100644
--- a/dev/ancestors/index.html
+++ b/dev/ancestors/index.html
@@ -66,4 +66,4 @@
 0.0305 - true value: 0.0177
 0.0913 - true value: 0.0485
 0.0542 - true value: 0.075
-0.498 - true value: 0.589

Functions

MolecularEvolution.marginal_state_dictFunction
marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.cascading_max_state_dictFunction
cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.endpoint_conditioned_sample_state_dictFunction
endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
+0.498 - true value: 0.589

Functions

MolecularEvolution.marginal_state_dictFunction
marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
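As a rough illustration of what a marginal reconstruction is for one node with a discrete state, the sketch below assumes the usual decomposition into an upward message P(obs below | state) and a downward message P(obs above, state). toy_marginal and both input vectors are made up for the example.

```julia
# up[s]   = P(obs below the node | state s)
# down[s] = P(obs above the node, state s)
function toy_marginal(up::Vector{Float64}, down::Vector{Float64})
    joint = up .* down            # P(all obs, state s)
    return joint ./ sum(joint)    # normalise to P(state s | all obs)
end

up = [0.10, 0.02, 0.02, 0.02]
down = [0.25, 0.25, 0.25, 0.25]
post = toy_marginal(up, down)
```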
MolecularEvolution.cascading_max_state_dictFunction
cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
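The cascading scheme described above can be sketched for a single root-child pair with two states. This is a toy under assumptions: toy_cascading_max, the transition matrix P, and the message vectors are illustrative and do not reflect the library's internals.

```julia
# P[i, j]     = P(child state j | parent state i) along the branch above the child.
# child_up[j] = P(obs below child | child state j).
function toy_cascading_max(root_post::Vector{Float64}, P::Matrix{Float64}, child_up::Vector{Float64})
    root_state = argmax(root_post)                 # maximise the marginal at the root
    child_scores = P[root_state, :] .* child_up    # condition on the chosen root state
    return root_state, argmax(child_scores)
end

P = [0.9 0.1; 0.1 0.9]
root_state, child_state = toy_cascading_max([0.6, 0.4], P, [0.05, 0.5])
```

In this example the child ends up in a different state from the root, because the evidence below the child outweighs the cost of a change along the branch.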
MolecularEvolution.endpoint_conditioned_sample_state_dictFunction
endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
diff --git a/dev/examples/index.html b/dev/examples/index.html
index 2257c0b..4836ccb 100644
--- a/dev/examples/index.html
+++ b/dev/examples/index.html
@@ -371,4 +371,4 @@ end end
Site 153: P(β>α)=0.9074
 Site 158: P(β>α)=0.9266
-Site 160: P(β>α)=0.9547

And let's visualize one of those sites:

gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))

+Site 160: P(β>α)=0.9547

And let's visualize one of those sites:

gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))

diff --git a/dev/framework/index.html b/dev/framework/index.html
index 576bda7..4176937 100644
--- a/dev/framework/index.html
+++ b/dev/framework/index.html
@@ -1,2 +1,2 @@
-The MolecularEvolution.jl Framework · MolecularEvolution.jl

The MolecularEvolution.jl Framework

The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.

Partitions and BranchModels

A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.

For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.

A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.

Messages

Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!

Trees

Each node in our tree is a FelNode ("Fel" for "Felsenstein"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.

The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that maps FelNode->Vector{<:BranchModel}. In the simpler cases where the model does not vary from branch to branch, or where there is only a single Partition, and thus a single model, the core algorithms have been overloaded to allow you to pass in a single model vector or a single model.

Algorithms

Felsenstein's algorithm recursively computes, for each node, the probability of all observations below that node, given the state at that node. Felsenstein's algorithm can be decomposed into the following combination of backward! and combine! operations:

At the root node, we wind up with $P(O_{all}|R)$, where $R$ is the state at the root, and we can compute $P(O_{all}) = \sum_{R} P(O_{all}|R) P(R)$.

Technicalities

Scaling constants

Coming soon.

Root state

Coming soon.

Functions

MolecularEvolution.combine!Function
combine!(dest::P, src::P) where P<:Partition

Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.

source
MolecularEvolution.forward!Function
forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.backward!Function
backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
+The MolecularEvolution.jl Framework · MolecularEvolution.jl

The MolecularEvolution.jl Framework

The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.

Partitions and BranchModels

A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.

For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.
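For the 4-category discrete case just mentioned, an in-place combine might look like the sketch below. ToyPartition and toy_combine! are hypothetical stand-ins written for this example; they are not the library's Partition types or its combine! method.

```julia
# Toy stand-in for a 4-state discrete partition; not a MolecularEvolution.jl type.
struct ToyPartition
    state::Vector{Float64}
end

# In-place combination: dest ends up holding P(obsA, obsB | state).
function toy_combine!(dest::ToyPartition, src::ToyPartition)
    dest.state .*= src.state    # element-wise product, no new allocation
    return dest
end

a = ToyPartition([0.9, 0.1, 0.0, 0.0])
b = ToyPartition([0.5, 0.5, 0.5, 0.5])
toy_combine!(a, b)
```

Only dest is modified; src is left untouched, which is what lets the same src be reused when combining several children into one parent.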

A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.
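For a discrete state and a fixed transition matrix, the two directions can be sketched as matrix-vector products. This is a toy under assumptions: toy_backward! and toy_forward! are hypothetical, P is an arbitrary example matrix, and branch lengths and FelNodes are left out entirely.

```julia
using LinearAlgebra

# P[i, j] = P(state j at bottom of branch | state i at top of branch).
# backward: dest[i] = sum_j P[i, j] * src[j]   (tip-to-root, conditional likelihoods)
toy_backward!(dest, src, P) = mul!(dest, P, src)
# forward:  dest[j] = sum_i src[i] * P[i, j]   (root-to-tip, joint distributions)
toy_forward!(dest, src, P) = mul!(dest, transpose(P), src)

P = [0.9 0.1; 0.2 0.8]
up = zeros(2)
toy_backward!(up, [1.0, 0.0], P)     # leaf observed in state 1
down = zeros(2)
toy_forward!(down, [0.5, 0.5], P)    # root prior pushed down the branch
```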

Messages

Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!

Trees

Each node in our tree is a FelNode ("Fel" for "Felsenstein"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.

The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that maps FelNode->Vector{<:BranchModel}. In the simpler cases where the model does not vary from branch to branch, or where there is only a single Partition, and thus a single model, the core algorithms have been overloaded to allow you to pass in a single model vector or a single model.

Algorithms

Felsenstein's algorithm recursively computes, for each node, the probability of all observations below that node, given the state at that node. Felsenstein's algorithm can be decomposed into the following combination of backward! and combine! operations:

At the root node, we wind up with $P(O_{all}|R)$, where $R$ is the state at the root, and we can compute $P(O_{all}) = \sum_{R} P(O_{all}|R) P(R)$.
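A minimal sketch of the recursion and the final sum over root states, for a two-state symmetric model on a three-leaf tree, is given below. Everything here is an assumption made for illustration: ToyNode, transition, toy_felsenstein, the unit substitution rate, and the uniform root prior are not part of the library.

```julia
using LinearAlgebra

# Toy tree node; a stand-in for FelNode, not the library type.
struct ToyNode
    children::Vector{ToyNode}
    branchlength::Float64
    obs::Int            # observed state at a leaf (1 or 2); 0 for internal nodes
end

# Two-state symmetric model with unit rate: P(t)[i, j] = P(j at bottom | i at top).
function transition(t)
    same = (1 + exp(-2t)) / 2
    other = 1 - same
    return [same other; other same]
end

# Returns the vector P(obs below node | state at node).
function toy_felsenstein(node::ToyNode)
    if isempty(node.children)
        msg = zeros(2)
        msg[node.obs] = 1.0
        return msg
    end
    msg = ones(2)
    for child in node.children
        child_msg = toy_felsenstein(child)
        # backward along the child's branch, then combine into the parent
        msg .*= transition(child.branchlength) * child_msg
    end
    return msg
end

leaf(obs, t) = ToyNode(ToyNode[], t, obs)
root_prior = [0.5, 0.5]

# P(O_all) = sum_R P(O_all | R) P(R) for leaf observations (a, b, c).
likelihood(a, b, c) = dot(root_prior,
    toy_felsenstein(ToyNode([ToyNode([leaf(a, 0.1), leaf(b, 0.2)], 0.3, 0), leaf(c, 0.4)], 0.0, 0)))
```

A quick sanity check on such a sketch is that likelihood(a, b, c) summed over all eight leaf patterns equals 1.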

Technicalities

Scaling constants

Coming soon.

Root state

Coming soon.

Functions

MolecularEvolution.combine!Function
combine!(dest::P, src::P) where P<:Partition

Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.

source
MolecularEvolution.forward!Function
forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.backward!Function
backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
diff --git a/dev/index.html b/dev/index.html
index 6caf2b1..af6d6d6 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -9,28 +9,29 @@
 sample_down!(tree, bm_model)
 #And plot the log likelihood as a function of the parameter value
 ll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))
-plot(0.7:0.001:1.6,ll, xlabel = "variance per unit time", ylabel = "log likelihood")

Base.:==Method
==(t1, t2)
-Defaults to pointer equality
source
MolecularEvolution.SWM_prob_gridMethod
SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}

Returns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets

source
MolecularEvolution._mapreduceMethod

Internal function. Helper for bfs_mapreduce and dfs_mapreduce

source
MolecularEvolution.backward!Method
backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.bfs_mapreduceMethod

Performs a BFS map-reduce over the tree, starting at a given node. For each node, map_reduce is called as: map_reduce(curr_node::FelNode, prev_node::FelNode, aggregator), where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.

Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.

source
MolecularEvolution.branchlength_optim!Method
branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())

Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer, which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().

source
MolecularEvolution.brents_method_minimizeMethod
brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))

Brent's method for minimization.

Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2*tol and 3*tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.

The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...

Examples

julia> f(x) = exp(-x) - cos(x)
+plot(0.7:0.001:1.6,ll, xlabel = "variance per unit time", ylabel = "log likelihood")

MolecularEvolution.LazyDownType

Constructors

LazyDown(stores_obs)
+LazyDown() = LazyDown(x::FelNode -> true)

Description

Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.

source
MolecularEvolution.LazyPartitionType

Constructor

LazyPartition{PType}()

Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.

Description

With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:

  • log_likelihood!
  • felsenstein!
  • sample_down!
Note

For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.

Further requirements

Suppose you want to wrap a partition of PType with LazyPartition:

  • If you're calling log_likelihood! and felsenstein!:
    • obs2partition!(partition::PType, obs) that transforms an observation to a partition.
  • If you're calling sample_down!:
    • partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
source
MolecularEvolution.SWM_prob_gridMethod
SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}

Returns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets

source
MolecularEvolution.backward!Method
backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.bfs_mapreduceMethod

Performs a BFS map-reduce over the tree, starting at a given node. For each node, map_reduce is called as: map_reduce(curr_node::FelNode, prev_node::FelNode, aggregator), where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.

Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.

source
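The calling convention described above can be sketched on a toy tree. This is a simplified illustration under assumptions: ToyTreeNode and toy_bfs_mapreduce are hypothetical, the traversal only walks from a node to its children, and prev is nothing for the start node, which may differ from the library's behaviour.

```julia
# Toy node type; a stand-in for FelNode.
mutable struct ToyTreeNode
    name::String
    children::Vector{ToyTreeNode}
end

# Visit nodes breadth-first from `start`, calling map_reduce(curr, prev, aggregator).
function toy_bfs_mapreduce(start::ToyTreeNode, map_reduce, aggregator)
    queue = Tuple{ToyTreeNode,Union{ToyTreeNode,Nothing}}[(start, nothing)]
    while !isempty(queue)
        curr, prev = popfirst!(queue)
        map_reduce(curr, prev, aggregator)
        for child in curr.children
            push!(queue, (child, curr))
        end
    end
    return aggregator
end

# The callback relies on state already stored for `prev`, which is why this
# is not a conventional map-reduce.
function record_depth(curr, prev, agg)
    agg[curr.name] = prev === nothing ? 0 : agg[prev.name] + 1
    return nothing
end

c = ToyTreeNode("c", ToyTreeNode[])
b = ToyTreeNode("b", [c])
a = ToyTreeNode("a", ToyTreeNode[])
root = ToyTreeNode("root", [a, b])

depths = toy_bfs_mapreduce(root, record_depth, Dict{String,Int}())
```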
MolecularEvolution.branchlength_optim!Method
branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())

Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer, which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().

source
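To give a feel for the default one-dimensional optimizer, here is a generic golden section search with an absolute tolerance. It is a textbook sketch, not the library's routine: toy_golden_section_minimize and the quadratic neg_ll objective (a stand-in for a negative log likelihood as a function of one branch length) are assumptions made for the example.

```julia
# Minimise a unimodal f on [a, b] by golden section search (toy version).
function toy_golden_section_minimize(f, a::Real, b::Real; tol = 1e-5)
    invphi = (sqrt(5) - 1) / 2          # 1/φ ≈ 0.618
    a, b = float(a), float(b)
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol
        if fc < fd                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        end
    end
    return (a + b) / 2
end

# Stand-in objective with its minimum at a branch length of 0.3.
neg_ll(t) = (t - 0.3)^2
best_t = toy_golden_section_minimize(neg_ll, 0.0, 2.0)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of the objective is needed per step.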
MolecularEvolution.brents_method_minimizeMethod
brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))

Brent's method for minimization.

Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2*tol and 3*tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.

The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...

Examples

julia> f(x) = exp(-x) - cos(x)
 f (generic function with 1 method)
 
 julia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)
-0.5885327257940255

From: Richard P. Brent, "Algorithms for Minimization without Derivatives" (1973). Chapter 5.

source
MolecularEvolution.cascading_max_state_dictMethod
cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.char_proportionsMethod
char_proportions(seqs, alphabet::String)

Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.

source
MolecularEvolution.colored_seq_drawMethod
colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)

Draw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.combine!Method
combine!(dest::P, src::P) where P<:Partition

Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.

source
MolecularEvolution.deepequalsMethod
deepequals(t1, t2)

Checks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).

source
MolecularEvolution.discrete_name_color_dictMethod
discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)

Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.

Example tag_func: function tag_func(nam::String) return split(nam,"_")[1] end

For prettier colors, but less discrimination: rainbow = true. To randomize the rainbow color assignment: scramble = true. col_seed is currently set to white, and excluded from the list of colors, to make them more visible.

Consider making your own version of this function to customize colors as you see fit.

Example use:
num_leaves = 50
Ne_func(t) = 1*(e^-t).+5.0
newt = sim_tree(num_leaves,Ne_func,1.0,nstart = rand(1:num_leaves));
newt = ladderize(newt)
tag_func(nam) = mod(sum(Int.(collect(nam))),7)
dic = discrete_name_color_dict(newt,tag_func,rainbow = true);
tree_draw(newt,line_width = 0.5mm,label_color_dict = dic)

source
MolecularEvolution.endpoint_conditioned_sample_state_dictMethod
endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.felsenstein!Method
felsenstein!(node::FelNode, models; partition_list = nothing)

Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.felsenstein_down!Method
felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))

Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.forward!Method
forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.gappy_Q_from_symmetric_rate_matrixMethod
gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)

Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.

source
MolecularEvolution.get_phylo_treeMethod
get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))

Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.

source
MolecularEvolution.golden_section_maximizeMethod

Golden section search.

Given a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.

Examples

julia> f(x) = -(x-2)^2
+0.5885327257940255

From: Richard P. Brent, "Algorithms for Minimization without Derivatives" (1973). Chapter 5.

source
MolecularEvolution.cascading_max_state_dictMethod
cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendants. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.char_proportionsMethod
char_proportions(seqs, alphabet::String)

Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.
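
The counting described above can be sketched in a few lines of plain Julia. This is an illustrative re-implementation, not the package source: the name char_proportions_sketch is hypothetical, and skipping characters that are not in the alphabet is an assumption about the intended behaviour.

```julia
# Hypothetical helper for illustration; not MolecularEvolution's implementation.
function char_proportions_sketch(seqs, alphabet::String)
    counts = zeros(Float64, length(alphabet))
    # Map each alphabet character to its position.
    index = Dict(c => i for (i, c) in enumerate(alphabet))
    for s in seqs, c in s
        i = get(index, c, 0)
        # Assumption: characters outside the alphabet (e.g. gaps) are skipped.
        i > 0 && (counts[i] += 1)
    end
    return counts ./ sum(counts)
end

props = char_proportions_sketch(["ACGT", "AACC"], "ACGT")
```

The result is ordered like the alphabet string, so it can be used directly as a vector of empirical frequencies.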

source
MolecularEvolution.colored_seq_drawMethod
colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)

Draw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.combine!Method
combine!(dest::P, src::P) where P<:Partition

Combines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.

source
MolecularEvolution.deepequalsMethod
deepequals(t1, t2)

Checks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).

source
MolecularEvolution.discrete_name_color_dictMethod
discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)

Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.

Example tag_func:

function tag_func(nam::String)
    return split(nam,"_")[1]
end

For prettier colors, but less discrimination: rainbow = true
To randomize the rainbow color assignment: scramble = true
col_seed is currently set to white, and excluded from the list of colors, to make them more visible.

Consider making your own version of this function to customize colors as you see fit.

Example use:

numleaves = 50
Ne_func(t) = 1*exp(-t) + 5.0
newt = sim_tree(numleaves,Ne_func,1.0,nstart = rand(1:numleaves));
newt = ladderize(newt)
tag_func(nam) = mod(sum(Int.(collect(nam))),7)
dic = discrete_name_color_dict(newt,tag_func,rainbow = true);
tree_draw(newt,line_width = 0.5mm,label_color_dict = dic)

source
MolecularEvolution.endpoint_conditioned_sample_state_dictMethod
endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.felsenstein!Method
felsenstein!(node::FelNode, models; partition_list = nothing)

Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
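
To make the upward pass concrete, here is a minimal, self-contained sketch of Felsenstein's pruning recursion on a toy tree. It does not use the package's types: ToyNode, leaf, internal, and upward are hypothetical names, the two-state symmetric rate matrix is chosen only for illustration, and the real felsenstein! works in place on FelNode messages rather than returning vectors.

```julia
using LinearAlgebra

# Two-state symmetric rate matrix; purely illustrative.
Q = [-1.0 1.0; 1.0 -1.0]
P(t) = exp(Q * t)   # transition probabilities along a branch of length t

# Hypothetical minimal tree type; the package's FelNode is richer than this.
struct ToyNode
    children::Vector{ToyNode}
    branch_length::Float64
    leaf_likelihood::Vector{Float64}   # only meaningful at leaves
end
leaf(state, t) = ToyNode(ToyNode[], t, [i == state ? 1.0 : 0.0 for i in 1:2])
internal(children, t) = ToyNode(children, t, Float64[])

# Upward pass: likelihood of the data below `node`, for each state at `node`.
function upward(node::ToyNode)
    isempty(node.children) && return node.leaf_likelihood
    msg = ones(2)
    for child in node.children
        # Push the child's message back along its branch, then combine.
        msg .*= P(child.branch_length) * upward(child)
    end
    return msg
end

root = internal([leaf(1, 0.1), internal([leaf(1, 0.2), leaf(2, 0.2)], 0.1)], 0.0)
root_msg = upward(root)
# Weight the root message by the equilibrium frequencies to get the log likelihood.
loglik = log(dot([0.5, 0.5], root_msg))
```

Calling the recursion on the root visits every branch once, which is why the documented function should usually be called on the root.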

source
MolecularEvolution.felsenstein_down!Method
felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))

Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.forward!Method
forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)

Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.

source
MolecularEvolution.gappy_Q_from_symmetric_rate_matrixMethod
gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)

Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.

source
MolecularEvolution.get_phylo_treeMethod
get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))

Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.

source
MolecularEvolution.golden_section_maximizeMethod

Golden section search.

Given a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.

Examples

julia> f(x) = -(x-2)^2
 f (generic function with 1 method)
 
 julia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)
-2.0000000000051843

From: https://en.wikipedia.org/wiki/Golden-section_search
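
For readers who want to see the technique itself, the following is a minimal sketch of golden-section search written for maximization. It is not the package's implementation: golden_max_sketch is a hypothetical name, it omits the transform argument shown in the example above, and it returns the midpoint of the final bracket.

```julia
# Hypothetical helper (not the package source): golden-section search for the
# maximizer of a unimodal function f on the interval [a, b].
function golden_max_sketch(f, a::Real, b::Real; tol::Real = 1e-10)
    invphi = (sqrt(5) - 1) / 2           # 1/φ ≈ 0.618
    a, b = float(a), float(b)
    c = b - invphi * (b - a)             # lower interior point
    d = a + invphi * (b - a)             # upper interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol
        if fc > fd                       # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else                             # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        end
    end
    return (a + b) / 2
end

f(x) = -(x - 2)^2
m = golden_max_sketch(f, 1, 5)
```

Each iteration reuses one of the two interior evaluations, so only one new call to f is needed per step. The achievable accuracy of the maximizer is limited by floating-point resolution of f near its peak, not only by tol.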

source
MolecularEvolution.highlight_seq_drawMethod
highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)

Draw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.highlight_seq_drawMethod
highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)

Draw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.

source
MolecularEvolution.highlighter_tree_drawMethod
highlighter_tree_draw(tree, ali_seqs, seqnames, master;
     highlighter_start = 1.1, highlighter_width = 1,
     coord_width = highlighter_start + highlighter_width + 0.1,
     scale_length = nothing, major_breaks = 1000, minor_breaks = 500,
-    tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)

Draws a combined tree and highlighter plot. The vector of seqnames must match the node names in tree.

kwargs:

  • tree_args: kwargs to pass to `tree_draw()`
  • legend_colors: Mapping of characters to highlighter colors (default NUC_colors)
  • scale_length: Length of the scale bar
  • highlighter_start: Canvas start for the highlighter panel
  • highlighter_width: Canvas width for the highlighter panel
  • coord_width: Total width of the canvas
  • major_breaks: Numbered breaks for sequence axis
  • minor_breaks: Ticks for sequence axis
source
MolecularEvolution.internal_message_init!Method
internal_message_init!(tree::FelNode, partition::Partition)
+    tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)

Draws a combined tree and highlighter plot. The vector of seqnames must match the node names in tree.

kwargs:

  • tree_args: kwargs to pass to `tree_draw()`
  • legend_colors: Mapping of characters to highlighter colors (default NUC_colors)
  • scale_length: Length of the scale bar
  • highlighter_start: Canvas start for the highlighter panel
  • highlighter_width: Canvas width for the highlighter panel
  • coord_width: Total width of the canvas
  • major_breaks: Numbered breaks for sequence axis
  • minor_breaks: Ticks for sequence axis
source
MolecularEvolution.log_likelihood!Method
log_likelihood!(tree::FelNode, models; partition_list = nothing)

First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.log_likelihoodMethod
log_likelihood(tree::FelNode, models; partition_list = nothing)

Computes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.longest_pathMethod

Returns the longest path in a tree. For convenience, this is returned as two lists of the form [leaf_node, parent_node, ..., root], where the leaf nodes are selected to be the furthest away.

source
MolecularEvolution.marginal_state_dictMethod
marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.matrix_for_displayMethod
matrix_for_display(Q,labels)

Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.

source
MolecularEvolution.mixMethod
mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}

mix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.

source
MolecularEvolution.nni_optim!Method
nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)

Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods; if it returns true, the move is accepted.

source
MolecularEvolution.partition2obsMethod
partition2obs(part::Partition)

Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.

source
MolecularEvolution.populate_tree!Method
populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)

Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if

  • tolerate_missing = 0, an error will be thrown
  • tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
  • tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
source
MolecularEvolution.promote_internalMethod
promote_internal(tree::FelNode)

Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels on internal nodes).

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)

Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(xvec,yvec; rate_conf_level = 0.99)

Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). Returns the confidence intervals computed by a quadratic approximation to the LL.

source
MolecularEvolution.reversibleQMethod
reversibleQ(param_vec,eq_freqs)

Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.

source
MolecularEvolution.root2tip_distancesMethod
root2tip_distances(root::AbstractTreeNode)

Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.sample_down!Method

sample_down!(root::FelNode,models,partition_list)

Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)

Saves a figure created using the Compose approach, but tweaks the SVG after export.

eg. savefig_tweakSVG("export.svg",pl)

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)

Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.

eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])

source
MolecularEvolution.sim_treeMethod
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)

Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.

Ne_func(t) = (sin(t/10)+1)*100.0 + 10.0
root = sim_tree(600,Ne_func,1.0)
simple_tree_draw(ladderize(root))

source
MolecularEvolution.simple_radial_tree_plotMethod
simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = "black", line_width = 0.1mm)

Draws a radial tree. No frills. No labels. Canvas height is automatically determined to avoid distorting the tree.

newt = better_newick_import("((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);", FelNode{Float64});
simple_radial_tree_plot(newt,line_width = 0.5mm,root_angle = 7/10)

source
MolecularEvolution.simple_tree_drawMethod

img = simple_tree_draw(tree::FelNode; canvas_width = 15cm, canvas_height = 15cm, line_color = "black", line_width = 0.1mm)

A line drawing of a tree with very few options.

img = simple_tree_draw(tree)
+Initializes the message template for each node in the tree, allocating space for each partition.
source
MolecularEvolution.lazyprep!Method
lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())

Extra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.

  1. Perform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.
  2. Fix tree.parent_message to an initial message.
  3. Preallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.
  4. Specialized preparations based on the direction of the operations (forward!, backward!). LazyDown or LazyUp.

See also LazyDown, LazyUp.

source
MolecularEvolution.lazysort!Method
  • Should be run on a tree containing LazyPartitions before running felsenstein!. Sorts for a minimal count of active partitions during a felsenstein!
  • Returns the minimum length of memoryblocks (-1) required for a felsenstein! prop. We need a temporary memoryblock during backward!, hence the '-1'.
Note

Since felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... chains

source
MolecularEvolution.log_likelihood!Method
log_likelihood!(tree::FelNode, models; partition_list = nothing)

First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.log_likelihoodMethod
log_likelihood(tree::FelNode, models; partition_list = nothing)

Computes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.longest_pathMethod

Returns the longest path in a tree. For convenience, this is returned as two lists of the form [leaf_node, parent_node, ..., root], where the leaf nodes are selected to be the furthest away.

source
MolecularEvolution.marginal_state_dictMethod
marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())

Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.

source
MolecularEvolution.matrix_for_displayMethod
matrix_for_display(Q,labels)

Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.
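
A rough sketch of how such a labelled, mixed-type matrix can be assembled is shown below. The function name labeled_matrix_sketch is hypothetical, and the layout (labels in the first row and first column, an empty corner cell) is an assumption rather than the package's documented output.

```julia
# Hypothetical sketch; the layout of the package's actual output is an assumption.
function labeled_matrix_sketch(Q::AbstractMatrix, labels)
    n = size(Q, 1)
    out = Matrix{Any}(undef, n + 1, n + 1)   # mixed-type container
    out[1, 1] = ""                           # empty corner cell
    out[1, 2:end] = labels                   # column headers
    out[2:end, 1] = labels                   # row headers
    out[2:end, 2:end] = Q                    # numerical values
    return out
end

M = labeled_matrix_sketch([-1.0 1.0; 2.0 -2.0], ['A', 'B'])
```

Because the element type is Any, the result is meant for inspection in the REPL rather than for further numerical work.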

source
MolecularEvolution.mixMethod
mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}

mix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.

source
MolecularEvolution.nni_optim!Method
nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)

Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods; if it returns true, the move is accepted.

source
MolecularEvolution.partition2obsMethod
partition2obs(part::Partition)

Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.

source
MolecularEvolution.populate_tree!Method
populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)

Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if

  • tolerate_missing = 0, an error will be thrown
  • tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
  • tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
source
MolecularEvolution.promote_internalMethod
promote_internal(tree::FelNode)

Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels on internal nodes).

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)

Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.

source
MolecularEvolution.quadratic_CIMethod
quadratic_CI(xvec,yvec; rate_conf_level = 0.99)

Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). Returns the confidence intervals computed by a quadratic approximation to the LL.
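
The idea of a quadratic approximation to the LL can be illustrated with a short, dependency-free sketch. This is not the package's code: quadratic_ci_sketch is a hypothetical name, and the chi-square quantile is hard-coded for a 95% level (the documented default above is rate_conf_level = 0.99) purely to avoid needing a statistics package.

```julia
# Hypothetical sketch, not the package source. Fits LL(x) ≈ a*x^2 + b*x + c by
# least squares, then finds where the fitted curve drops by half a chi-square
# quantile (1 degree of freedom) below its maximum.
# Assumption: 3.841458820694124 is the 0.95 quantile of χ²(1), hard-coded here.
function quadratic_ci_sketch(xvec, yvec; drop = 3.841458820694124 / 2)
    A = [xvec .^ 2 xvec ones(length(xvec))]   # design matrix for a quadratic
    a, b, _ = A \ yvec                        # least-squares coefficients
    a < 0 || error("fitted quadratic has no maximum")
    xhat = -b / (2a)                          # location of the fitted maximum
    halfwidth = sqrt(drop / -a)               # solve a*(x - xhat)^2 = -drop
    return (xhat - halfwidth, xhat + halfwidth)
end

xs = collect(2.0:0.25:4.0)
lls = [-(x - 3)^2 / (2 * 0.5^2) for x in xs]  # exact Gaussian log likelihood, σ = 0.5
lo, hi = quadratic_ci_sketch(xs, lls)
```

For this Gaussian-shaped log likelihood the interval matches the familiar 3 ± 1.96σ, which is a useful sanity check on the approach.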

source
MolecularEvolution.reversibleQMethod
reversibleQ(param_vec,eq_freqs)

Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
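
As a sketch of the construction described above (not the package's implementation), the following builds a reversible rate matrix from upper-triangle parameters and equilibrium frequencies. The name reversible_Q_sketch is hypothetical, and the row-by-row ordering of the upper-triangle parameters is an assumption.

```julia
# Hypothetical sketch of the construction; not MolecularEvolution's source.
# Assumption: param_vec fills the upper triangle row by row.
function reversible_Q_sketch(param_vec, eq_freqs)
    n = length(eq_freqs)
    @assert length(param_vec) == n * (n - 1) ÷ 2
    Q = zeros(n, n)
    k = 1
    for i in 1:n, j in (i + 1):n
        Q[i, j] = param_vec[k] * eq_freqs[j]   # column j scaled by its frequency
        Q[j, i] = param_vec[k] * eq_freqs[i]   # mirrored entry, column i scaled
        k += 1
    end
    for i in 1:n
        Q[i, i] = -sum(Q[i, :])                # rows sum to zero
    end
    return Q
end

pi_freqs = [0.1, 0.2, 0.3, 0.4]
Q = reversible_Q_sketch(1.0:6.0, pi_freqs)
```

Scaling a symmetric set of exchangeabilities column-wise by the equilibrium frequencies gives detailed balance, pi_freqs[i] * Q[i, j] == pi_freqs[j] * Q[j, i], which is what makes the matrix reversible.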

source
MolecularEvolution.root2tip_distancesMethod
root2tip_distances(root::AbstractTreeNode)

Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.sample_down!Method

sample_down!(root::FelNode,models,partition_list)

Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)

Saves a figure created using the Compose approach, but tweaks the SVG after export.

eg. savefig_tweakSVG("export.svg",pl)

source
MolecularEvolution.savefig_tweakSVGMethod
savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)

Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.

eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])

source
MolecularEvolution.sim_treeMethod
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)

Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.

Ne_func(t) = (sin(t/10)+1)*100.0 + 10.0
root = sim_tree(600,Ne_func,1.0)
simple_tree_draw(ladderize(root))

source
MolecularEvolution.simple_radial_tree_plotMethod
simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = "black", line_width = 0.1mm)

Draws a radial tree. No frills. No labels. Canvas height is automatically determined to avoid distorting the tree.

newt = better_newick_import("((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);", FelNode{Float64});
simple_radial_tree_plot(newt,line_width = 0.5mm,root_angle = 7/10)

source
MolecularEvolution.simple_tree_drawMethod

img = simple_tree_draw(tree::FelNode; canvas_width = 15cm, canvas_height = 15cm, line_color = "black", line_width = 0.1mm)

A line drawing of a tree with very few options.

img = simple_tree_draw(tree)
 img |> SVG("imgout.svg",10cm, 10cm)
 OR
 using Cairo
-img |> PDF("imgout.pdf",10cm, 10cm)
source
MolecularEvolution.total_LLMethod

total_LL(p::Partition)

If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.

source
MolecularEvolution.tree2distancesMethod
tree2distances(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.tree2shared_branch_lengthsMethod
tree2shared_branch_lengths(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.total_LLMethod

total_LL(p::Partition)

If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.

source
MolecularEvolution.tree2distancesMethod
tree2distances(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.tree2shared_branch_lengthsMethod
tree2shared_branch_lengths(root::AbstractTreeNode)

Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.

source
MolecularEvolution.tree_drawMethod
tree_draw(tree::FelNode;
     canvas_width = 15cm, canvas_height = 15cm,
     stretch_for_labels = 2.0, draw_labels = true,
     line_width = 0.1mm, font_size = 4pt,
@@ -59,10 +60,10 @@
 img |> SVG("imgout.svg",10cm, 10cm)
 OR
 using Cairo
-img |> PDF("imgout.pdf",10cm, 10cm)
source
MolecularEvolution.tree_polish!Method

tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)

Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.

source
MolecularEvolution.unc2probvecMethod
unc2probvec(v)

Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))

Maximizes f(x) using Brent's method. See ?brents_method_minimize.

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)

Maximizes f(x) using a Golden Section Search. See ?golden_section_maximize.

Examples

julia> f(x) = -(x-2)^2
+img |> PDF("imgout.pdf",10cm, 10cm)
source
MolecularEvolution.tree_polish!Method

tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)

Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.

source
MolecularEvolution.unc2probvecMethod
unc2probvec(v)

Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
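
One common way to obtain such a mapping is a softmax with a fixed reference entry. The sketch below is only an illustration of that idea: the function name probvec_sketch is hypothetical, and the package's own unc2probvec may use a different transform or ordering.

```julia
# Hypothetical illustration, NOT the implementation of unc2probvec.
# Maps N-1 unbounded reals to N positive values that sum to 1.
function probvec_sketch(v::AbstractVector{<:Real})
    z = vcat(float.(v), 0.0)   # append a fixed reference logit of 0
    z = z .- maximum(z)        # shift for numerical stability
    w = exp.(z)
    return w ./ sum(w)         # normalize so the entries sum to 1
end

p = probvec_sketch(zeros(3))   # three unconstrained values give four probabilities
```

Because any real vector maps to a valid probability vector, an optimizer can search over v without explicit simplex constraints.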

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))

Maximizes f(x) using Brent's method. See ?brents_method_minimize.

source
MolecularEvolution.univariate_maximizeMethod
univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)

Maximizes f(x) using a Golden Section Search. See ?golden_section_maximize.
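
For readers unfamiliar with the technique, a golden section search for a maximum can be sketched as below. This is a minimal standalone version under the assumption that f is unimodal on [a, b]; the name golden_max_sketch and its tol keyword are hypothetical and this is not the package's golden_section_maximize.

```julia
# Hypothetical sketch of golden section search for a maximum on [a, b].
# Assumes f is unimodal on the interval.
function golden_max_sketch(f, a::Real, b::Real; tol::Real = 1e-10)
    invphi = (sqrt(5) - 1) / 2            # 1/φ ≈ 0.618
    lo, hi = float(a), float(b)
    c = hi - invphi * (hi - lo)           # lower interior point
    d = lo + invphi * (hi - lo)           # upper interior point
    fc, fd = f(c), f(d)
    while (hi - lo) > tol
        if fc > fd
            # The maximum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = f(c)
        else
            # The maximum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = f(d)
        end
    end
    return (lo + hi) / 2
end

x_best = golden_max_sketch(x -> -(x - 2)^2, 1, 5)
```

Each iteration reuses one of the two interior evaluations, so only one new call to f is needed per step.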

Examples

julia> f(x) = -(x-2)^2
 f (generic function with 1 method)
 
 julia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)
-2.0000000000051843
source
MolecularEvolution.values_from_phylo_treeMethod
values_from_phylo_tree(phylo_tree, key)
 
-Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
source
MolecularEvolution.weightEMMethod
weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)

Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.

source
MolecularEvolution.write_fastaMethod
write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.

source
MolecularEvolution.write_nexusMethod
write_nexus(fname::String,tree::FelNode)

Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.

source
+Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
source
MolecularEvolution.weightEMMethod
weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)

Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.
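
The plain EM update for mixture weights from a conditional likelihood matrix can be sketched as follows. This is an assumption-laden illustration rather than the package's weightEM: the name em_weights_sketch is hypothetical, and the conc option is deliberately left out.

```julia
# Hypothetical sketch of EM for category frequencies θ.
# L is a categories-by-sites matrix of conditional likelihoods.
function em_weights_sketch(L::Matrix{Float64}, θ::Vector{Float64}; iters::Int = 500)
    θ = copy(θ)
    nsites = size(L, 2)
    for _ in 1:iters
        post = L .* θ                     # E-step: unnormalized posterior per category and site
        post ./= sum(post, dims = 1)      # normalize each site (column) to sum to 1
        θ = vec(sum(post, dims = 2)) ./ nsites   # M-step: average posterior mass per category
    end
    return θ
end

L = [0.9 0.8 0.1;
     0.1 0.2 0.9]
θ_hat = em_weights_sketch(L, [0.5, 0.5])
```

Each iteration cannot decrease the likelihood of the data under the mixture, which is the usual EM guarantee.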

source
MolecularEvolution.write_fastaMethod
write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)

Writes a fasta file from a vector of sequences, with optional seq_names.
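
To make the file layout concrete, a writer of this kind can be sketched in a few lines. The function write_fasta_sketch is a hypothetical stand-in, and the fallback names S1, S2, ... used when seq_names is nothing are an assumption, not necessarily what write_fasta does.

```julia
# Hypothetical stand-in for illustration; not MolecularEvolution.write_fasta itself.
function write_fasta_sketch(filepath::String, sequences::Vector{String}; seq_names = nothing)
    # Assumed fallback: number the records when no names are supplied.
    labels = seq_names === nothing ? ["S$(i)" for i in 1:length(sequences)] : seq_names
    length(labels) == length(sequences) || error("need exactly one name per sequence")
    open(filepath, "w") do io
        for (label, seq) in zip(labels, sequences)
            println(io, ">", label)   # header line
            println(io, seq)          # sequence on a single line
        end
    end
    return filepath
end

path = write_fasta_sketch(tempname(), ["ACGT", "TTGA"]; seq_names = ["a", "b"])
```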

source
MolecularEvolution.write_nexusMethod
write_nexus(fname::String,tree::FelNode)

Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.

source
diff --git a/dev/models/index.html b/dev/models/index.html index 8274247..6681e7a 100644 --- a/dev/models/index.html +++ b/dev/models/index.html @@ -1,2 +1,12 @@ -Models · MolecularEvolution.jl
+Models · MolecularEvolution.jl

Models

Coming soon.

Discrete state models

Codon models

Continuous models

Compound models

Lazy models

LazyPartition

MolecularEvolution.LazyPartitionType

Constructor

LazyPartition{PType}()

Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.

Description

With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, only a wave of partitions needs to be actualized at any one time, for instance a wave following a root-leaf path or a depth-first traversal. In that case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:

  • log_likelihood!
  • felsenstein!
  • sample_down!
Note

For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.

Further requirements

Suppose you want to wrap a partition of PType with LazyPartition:

  • If you're calling log_likelihood! and felsenstein!:
    • obs2partition!(partition::PType, obs) that transforms an observation to a partition.
  • If you're calling sample_down!:
    • partition2obs(partition::PType) that returns the most likely state from a partition (the inverse of obs2partition!).
source

Examples

Example 1: Initializing for an upward pass

Now, we show how to wrap the CodonPartitions from Example 3: FUBAR with LazyPartition:

You simply go from initializing messages like this:

initial_partition = CodonPartition(Int64(length(seqs[1])/3))
+initial_partition.state .= eq_freqs
+populate_tree!(tree,initial_partition,seqnames,seqs)

To this

initial_partition = CodonPartition(Int64(length(seqs[1])/3))
+initial_partition.state .= eq_freqs
+lazy_initial_partition = LazyPartition{CodonPartition}()
+populate_tree!(tree,lazy_initial_partition,seqnames,seqs)
+lazyprep!(tree, initial_partition)

By this slight modification, we go from initializing and using 554 partitions to 6 during the subsequent log_likelihood! and felsenstein! calls. There is no significant decrease in performance recorded from this switch.

Example 2: Initializing for a downward pass

Now, we show how to wrap the GaussianPartitions from Quick example: Likelihood calculations under phylogenetic Brownian motion: with LazyPartition:

You simply go from initializing messages like this:

internal_message_init!(tree, GaussianPartition())

To this (technically we only add 1 LOC)

initial_partition = GaussianPartition()
+lazy_initial_partition = LazyPartition{GaussianPartition}()
+internal_message_init!(tree, lazy_initial_partition)
+lazyprep!(tree, initial_partition, direction=LazyDown(isleafnode))
Note

Here we provide a direction for lazyprep!. The direction is an instance of LazyDown, initialized with the isleafnode function, which dictates whether a node saves its sampled observation after a down pass. If you use direction=LazyDown(), every node saves its observation.

Surrounding LazyPartition

MolecularEvolution.lazyprep!Function
lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())

Extra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.

  1. Perform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.
  2. Fix tree.parent_message to an initial message.
  3. Preallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.
  4. Specialized preparations based on the direction of the operations (forward!, backward!): LazyDown or LazyUp.

See also LazyDown, LazyUp.

source
MolecularEvolution.LazyDownType

Constructors

LazyDown(stores_obs)
+LazyDown() = LazyDown(x::FelNode -> true)

Description

Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.

source
diff --git a/dev/objects.inv b/dev/objects.inv index ed56e0f..ca1074d 100644 Binary files a/dev/objects.inv and b/dev/objects.inv differ diff --git a/dev/optimization/index.html b/dev/optimization/index.html index 6413d7d..7e5ef89 100644 --- a/dev/optimization/index.html +++ b/dev/optimization/index.html @@ -70,4 +70,4 @@ LL: -3782.322906364547 LL: -3782.321183009534 LL: -3782.3210398963506 -LL: -3782.3210271696703
Warning

tree_polish! probably won't find a good tree from a completely random start. Different tree search heuristics are required for that.

Functions

MolecularEvolution.reversibleQFunction
reversibleQ(param_vec,eq_freqs)

Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.

source
MolecularEvolution.unc2probvecFunction
unc2probvec(v)

Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.

source
MolecularEvolution.branchlength_optim!Function
branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())

Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().

source
MolecularEvolution.nni_optim!Function
nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)

Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.

source
MolecularEvolution.tree_polish!Function

tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)

Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.

source
+LL: -3782.3210271696703
Warning

tree_polish! probably won't find a good tree from a completely random start. Different tree search heuristics are required for that.

Functions

MolecularEvolution.reversibleQFunction
reversibleQ(param_vec,eq_freqs)

Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
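
The construction described here can be sketched directly. The function reversible_q_sketch below is a hypothetical illustration, not the package's reversibleQ; it assumes the parameters fill the upper triangle row by row, which is what the worked output in the optimization example suggests, and that each diagonal entry is set so its row sums to zero.

```julia
# Hypothetical sketch of building a reversible rate matrix.
# params: upper-triangle exchangeabilities, assumed row-by-row order (1,2),(1,3),...,(n-1,n)
# eq_freqs: equilibrium frequencies, applied column-wise
function reversible_q_sketch(params::AbstractVector, eq_freqs::AbstractVector)
    n = length(eq_freqs)
    length(params) == n * (n - 1) ÷ 2 || error("expected n(n-1)/2 parameters")
    Q = zeros(Float64, n, n)
    k = 1
    for i in 1:n-1, j in i+1:n
        Q[i, j] = params[k] * eq_freqs[j]   # rate i -> j scaled by the target frequency
        Q[j, i] = params[k] * eq_freqs[i]   # symmetric parameter, scaled by the other frequency
        k += 1
    end
    for i in 1:n
        Q[i, i] = -sum(Q[i, :])             # diagonal makes each row sum to zero
    end
    return Q
end

Q = reversible_q_sketch(1:6, [0.1, 0.2, 0.3, 0.4])
```

Sharing one parameter between Q[i, j] and Q[j, i] is what gives detailed balance: eq_freqs[i] * Q[i, j] equals eq_freqs[j] * Q[j, i] for every pair.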

source
MolecularEvolution.unc2probvecFunction
unc2probvec(v)

Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.

source
MolecularEvolution.branchlength_optim!Function
branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())

Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().

source
MolecularEvolution.nni_optim!Function
nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)

Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.

source
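
The acceptance rule mentioned in the nni_optim! description is a two-argument function of the current and proposed log likelihoods. Two candidate rules are sketched below; both names are hypothetical, and how a rule is passed to nni_optim! is not shown because the docstring above does not list that keyword in its signature.

```julia
# Hypothetical acceptance rules; each takes (current_LL, proposed_LL) and returns a Bool.

# Greedy: accept only strict improvements in log likelihood.
greedy_rule(current_LL, proposed_LL) = proposed_LL > current_LL

# Metropolis-style: always accept improvements, and accept a worse proposal
# with probability exp(proposed_LL - current_LL).
metropolis_rule(current_LL, proposed_LL) = log(rand()) < proposed_LL - current_LL

greedy_rule(-3783.2, -3782.3)   # an improvement, so the move would be accepted
```

A greedy rule gives hill climbing, while a stochastic rule lets the search occasionally step through worse topologies.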
MolecularEvolution.tree_polish!Function

tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)

Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.

source
diff --git a/dev/search_index.js b/dev/search_index.js index eef51c0..fcdfc8d 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"optimization/#Optimization","page":"Optimization","title":"Optimization","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"There are two distinct kinds of optimization: \"global\" model parameters, and then tree branchlengths and topology. These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"The example below will set up and optimize a \"Generalized Time Reversible\" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).","category":"page"},{"location":"optimization/#Optimizing-model-parameters","page":"Optimization","title":"Optimizing model parameters","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. 
The parameters are the upper triangle (excluding the diagonal) of the rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution #hide\nreversibleQ(1:6,ones(4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"...and the equilibrium frequencies are multiplied column-wise:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ(ones(6),[0.1,0.2,0.3,0.4])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Another convenient trick is to be able to parameterize a vector of positive frequencies that sum to 1, using N-1 unconstrained parameters. unc2probvec can help:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"unc2probvec(zeros(3))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"ParameterHandling.jl provides a convenient framework for managing collections of parameters in a way that plays with much of the Julia optimization ecosystem, and we recommend its use. 
Here we'll use ParameterHandling and NLopt.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"First, we'll load in some example nucleotide data:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\ninitial_partition = NucleotidePartition(length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we set up the model parameters, and the objective function:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"#Named tuple of parameters, with initial values and constraints (from ParameterHandling.jl)\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n pi=zeros(3) #will be transformed into 4 eq freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\n#Set up a function that builds a model from these parameters\nfunction build_model_vec(params)\n pi = unc2probvec(params.pi)\n return DiagonalizedCTMC(reversibleQ(params.rates,pi))\nend\n\n#Set up the function to be *minimized*\nfunction objective(params::NamedTuple; tree = tree)\n #In this example, we are optimizing the nuc equilibrium freqs\n #We'll also assume that the starting frequencies (at the root of the tree) are the eq freqs\n tree.parent_message[1].state .= unc2probvec(params.pi)\n return -log_likelihood!(tree,build_model_vec(params)) #Note, negative of LL, because minimization\nend","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we'll set up an 
optimizer from NLOpt. See this discussion and this exploration of optimizers.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"opt = Opt(:LN_BOBYQA, num_params)\n#Note: NLopt requires a function that returns a gradient, even for gradient free methods, hence (x,y)->...\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x)) #See ParameterHandling.jl docs for objective ∘ unflatten explanation\n#Some bounds (which will be in the transformed domain) to prevent searching numerically silly bits of parameter space:\nlower_bounds!(opt, [-10.0 for i in 1:num_params])\nupper_bounds!(opt, [10.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n_,mini,_ = NLopt.optimize(opt, flat_initial_params)\nfinal_params = unflatten(mini)\n\noptimized_model = build_model_vec(final_params)\nprintln(\"Opt LL:\",log_likelihood!(tree,optimized_model))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We can view the optimized parameter values:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"println(\"Rates: \", round.(final_params.rates,sigdigits = 4))\nprintln(\"Pi:\", round.(unc2probvec(final_params.pi),sigdigits = 4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Rates: [1.124, 2.102, 1.075, 0.9802, 1.605, 0.5536]\nPi:[0.2796, 0.2192, 0.235, 0.2662]","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Or the entire optimized rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"matrix_for_display(optimized_model.Q,['A','C','G','T'])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt 
LL:-3783.226756522292\n5×5 Matrix{Any}:\n \"\" 'A' 'C' 'G' 'T'\n 'A' -1.02672 0.246386 0.494024 0.286309\n 'C' 0.314289 -0.971998 0.23034 0.427368\n 'G' 0.587774 0.214842 -0.950007 0.147391\n 'T' 0.300663 0.35183 0.130093 -0.782586","category":"page"},{"location":"optimization/#Optimizing-the-tree-topology-and-branch-lengths","page":"Optimization","title":"Optimizing the tree topology and branch lengths","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"With a tree and a model, we can also optimize the branch lengths and search, by nearest neighbour interchange, for changes to the tree that improve the likelihood. Individually, these are performed by nni_optim! and branchlength_optim!, which need to have felsenstein! and felsenstein_down! called beforehand, but this is all bundled into:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"tree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3783.226756522292\nLL: -3782.345818028071\nLL: -3782.3231632207567\nLL: -3782.3211724011044\nLL: -3782.321068684831\nLL: -3782.3210622627776","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"And just to convince you this works, we can perturb the branch lengths, and see how the likelihood improves:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"for n in getnodelist(tree)\n n.branchlength *= (rand()+0.5)\nend\ntree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3805.4140940138795\nLL: -3782.884883999107\nLL: -3782.351780962518\nLL: -3782.322906364547\nLL: -3782.321183009534\nLL: -3782.3210398963506\nLL: 
-3782.3210271696703","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"warning: Warning\ntree_polish! probably won't find a good tree from a completely random start. Different tree search heuristics are required for that.","category":"page"},{"location":"optimization/#Functions","page":"Optimization","title":"Functions","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ\nunc2probvec\nbranchlength_optim!\nnni_optim!\ntree_polish!","category":"page"},{"location":"optimization/#MolecularEvolution.reversibleQ","page":"Optimization","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.unc2probvec","page":"Optimization","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.branchlength_optim!","page":"Optimization","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.nni_optim!","page":"Optimization","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.tree_polish!","page":"Optimization","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. 
Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#Simulation","page":"Simulation","title":"Simulation","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"The two key steps in phylogenetic simulation are 1) simulating the phylogeny itself, and 2) simulating data that evolves over the phylogeny.","category":"page"},{"location":"simulation/#Simulating-phylogenies","page":"Simulation","title":"Simulating phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"warning: Warning\nWhile our sim_tree function seems to produce trees with the right shape, and is good enough for eg. generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. 
So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If you just need a simple tree for testing things, then you can just use:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"tree = sim_tree(n=100)\ntree_draw(tree, draw_labels = false, canvas_height = 5cm)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"This has the characteristic \"coalescent under constant population size\" look.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"However, sim_tree is a bit more powerful than this: it aims to simulate branching under a coalescent process with flexible options for how the effective population size, as well as the sampling rate, might change over time. This is important, because the \"constant population size\" model is quite extreme, and most of the divergence happens in the early internal branches.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"A coalescent process runs backwards in time, starting from the most recent tip, and sampling backwards toward the root, coalescing nodes as it goes, and sometimes adding additional sampled tips. With sim_tree, if nstart = add_limit, then all the tips will be sampled at the same time, and the tree will be ultrametric.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree has two arguments driving its flexibility. We'll start with sampling_rate, which controls the rate at which samples are added to the tree. 
Even under constant effective population size, this can produce interesting behavior.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for sampling_rate in [5.0, 0.5, 0.05, 0.005]\n tree = sim_tree(100,1000.0,sampling_rate)\n display(tree_draw(tree, draw_labels = false, canvas_height = 5cm))\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Above, this rate was just a fixed constant value, but we can also let this be a function. In this example, we'll plot the tree alongside the sampling rate function, as well as the cumulative number of samples through time.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"s(t) = ifelse(0 sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Note how the x axis of these plots is flipped, since the leaf furtherest from the root begins at time=0, and the coalescent runs backwards, from tip to root.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can also vary 
the effective population size over time, which adds a different dimension of control. Here is an example showing the shape of a tree under exponential growth:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 100000*exp(-t/10)\ntree = sim_tree(100,n,100.0, nstart = 100)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\nplot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Logistic growth, with a relatively low sampling rate, provides a reasonable model of an emerging virus that was only sampled later in its growth trajectory, such as HIV.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 10000/(1+exp(t-10))\ntree = sim_tree(100,n,20.0)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"How about a virus with a seasonally varying effective population size, where sampling is proportional to case counts? Between seasons, the effective population size gets so low that the next season's clade arises from one or two lineages in the previous season.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s)\ndisplay(tree_draw(tree, draw_labels = false))\n\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Finally, the mutation_rate argument multiplicatively scales the branch lengths.","category":"page"},{"location":"simulation/#Simulating-evolution-over-phylogenies","page":"Simulation","title":"Simulating evolution over phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We'll begin by simulating a tree, like the last 
example:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"using MolecularEvolution, FASTX, Phylo, Plots, CSV, DataFrames\n\nn(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s, mutation_rate = 0.005)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If we need to open this tree in an external program, we can extract the Newick string representing this tree, and write it to a file:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"newick_string = newick(tree)\nopen(\"flu_sim.tre\",\"w\") do io\n println(io,newick_string)\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we can set up a model. In this case, it'll be a combination of a nucleotide model of sequence evolution and Brownian motion over a continuous character.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"nuc_freqs = [0.2,0.3,0.3,0.2]\nnuc_rates = [1.0,2.0,1.0,1.0,1.6,0.5]\nnuc_model = DiagonalizedCTMC(reversibleQ(nuc_rates,nuc_freqs))\nbm_model = BrownianMotion(0.0,1.0)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"As usual, we set up the Partition structure, and load this onto our tree:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"message_template = [NucleotidePartition(nuc_freqs,300),GaussianPartition()]\ninternal_message_init!(tree, message_template)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we sample data under our model:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sample_down!(tree, [nuc_model,bm_model])","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can visualize the Brownian component 
of the simulation by loading it into the node_dict, and converting to a Phylo.jl tree.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for n in getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[2].mean])\nend\nphylo_tree = get_phylo_tree(tree)\nplot(phylo_tree, showtips = false, line_z = \"mu\", colorbar = :none,\n linecolor = :darkrainbow, linewidth = 1.0, size = (600, 600))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can write the simulated data, including sequences and continuous characters, to a CSV:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"df = DataFrame()\ndf.names = [n.name for n in getleaflist(tree)]\ndf.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)]\ndf.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)]\nCSV.write(\"flu_sim_seq_and_bm.csv\",df)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Or we could export just the sequences as .fasta","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"write_fasta(\"flu_sim_seq_and_bm.fasta\",df.seqs,seq_names = df.names)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Which will look something like this, when opened in AliView","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/#Functions","page":"Simulation","title":"Functions","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree\nsample_down!\npartition2obs","category":"page"},{"location":"simulation/#MolecularEvolution.sim_tree","page":"Simulation","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\nsim_tree(;n = 10)\n\nSimulates tree with constant population size.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.sample_down!","page":"Simulation","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.partition2obs","page":"Simulation","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. 
Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"function"},{"location":"framework/#The-MolecularEvolution.jl-Framework","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.","category":"page"},{"location":"framework/#Partitions-and-BranchModels","page":"The MolecularEvolution.jl Framework","title":"Partitions and BranchModels","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. 
For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"(Image: )","category":"page"},{"location":"framework/#Messages","page":"The MolecularEvolution.jl Framework","title":"Messages","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. 
Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!","category":"page"},{"location":"framework/#Trees","page":"The MolecularEvolution.jl Framework","title":"Trees","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Each node in our tree is a FelNode (\"Fel\" for \"Felsenstein\"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that associates a FelNode->Vector{:Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.cascading_max_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendants. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.endpoint_conditioned_sample_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Example-1:-Amino-acid-ancestral-reconstruction-and-visualization","page":"Examples","title":"Example 1: Amino acid ancestral reconstruction and visualization","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads amino acid sequences from this FASTA file, and a phylogeny from this Newick tree file. A WAG amino acid model, augmented to explicitly model gap (ie. '-') characters, and a global substitution rate is estimated by maximum likelihood. Under this optimized model, the distribution over ancestral amino acids is constructed for each node, and visualized in multiple ways.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, Phylo, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/MusAA_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusAA_IGHV.tre\")\n\n#Compute AA freqs, which become the equilibrium freqs of the model, and the initial root freqs\nAA_freqs = char_proportions(seqs,MolecularEvolution.gappyAAstring)\n#Build the Q matrix\nQ = gappy_Q_from_symmetric_rate_matrix(WAGmatrix,1.0,AA_freqs)\n#Build the model\nm = DiagonalizedCTMC(Q)\n#Set up the memory on the tree\ninitial_partition = GappyAminoAcidPartition(AA_freqs,length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#Set up a likelihood function to find the scaling constant that best fits the branch lengths of the imported tree\n#Note, calling LL will change the rate, so make sure you set it to what you want after this has been called\nll = function(rate; m = m)\n m.r = rate\n return 
log_likelihood!(tree,m)\nend\nopt_rate = golden_section_maximize(ll, 0.0, 10.0, identity, 1e-11);\nplot(opt_rate*0.87:0.001:opt_rate*1.15,ll,size = (500,250),\n xlabel = \"rate\",ylabel = \"log likelihood\", legend = :none)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then set the model parameters to the maximum likelihood estimate, and reconstruct the ancestral states.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"m.r = opt_rate\n#Reconstructing the marginal distributions of amino acids at internal nodes\nd = marginal_state_dict(tree,m)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"That's it! Everything else is for visualizing these ancestral states. We'll select a set of amino acid positions to visualize, corresponding to these two (red arrows) alignment columns:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#The alignment indices we want to pay attention to in our reconstructions\nmotif_inds = [52,53]\n\n#We'll compute a confidence score for the inferred marginal state\nconfidence(state,inds) = minimum([maximum(state[:,i]) for i in inds])\n\n#Map motifs to numbers, so we can work with more convenient continuous color scales\nall_motifs = sort(union([partition2obs(d[n][1])[motif_inds] for n in getnodelist(tree)]))\nmotif2num = Dict(zip(all_motifs,1:length(all_motifs)))\n\n#Populating the node_data dictionary to help with plotting\nfor n in getnodelist(tree)\n moti = partition2obs(d[n][1])[motif_inds]\n n.node_data = Dict([\n \"motif\"=>moti,\n \"motif_color\"=>motif2num[moti],\n \"uncertainty\"=>1-confidence(d[n][1].state,motif_inds)\n ])\nend\n\n#Transducing the MolecularEvolution FelNode tree to a 
Phylo.jl tree, which migrates node_data as well\nphylo_tree = get_phylo_tree(tree)\nnode_unc = values_from_phylo_tree(phylo_tree,\"uncertainty\")\n\nprintln(\"Greatest motif uncertainty: \",maximum([n.node_data[\"uncertainty\"] for n in getnodelist(tree)]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Greatest motif uncertainty: 0.6104376723068156","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (400, 800))\n\nsavefig_tweakSVG(\"anc_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree, treetype = :fan,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (800, 800))\n\nsavefig_tweakSVG(\"anc_circ_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting using continuous color scales, and using marker size to show uncertainty in reconstructions\ncolor_scale = :rainbow\npl = plot(phylo_tree, showtips = true, tipfont = 6, marker_z = \"motif_color\", line_z = \"motif_color\",\n markersize = 10 .* sqrt.(node_unc), linecolor = color_scale, markercolor = color_scale, markeralpha = 0.75,\n markerstrokewidth = 0,margins = 2Plots.cm, colorbar = :none, linewidth = 2.5, size = (400, 800))\n\n#Feeble attempt 
at a manual legend\nmotif_ys = collect(1:length(all_motifs)) .+ (length(seqs) - length(all_motifs))\nscatter!(zeros(length(all_motifs)) , motif_ys , marker = 8, markeralpha = 0.75,\n marker_z = 1:length(all_motifs), markercolor = color_scale, markerstrokewidth = 0.0)\nfor i in 1:length(all_motifs)\n annotate!(0.1, motif_ys[i], all_motifs[i],7)\nend\n\nsavefig_tweakSVG(\"anc_tree_continuous.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/#Example-2:-GTRGamma","page":"Examples","title":"Example 2: GTR+Gamma","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"For site-to-site \"random effects\" rate variation, such as under the GTR+Gamma model, we need to use a \"Site-Wise Mixture\" model, or SWMModel with its SWMPartition.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Set up a function that will return a set of rates that will, when equally weighted, VERY coarsely approx a Gamma distribution\nfunction equiprobable_gamma_grids(s,k)\n grids = quantile(Gamma(s,1/s),1/2k:1/k:(1-1/2k))\n grids ./ mean(grids)\nend\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\n#Set up the Partition that will be replicated in the SWMModel\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\n#To be able to use unconstrained optimization, we use `ParameterHandling.jl`\ninitial_params = (\n rates=positive(ones(6)),\n gam_shape=positive(1.0),\n pi=zeros(3)\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\n#Setting up the Site-Wise Mixture Partition:\n#Note: this constructor sets the weights of all categories to 1/rate_cats\n#That is fine for our equi-probable category model, but this will need 
to be different for other models.\nrate_cats = 5\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,rate_cats)\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = rate_cats)\n r_vals = equiprobable_gamma_grids(params.gam_shape,cats)\n pi = unc2probvec(params.pi)\n return MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),r_vals)\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n #Root freqs need to be set over all component partitions\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3728.4761606135307","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Other functions also work with these kinds of random-effects site-wise mixture models:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"tree_polish!(tree,optimized_model)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL: -3728.4761606135307\nLL: -3728.1316616075173\nLL: -3728.121005993758\nLL: -3728.1202243978914\nLL: -3728.1201348447107","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Sometimes we might want the rate values for each category to stay fixed, but optimize their 
weights:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Using rate categories with fixed values\nfixed_cats = [0.00001,0.33,1.0,3.0,9.0]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n cat_weights=zeros(length(fixed_cats)-1), #Category weights\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n cat_weights = unc2probvec(params.cat_weights)\n pi = unc2probvec(params.pi)\n m = MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n m.weights .= cat_weights\n return m\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3719.6290948420706","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"When you have a Site-Wise Mixture (ie. 
REL) model, the category weights can be handled \"outside\" of the main likelihood calculations. This means that they can be optimized very quickly, within an objective function that is optimizing over the other parameters. The following example uses an EM approach to do this:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using Distributions, FASTX, ParameterHandling, NLopt\n\n#Using rate categories with fixed values\nfixed_cats = [(i/5)^2 for i in 1:12]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n pi = unc2probvec(params.pi)\n m = SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n return m\nend\n\n#LL for a mixture when the grid of probabilities is pre-computed\ngrid_ll(v,g) = sum(log.(sum((v./sum(v)) .* g,dims = 1)))\n\n#Note: we can get away with relatively few EM iterations within the optimization cycle (in this example at least)\nfunction opt_weights_and_LL(temp_part::SWMPartition{PType}; iters = 25) where {PType <: MolecularEvolution.MultiSitePartition} \n g,scals = SWM_prob_grid(temp_part) \n l = size(g)[1]\n #We can optimize the category weights without re-computing felsenstein\n #So it can make sense to do so within the optimization function\n #Which means you don't need to optimize over as many parameters\n θ = weightEM(g,ones(l)./l, iters = iters)\n LL_optimizing_over_weights = grid_ll(θ,g) + sum(scals)\n return θ,LL_optimizing_over_weights\nend\n\nfunction 
objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n felsenstein!(tree,build_model_vec(params))\n #Optim inside optim\n #We first need to handle the merge of the parent and root partitions - usually handled for us magically!\n #Be careful: this example is hard-coded for a single partition\n temp_part = copy_partition(tree.parent_message[1])\n combine!(temp_part, tree.message[1])\n θ,LL = opt_weights_and_LL(temp_part)\n return -LL\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time score,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\n\nfelsenstein!(tree,optimized_model)\ntemp_part = copy_partition(tree.parent_message[1])\ncombine!(temp_part, tree.message[1])\nθ,_ = opt_weights_and_LL(temp_part, iters = 1000) #polish weights for final pass - quick\noptimized_model.weights .= θ\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work, \":\", score)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"3.932150 seconds (2.38 M allocations: 2.378 GiB, 10.78% gc time, 3.28% compilation time: 7% of which was recompilation)\nSUCCESS:3720.1347720900067\nOpt LL:-3719.4808937732614","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"This can be dramatically faster than trying to directly optimize over category weights when the number of categories grows. 
The above example took 140s with the direct approach.","category":"page"},{"location":"examples/#Example-3:-FUBAR","page":"Examples","title":"Example 3: FUBAR","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads codon sequences from this FASTA file, and a phylogeny from this Newick tree file, and implements FUBAR.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/Flu.fasta\")\ntree = read_newick_tree(\"Data/Flu.tre\")\n\n#Count F3x4 frequencies from the seqs, and estimate codon freqs from this\nf3x4 = MolecularEvolution.count_F3x4(seqs);\neq_freqs = MolecularEvolution.F3x4_eq_freqs(f3x4);\n\n#Set up a codon partition (will default to Universal genetic code)\ninitial_partition = CodonPartition(Int64(length(seqs[1])/3))\ninitial_partition.state .= eq_freqs\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#We'll use the empirical F3x4 freqs, fixed MG94 alpha=1, and optimize the nuc parameters and MG94 beta\n#Note: the nuc rates are confounded with alpha\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n beta = positive(1.0)\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\nfunction build_model_vec(p; F3x4 = f3x4, alpha = 1.0)\n #If you run into numerical issues with DiagonalizedCTMC, switch to GeneralCTMC instead\n return DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, p.beta, reversibleQ(p.rates,ones(4)), F3x4))\nend\n\nfunction objective(params::NamedTuple; tree = tree, eq_freqs = eq_freqs)\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 
1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time _,mini,_ = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\nnucmat = reversibleQ(final_params.rates,ones(4))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":" 10.596546 seconds (840.87 k allocations: 5.221 GiB, 7.45% gc time, 0.35% compilation time: 25% of which was recompilation)\n4×4 Matrix{Float64}:\n -9.41346 1.77048 6.85997 0.783008\n 1.77048 -7.24162 0.280525 5.19061\n 6.85997 0.280525 -8.651 1.5105\n 0.783008 5.19061 1.5105 -7.48412","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The scaling of that nuc matrix reflects the fact that we're using a tree that was estimated under a nuc model, but here we're optimizing a codon model. No issue: the nuc rates have absorbed this scaling difference.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Now we set up a 20-by-20 grid, slicing the MG94 α and β parameters at the following values:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"grid_values = 10 .^ (-1.35:0.152:1.6) .- 0.0423174293933042","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"20-element Vector{Float64}:\n 0.0023509298217921012\n 0.021069541732388508\n 0.047632328759699305\n 0.08532645148783018\n 0.13881657986865603\n 0.2147221488835822\n 0.3224365175323036\n 0.4752894025572635\n 0.6921964387638108\n 1.0\n 1.4367909587749033\n 2.05662245423022\n 2.9361990000358853\n 4.184368713262725\n 5.95559333316179\n 8.469062952630463\n 12.0358209216745\n 17.09725564569095\n 24.27972266134484\n 34.47205650419232","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we calculate the conditional likelihoods for each site. 
Note the 20-by-20 grid is stretched out into a length 400 vector to keep things simple. I'm avoiding reshape tricks to keep the grid structure clear.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL_matrix = zeros(length(grid_values)^2,initial_partition.sites);\nalpha_vec = zeros(length(grid_values)^2);\nalpha_ind_vec = zeros(Int64,length(grid_values)^2);\nbeta_vec = zeros(length(grid_values)^2);\nbeta_ind_vec = zeros(Int64,length(grid_values)^2);\n\ni = 1\n@time for (a,alpha) in enumerate(grid_values)\n for (b,beta) in enumerate(grid_values)\n alpha_vec[i],beta_vec[i] = alpha, beta\n alpha_ind_vec[i], beta_ind_vec[i] = a,b\n m = DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, beta, nucmat, f3x4))\n felsenstein!(tree,m)\n #This is because we need to include the eq freqs in the site LLs:\n combine!(tree.message[1],tree.parent_message[1])\n LL_matrix[i,:] .= MolecularEvolution.site_LLs(tree.message[1])\n i += 1\n end\nend\nprob_matrix = exp.(LL_matrix .- maximum(LL_matrix,dims = 1))\nprob_matrix ./= sum(prob_matrix,dims = 1);","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we use an EM-like MAP algorithm to find the posterior grid weights, and visualize this surface:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LDAθ = weightEM(prob_matrix, ones(length(alpha_vec))./length(alpha_vec), conc = 0.4, iters = 5000);\n\n#A function to viz the grid surface\nfunction gridplot(alpha_ind_vec,beta_ind_vec,grid_values,θ; title = \"\")\n scatter(alpha_ind_vec,beta_ind_vec, zcolor = θ, c = :darktest,\n markersize = sqrt(length(alpha_ind_vec))/2, markershape=:square, markerstrokewidth=0.0, size=(550,500),\n label = :none, xticks = (1:length(grid_values), round.(grid_values,digits = 3)), xrotation = 90,\n yticks = (1:length(grid_values), round.(grid_values,digits = 3)), margin=6Plots.mm,\n xlabel = \"α\", ylabel = \"β\", title = title)\n 
plot!(1:length(grid_values),1:length(grid_values),color = \"grey\", style = :dash, label = :none)\nend\n\ngridplot(alpha_ind_vec,beta_ind_vec,grid_values,LDAθ)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"We can see that the posterior distribution over sites is heavily concentrated at β<α. But are there any sites where β>α?","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"weighted_mat = prob_matrix .* LDAθ\nfor site in 1:size(prob_matrix)[2]\n pos = sum(weighted_mat[beta_vec .> alpha_vec,site])/sum(weighted_mat[:,site])\n if pos > 0.9\n println(\"Site $(site): P(β>α)=$(round(pos,digits = 4))\")\n end\nend","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Site 153: P(β>α)=0.9074\nSite 158: P(β>α)=0.9266\nSite 160: P(β>α)=0.9547","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"And let's visualize one of those sites:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"viz/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. 
The Compose, Plots, and Phylo dependencies are optional.","category":"page"},{"location":"viz/#Example-1","page":"Visualization","title":"Example 1","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Plots, Phylo\n\n#First simulate a tree, and then Brownian motion:\ntree = sim_tree(n=20)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\n#We'll add the Gaussian means to the node_data dictionaries\nfor n in getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[1].mean])\nend\n\n#Transducing the mol ev tree to a Phylo.jl tree\nphylo_tree = get_phylo_tree(tree)\n\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_z = \"mu\", markeralpha = 0.5, line_z = \"mu\", linecolor = :darkrainbow, \n markersize = 4.0, markerstrokewidth = 0,margins = 1Plots.cm,\n linewidth = 1.5, markercolor = :darkrainbow, size = (500, 500))","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We also offer savefig_tweakSVG(\"simple_plot_example.svg\", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,\"mu\") which can extract stored quantities in the right order for passing into eg. markersize options when plotting.","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.","category":"page"},{"location":"viz/#Drawing-trees-with-Compose.jl.","page":"Visualization","title":"Drawing trees with Compose.jl.","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"The Compose.jl in-house tree drawing offers extensive flexibility. 
Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Compose\n\ntree = sim_tree(40,1000.0,0.005,mutation_rate = 0.001)\nmodel = DiagonalizedCTMC(reversibleQ(ones(6),ones(4)./4))\ninternal_message_init!(tree, NucleotidePartition(ones(4)./4,1))\nsample_down!(tree,model)\nd = marginal_state_dict(tree,model);\n\ncompose_dict = Dict()\nfor n in getnodelist(tree)\n compose_dict[n] = (x,y)->pie_chart(x,y,d[n][1].state[:,1],size = 0.02, opacity = 0.75)\nend\nimg = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"This can then be exported with:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"savefig_tweakSVG(\"piechart_tree.svg\",img)","category":"page"},{"location":"viz/#Functions","page":"Visualization","title":"Functions","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"get_phylo_tree\nvalues_from_phylo_tree\nsavefig_tweakSVG\ntree_draw","category":"page"},{"location":"viz/#MolecularEvolution.get_phylo_tree","page":"Visualization","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.values_from_phylo_tree","page":"Visualization","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.savefig_tweakSVG","page":"Visualization","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\nsavefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.tree_draw","page":"Visualization","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"function"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = 
MolecularEvolution","category":"page"},{"location":"#MolecularEvolution","page":"Home","title":"MolecularEvolution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Documentation for MolecularEvolution.","category":"page"},{"location":"#A-Julia-package-for-the-flexible-development-of-phylogenetic-models.","page":"Home","title":"A Julia package for the flexible development of phylogenetic models.","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the behavior you need is not already available in MolecularEvolution.jl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have a new data type:\nA Partition type that represents the uncertainty over your state. 
\ncombine!() that merges evidence from two Partitions.\nIf you have a new model:\nA BranchModel type that stores your model parameters.\nforward!() that evolves state distributions over branches, in the root-to-tip direction.\nbackward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.","category":"page"},{"location":"","page":"Home","title":"Home","text":"And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc. should be available for your new data or model.","category":"page"},{"location":"#Design-principles","page":"Home","title":"Design principles","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order of importance, we aim for the following:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Flexibility and generality\nWhere possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.\nWe do not sacrifice flexibility for performance.\nScalability\nAnalyses implemented using MolecularEvolution.jl should scale to large, real-world datasets.\nPerformance\nWhile the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competitive runtimes.","category":"page"},{"location":"#Authors:","page":"Home","title":"Authors:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.","category":"page"},{"location":"#Quick-example:-Likelihood-calculations-under-phylogenetic-Brownian-motion:","page":"Home","title":"Quick example: Likelihood calculations under phylogenetic Brownian motion:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using MolecularEvolution, Plots\n\n#First simulate a tree, 
using a coalescent process\ntree = sim_tree(n=200)\ninternal_message_init!(tree, GaussianPartition())\n#Simulate brownian motion over the tree\nbm_model = BrownianMotion(0.0,1.0)\nsample_down!(tree, bm_model)\n#And plot the log likelihood as a function of the parameter value\nll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))\nplot(0.7:0.001:1.6,ll, xlabel = \"variance per unit time\", ylabel = \"log likelihood\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: )","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"","page":"Home","title":"Home","text":"Modules = [MolecularEvolution]","category":"page"},{"location":"#Base.:==-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"Base.:==","text":"==(t1, t2)\nDefaults to pointer equality\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.SWM_prob_grid-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:MultiSitePartition","page":"Home","title":"MolecularEvolution.SWM_prob_grid","text":"SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}\n\nReturns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution._mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any, Any}} where T<:Function","page":"Home","title":"MolecularEvolution._mapreduce","text":"Internal function. Helper for bfsmapreduce and dfsmapreduce\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.backward!-Tuple{DiscretePartition, DiscretePartition, GeneralCTMC, FelNode}","page":"Home","title":"MolecularEvolution.backward!","text":"backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition backwards along the branch to the destination partition, under the model. 
Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.bfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.bfs_mapreduce","text":"Performs a BFS map-reduce over the tree, starting at a given node. For each node, map_reduce is called as: map_reduce(curr_node::FelNode, prev_node::FelNode, aggregator), where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.\n\nNot exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.branchlength_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). 
tol is the absolute tolerance for the bl_optimizer, which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.brents_method_minimize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.brents_method_minimize","text":"brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))\n\nBrent's method for minimization.\n\nGiven a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2*tol and 3*tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.\n\nThe method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...\n\nExamples\n\njulia> f(x) = exp(-x) - cos(x)\nf (generic function with 1 method)\n\njulia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)\n0.5885327257940255\n\nFrom: Richard P. Brent, \"Algorithms for Minimization without Derivatives\" (1973). 
Chapter 5.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.cascading_max_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.char_proportions-Tuple{Any, String}","page":"Home","title":"MolecularEvolution.char_proportions","text":"char_proportions(seqs, alphabet::String)\n\nTakes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.colored_seq_draw-Tuple{Any, Any, AbstractString}","page":"Home","title":"MolecularEvolution.colored_seq_draw","text":"colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)\n\nDraw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). 
Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.combine!-Tuple{DiscretePartition, DiscretePartition}","page":"Home","title":"MolecularEvolution.combine!","text":"combine!(dest::P, src::P) where P<:Partition\n\nCombines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.deepequals-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.deepequals","text":"deepequals(t1, t2)\n\nChecks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.dfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.dfs_mapreduce","text":"Performs a DFS map-reduce over the tree, starting at a given node. See bfs_mapreduce for more details.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.discrete_name_color_dict-Tuple{AbstractTreeNode, Any}","page":"Home","title":"MolecularEvolution.discrete_name_color_dict","text":"discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)\n\nTakes a tree and a tag_func, which converts the leaf label into a category (ie. 
there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.\n\nExample tag_func: function tag_func(nam::String) return split(nam,\"_\")[1] end\n\nFor prettier colors, but less discrimination: rainbow = true. To randomize the rainbow color assignment: scramble = true. col_seed is currently set to white, and excluded from the list of colors, to make them more visible.\n\nConsider making your own version of this function to customize colors as you see fit.\n\nExample use: num_leaves = 50 Ne_func(t) = 1*(e^-t).+5.0 newt = sim_tree(num_leaves,Ne_func,1.0,nstart = rand(1:num_leaves)); newt = ladderize(newt) tag_func(nam) = mod(sum(Int.(collect(nam))),7) dic = discrete_name_color_dict(newt,tag_func,rainbow = true); tree_draw(newt,line_width = 0.5mm,label_color_dict = dic)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.draw_example_tree-Tuple{}","page":"Home","title":"MolecularEvolution.draw_example_tree","text":"draw_example_tree(num_leaves = 50)\n\nDraws a tree and shows the code that draws it.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.endpoint_conditioned_sample_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.expected_subs_per_site-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.expected_subs_per_site","text":"expected_subs_per_site(Q,mu)\n\nTakes a rate matrix Q and an equilibrium frequency vector, and calculates the expected number of substitutions per site.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein!","text":"felsenstein!(node::FelNode, models; partition_list = nothing)\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein_down!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein_down!","text":"felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.forward!-Tuple{DiscretePartition, DiscretePartition, GeneralCTMC, FelNode}","page":"Home","title":"MolecularEvolution.forward!","text":"forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.gappy_Q_from_symmetric_rate_matrix-Tuple{Any, Any, Any}","page":"Home","title":"MolecularEvolution.gappy_Q_from_symmetric_rate_matrix","text":"gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)\n\nTakes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_highlighter_legend-Tuple{Any}","page":"Home","title":"MolecularEvolution.get_highlighter_legend","text":"get_highlighter_legend(legend_colors)\n\nReturns a Compose object given an input dictionary or pairs mapping characters to colors.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_max_depth-Tuple{Any, Real}","page":"Home","title":"MolecularEvolution.get_max_depth","text":"get_max_depth(node,depth::Real)\n\nReturn the maximum depth of all children starting from the indicated node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_phylo_tree-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.golden_section_maximize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.golden_section_maximize","text":"Golden section search.\n\nGiven a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)\n2.0000000000051843\n\nFrom: https://en.wikipedia.org/wiki/Golden-section_search\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlight_seq_draw-Tuple{Any, Any, AbstractString, Any, Any, Any}","page":"Home","title":"MolecularEvolution.highlight_seq_draw","text":"highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)\n\nDraw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlighter_tree_draw-NTuple{4, Any}","page":"Home","title":"MolecularEvolution.highlighter_tree_draw","text":"highlighter_tree_draw(tree, ali_seqs, seqnames, master;\n highlighter_start = 1.1, highlighter_width = 1,\n coord_width = highlighter_start + highlighter_width + 0.1,\n scale_length = nothing, major_breaks = 1000, minor_breaks = 500,\n tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)\n\nDraws a combined tree and highlighter plot. 
The vector of seqnames must match the node names in tree.\n\nkwargs:\n\ntree_args: kwargs to pass to `tree_draw()`\nlegend_colors: Mapping of characters to highlighter colors (default NUC_colors)\nscale_length: Length of the scale bar\nhighlighter_start: Canvas start for the highlighter panel\nhighlighter_width: Canvas width for the highlighter panel\ncoord_width: Total width of the canvas\nmajor_breaks: Numbered breaks for sequence axis\nminor_breaks: Ticks for sequence axis\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Partition}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, partition::Partition)\n\nInitializes the message template for each node in the tree, as an array of the partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})\n\nInitializes the message template for each node in the tree, allocating space for each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.istreeconsistent-Tuple{T} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.istreeconsistent","text":"istreeconsistent(root)\n\nChecks whether the :parent field is set to be consistent with the :child field for all nodes in the subtree. 
\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.linear_scale-NTuple{5, Any}","page":"Home","title":"MolecularEvolution.linear_scale","text":"linear_scale(val,in_min,in_max,out_min,out_max)\n\nLinearly maps val, which lives in [in_min,in_max], to a value in [out_min,out_max]\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.log_likelihood!","text":"log_likelihood!(tree::FelNode, models; partition_list = nothing)\n\nFirst re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood-Tuple{FelNode, BranchModel}","page":"Home","title":"MolecularEvolution.log_likelihood","text":"log_likelihood(tree::FelNode, models; partition_list = nothing)\n\nComputes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.longest_path-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.longest_path","text":"Returns the longest path in a tree. For convenience, this is returned as two lists of the form: [leaf_node, parent_node, .... 
root] Where the leaf_node nodes are selected to be the furthest away\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.marginal_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.matrix_for_display-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.matrix_for_display","text":"matrix_for_display(Q,labels)\n\nTakes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.midpoint-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.midpoint","text":"Returns a midpoint as a node and a distance above it where the midpoint is\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.mix-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:DiscretePartition","page":"Home","title":"MolecularEvolution.mix","text":"mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}\n\nmix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. 
You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.name2node_dict-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.name2node_dict","text":"name2node_dict(root)\n\nReturns a dictionary of leaf nodes, indexed by node.name. Can be used to associate sequences with leaf nodes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.newick-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nni_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). 
accrule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.node_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.node_distances","text":"Compute the distance to all other nodes from a given node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nonreversibleQ-Tuple{Any}","page":"Home","title":"MolecularEvolution.nonreversibleQ","text":"nonreversibleQ(param_vec)\n\nTakes a vector of parameters and returns a nonreversible rate matrix.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.parent_list-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.parent_list","text":"Provides a list of parent nodes from this node up to the root node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.partition2obs-Tuple{DiscretePartition, String}","page":"Home","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.populate_tree!-Tuple{FelNode, Partition, Any, Any}","page":"Home","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). 
Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.promote_internal-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.promote_internal","text":"promote_internal(tree::FelNode)\n\nCreates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels on internal nodes).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Function, Vector, Int64}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)\n\nTakes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Vector, Vector}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(xvec,yvec; rate_conf_level = 0.99)\n\nTakes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). 
Returns the confidence intervals computed by a quadratic approximation to the LL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_fasta-Tuple{String}","page":"Home","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_newick_tree-Tuple{String}","page":"Home","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.reversibleQ-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.root2tip_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.root2tip_distances","text":"root2tips(root::AbstractTreeNode)\n\nReturns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_down!-Tuple{FelNode, Any, Any}","page":"Home","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_from_message!-Tuple{Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.sample_from_message!","text":"sample_from_message!(message::Vector{<:Partition})\n\nReplaces an uncertain message with a sample from the distribution represented by each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Context}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Plots.Plot}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.shortest_path_between_nodes-Tuple{FelNode, FelNode}","page":"Home","title":"MolecularEvolution.shortest_path_between_nodes","text":"Shortest path between nodes, returned as two lists, each starting with one of the two nodes, and ending with the common ancestor\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sibling_inds-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.sibling_inds","text":"sibling_inds(node)\n\nReturns logical indices of the siblings in the parent's children vector.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.siblings-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.siblings","text":"siblings(node)\n\nReturns a vector of siblings of node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{Int64, Any, Any}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(;n = 10)\n\nSimulates a tree with constant population size.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_radial_tree_plot-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_radial_tree_plot","text":"simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = \"black\", line_width = 0.1mm)\n\nDraws a radial tree. No frills. No labels. 
Canvas height is automatically determined to avoid distorting the tree.\n\nnewt = better_newick_import(\"((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);\", FelNode{Float64});\nsimple_radial_tree_plot(newt,line_width = 0.5mm,root_angle = 7/10)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_tree_draw","text":"img = simple_tree_draw(tree::FelNode; canvas_width = 15cm, canvas_height = 15cm, line_color = \"black\", line_width = 0.1mm)\n\nA line drawing of a tree with very few options.\n\nimg = simple_tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.total_LL-Tuple{Partition}","page":"Home","title":"MolecularEvolution.total_LL","text":"total_LL(p::Partition)\n\nIf called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2distances","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2shared_branch_lengths-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2shared_branch_lengths","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. 
anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_polish!-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 
10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.unc2probvec-Tuple{Any}","page":"Home","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, BrentsMethodOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))\n\nMaximizes f(x) using Brent's method. See ?brents_method_minimize.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, GoldenSectionOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)\n\nMaximizes f(x) using a Golden Section Search. 
See ?golden_section_maximize.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)\n2.0000000000051843\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.values_from_phylo_tree-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.weightEM-Tuple{Matrix{Float64}, Any}","page":"Home","title":"MolecularEvolution.weightEM","text":"weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)\n\nTakes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_fasta-Tuple{String, Vector{String}}","page":"Home","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_nexus-Tuple{String, FelNode}","page":"Home","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. 
Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"method"},{"location":"IO/#Input/Output","page":"Input/Output","title":"Input/Output","text":"","category":"section"},{"location":"IO/","page":"Input/Output","title":"Input/Output","text":"write_nexus\nnewick\nread_newick_tree\npopulate_tree!\nread_fasta\nwrite_fasta","category":"page"},{"location":"IO/#MolecularEvolution.write_nexus","page":"Input/Output","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.newick","page":"Input/Output","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_newick_tree","page":"Input/Output","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.populate_tree!","page":"Input/Output","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. 
When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_fasta","page":"Input/Output","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.write_fasta","page":"Input/Output","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"function"}] +[{"location":"optimization/#Optimization","page":"Optimization","title":"Optimization","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"There are two distinct kinds of optimization: \"global\" model parameters, and then tree branchlengths and topology. 
These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"The example below will set up and optimize a \"Generalized Time Reversible\" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).","category":"page"},{"location":"optimization/#Optimizing-model-parameters","page":"Optimization","title":"Optimizing model parameters","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle (excluding the diagonal) of the rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution #hide\nreversibleQ(1:6,ones(4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"...and the equilibrium frequencies are multiplied column-wise:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ(ones(6),[0.1,0.2,0.3,0.4])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Another convenient trick is to be able to parameterize a vector of positive frequencies that sum to 1, using N-1 unconstrained parameters. 
unc2probvec can help:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"unc2probvec(zeros(3))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"ParameterHandling.jl provides a convenient framework for managing collections of parameters in a way that plays well with much of the Julia optimization ecosystem, and we recommend its use. Here we'll use ParameterHandling and NLopt.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"First, we'll load in some example nucleotide data:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt\n\n#Read in seqs and tree, and populate the tree with NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\ninitial_partition = NucleotidePartition(length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we set up the model parameters, and the objective function:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"#Named tuple of parameters, with initial values and constraints (from ParameterHandling.jl)\ninitial_params = (\n rates=positive(ones(6)), #rates must be positive\n pi=zeros(3) #will be transformed into 4 eq freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\n#Set up a function that builds a model from these parameters\nfunction build_model_vec(params)\n pi = unc2probvec(params.pi)\n return DiagonalizedCTMC(reversibleQ(params.rates,pi))\nend\n\n#Set up the function to be *minimized*\nfunction objective(params::NamedTuple; tree = tree)\n #In this 
example, we are optimizing the nuc equilibrium freqs\n #We'll also assume that the starting frequencies (at the root of the tree) are the eq freqs\n tree.parent_message[1].state .= unc2probvec(params.pi)\n return -log_likelihood!(tree,build_model_vec(params)) #Note, negative of LL, because minimization\nend","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we'll set up an optimizer from NLopt. See this discussion and this exploration of optimizers.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"opt = Opt(:LN_BOBYQA, num_params)\n#Note: NLopt requires an objective that accepts a gradient argument, even for gradient-free methods, hence (x,y)->...\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x)) #See ParameterHandling.jl docs for objective ∘ unflatten explanation\n#Some bounds (which will be in the transformed domain) to prevent searching numerically silly bits of parameter space:\nlower_bounds!(opt, [-10.0 for i in 1:num_params])\nupper_bounds!(opt, [10.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n_,mini,_ = NLopt.optimize(opt, flat_initial_params)\nfinal_params = unflatten(mini)\n\noptimized_model = build_model_vec(final_params)\nprintln(\"Opt LL:\",log_likelihood!(tree,optimized_model))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We can view the optimized parameter values:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"println(\"Rates: \", round.(final_params.rates,sigdigits = 4))\nprintln(\"Pi:\", round.(unc2probvec(final_params.pi),sigdigits = 4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Rates: [1.124, 2.102, 1.075, 0.9802, 1.605, 0.5536]\nPi:[0.2796, 0.2192, 
0.235, 0.2662]","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Or the entire optimized rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"matrix_for_display(optimized_model.Q,['A','C','G','T'])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292\n5×5 Matrix{Any}:\n \"\" 'A' 'C' 'G' 'T'\n 'A' -1.02672 0.246386 0.494024 0.286309\n 'C' 0.314289 -0.971998 0.23034 0.427368\n 'G' 0.587774 0.214842 -0.950007 0.147391\n 'T' 0.300663 0.35183 0.130093 -0.782586","category":"page"},{"location":"optimization/#Optimizing-the-tree-topology-and-branch-lengths","page":"Optimization","title":"Optimizing the tree topology and branch lengths","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"With a tree and a model, we can also optimize the branch lengths and search, by nearest neighbour interchange, for changes to the tree topology that improve the likelihood. Individually, these are performed by branchlength_optim! and nni_optim! respectively, which need to have felsenstein! and felsenstein_down! 
called beforehand, but this is all bundled into:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"tree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3783.226756522292\nLL: -3782.345818028071\nLL: -3782.3231632207567\nLL: -3782.3211724011044\nLL: -3782.321068684831\nLL: -3782.3210622627776","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"And just to convince you this works, we can perturb the branch lengths, and see how the likelihood improves:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"for n in getnodelist(tree)\n n.branchlength *= (rand()+0.5)\nend\ntree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3805.4140940138795\nLL: -3782.884883999107\nLL: -3782.351780962518\nLL: -3782.322906364547\nLL: -3782.321183009534\nLL: -3782.3210398963506\nLL: -3782.3210271696703","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"warning: Warning\ntree_polish! probably won't find a good tree from a completely random start. Different tree search heuristics are required for that.","category":"page"},{"location":"optimization/#Functions","page":"Optimization","title":"Functions","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ\nunc2probvec\nbranchlength_optim!\nnni_optim!\ntree_polish!","category":"page"},{"location":"optimization/#MolecularEvolution.reversibleQ","page":"Optimization","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. 
The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.unc2probvec","page":"Optimization","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.branchlength_optim!","page":"Optimization","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). tol is the absolute tolerance for the bl_optimizer which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.nni_optim!","page":"Optimization","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). acc_rule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.tree_polish!","page":"Optimization","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#Simulation","page":"Simulation","title":"Simulation","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"The two key steps in phylogenetic simulation are 1) simulating the phylogeny itself, and 2) simulating data that evolves over the phylogeny.","category":"page"},{"location":"simulation/#Simulating-phylogenies","page":"Simulation","title":"Simulating phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"warning: Warning\nWhile our sim_tree function seems to produce trees with the right shape, and is good enough for eg. 
generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If you just need a simple tree for testing things, then you can just use:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"tree = sim_tree(n=100)\ntree_draw(tree, draw_labels = false, canvas_height = 5cm)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"This has the characteristic \"coalescent under constant population size\" look.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"However, sim_tree is a bit more powerful than this: it aims to simulate branching under a coalescent process with flexible options for how the effective population size, as well as the sampling rate, might change over time. This is important, because the \"constant population size\" model is quite extreme, and most of the divergence happens in the early internal branches.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"A coalescent process runs backwards in time, starting from the most recent tip, and sampling backwards toward the root, coalescing nodes as it goes, and sometimes adding additional sampled tips. 
With sim_tree, if nstart = add_limit, then all the tips will be sampled at the same time, and the tree will be ultrametric.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree has two arguments driving its flexibility. We'll start with sampling_rate, which controls the rate at which samples are added to the tree. Even under constant effective population size, this can produce interesting behavior.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for sampling_rate in [5.0, 0.5, 0.05, 0.005]\n tree = sim_tree(100,1000.0,sampling_rate)\n display(tree_draw(tree, draw_labels = false, canvas_height = 5cm))\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Above, this rate was just a fixed constant value, but we can also let this be a function. 
In this example, we'll plot the tree alongside the sampling rate function, as well as the cumulative number of samples through time.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"s(t) = ifelse(0 sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Note how the x axis of these plots is flipped, since the leaf furthest from the root begins at time=0, and the coalescent runs backwards, from tip to root.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can also vary the effective population size over time, which adds a different dimension of control. 
Here is an example showing the shape of a tree under exponential growth:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 100000*exp(-t/10)\ntree = sim_tree(100,n,100.0, nstart = 100)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\nplot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Logistic growth, with a relatively low sampling rate, provides a reasonable model of an emerging virus that was only sampled later in its growth trajectory, such as HIV.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 10000/(1+exp(t-10))\ntree = sim_tree(100,n,20.0)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"How about a virus with a seasonally varying effective population size, where sampling is proportional to case counts? Between seasons, the effective population size gets so low that the next season's clade arises from one or two lineages in the previous season.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s)\ndisplay(tree_draw(tree, draw_labels = false))\n\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Finally, the mutation_rate argument multiplicatively scales the branch lengths.","category":"page"},{"location":"simulation/#Simulating-evolution-over-phylogenies","page":"Simulation","title":"Simulating evolution over phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We'll begin by simulating a tree, like the last example:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"using MolecularEvolution, FASTX, Phylo, Plots, CSV, DataFrames\n\nn(t) = exp(sin(t/10) * 2.0 + 
4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s, mutation_rate = 0.005)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If we need to open this tree in an external program, we can extract the Newick string representing this tree, and write it to a file:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"newick_string = newick(tree)\nopen(\"flu_sim.tre\",\"w\") do io\n println(io,newick_string)\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we can set up a model. In this case, it'll be a combination of a nucleotide model of sequence evolution and Brownian motion over a continuous character.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"nuc_freqs = [0.2,0.3,0.3,0.2]\nnuc_rates = [1.0,2.0,1.0,1.0,1.6,0.5]\nnuc_model = DiagonalizedCTMC(reversibleQ(nuc_rates,nuc_freqs))\nbm_model = BrownianMotion(0.0,1.0)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"As usual, we set up the Partition structure, and load this onto our tree:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"message_template = [NucleotidePartition(nuc_freqs,300),GaussianPartition()]\ninternal_message_init!(tree, message_template)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we sample data under our model:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sample_down!(tree, [nuc_model,bm_model])","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can visualize the Brownian component of the simulation by loading it into each node's node_data dictionary, and converting to a Phylo.jl tree.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for n in 
getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[2].mean])\nend\nphylo_tree = get_phylo_tree(tree)\nplot(phylo_tree, showtips = false, line_z = \"mu\", colorbar = :none,\n linecolor = :darkrainbow, linewidth = 1.0, size = (600, 600))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can write the simulated data, including sequences and continuous characters, to a CSV:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"df = DataFrame()\ndf.names = [n.name for n in getleaflist(tree)]\ndf.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)]\ndf.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)]\nCSV.write(\"flu_sim_seq_and_bm.csv\",df)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Or we could export just the sequences as .fasta","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"write_fasta(\"flu_sim_seq_and_bm.fasta\",df.seqs,seq_names = df.names)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Which will look something like this, when opened in AliView","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/#Functions","page":"Simulation","title":"Functions","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree\nsample_down!\npartition2obs","category":"page"},{"location":"simulation/#MolecularEvolution.sim_tree","page":"Simulation","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. 
Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\nsim_tree(;n = 10)\n\nSimulates tree with constant population size.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.sample_down!","page":"Simulation","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.partition2obs","page":"Simulation","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. 
Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"function"},{"location":"framework/#The-MolecularEvolution.jl-Framework","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.","category":"page"},{"location":"framework/#Partitions-and-BranchModels","page":"The MolecularEvolution.jl Framework","title":"Partitions and BranchModels","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. 
For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"(Image: )","category":"page"},{"location":"framework/#Messages","page":"The MolecularEvolution.jl Framework","title":"Messages","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. 
Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!","category":"page"},{"location":"framework/#Trees","page":"The MolecularEvolution.jl Framework","title":"Trees","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Each node in our tree is a FelNode (\"Fel\" for \"Felsenstein\"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that associates a FelNode->Vector{: true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. 
The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"ancestors/#Ancestral-Reconstruction","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Given a phylogeny, and observations on some set of leaf nodes, \"ancestral reconstruction\" describes a family of approaches for inferring the state of the ancestors, or the distribution over possible states of ancestors.","category":"page"},{"location":"ancestors/#Examples","page":"Ancestral Reconstruction","title":"Examples","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"using MolecularEvolution\n\n#Simulate a small tree, with Brownian motion over it\ntree = sim_tree(n=10)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\nr(x) = round(x,sigdigits = 3)\nprintln(\"Leaf values:\")\nfor n in getleaflist(tree)\n println(n.name,\" : \",r(n.message[1].mean))\nend\n\nd = marginal_state_dict(tree,bm_model)\nprintln(\"Inferred internal means (±95% intervals):\")\nfor n in getnonleaflist(tree)\n m,s = d[n][1].mean,sqrt(d[n][1].var)\n println(r(m), \"±\", r(1.96*s), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Leaf values:\ntax8 : -1.03\ntax1 : -1.15\ntax9 : -1.67\ntax10 : -0.112\ntax6 : -0.0183\ntax2 : -0.0574\ntax3 : 0.207\ntax5 : 0.0021\ntax4 : 0.634\ntax7 : 0.544\nInferred internal means (±95% intervals):\n-0.485±0.815 - true value: -0.587\n-1.17±0.556 - true value: -1.37\n-1.1±0.256 - true value: -1.09\n0.116±0.45 - true value: 0.21\n0.0275±0.35 - true value: -0.035\n0.0216±0.283 - 
true value: 0.0177\n0.0459±0.13 - true value: 0.0485\n0.0532±0.122 - true value: 0.075\n0.571±0.147 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"We can also find the values of the state for each node under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the (maximized) state of the parent node and the observations of all descendents. This ensures that the combination of ancestral states is, jointly, high likelihood. In the case of Brownian motion, these just happen to be the same as the marginal means, but that isn't necessarily the case for other models:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = cascading_max_state_dict(tree,bm_model)\nprintln(\"Inferred internal values:\")\nfor n in getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Inferred most likely (jointly) internal values:\n-0.485 - true value: -0.587\n-1.17 - true value: -1.37\n-1.1 - true value: -1.09\n0.116 - true value: 0.21\n0.0275 - true value: -0.035\n0.0216 - true value: 0.0177\n0.0459 - true value: 0.0485\n0.0532 - true value: 0.075\n0.571 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"And we can sample internal states under our model, but conditioned on the leaf observations:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = endpoint_conditioned_sample_state_dict(tree,bm_model)\nprintln(\"Sampled states, conditioned on observed leaves:\")\nfor n in 
getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Sampled states, conditioned on observed leaves:\n-0.784 - true value: -0.587\n-1.3 - true value: -1.37\n-1.13 - true value: -1.09\n-0.155 - true value: 0.21\n0.0118 - true value: -0.035\n0.0305 - true value: 0.0177\n0.0913 - true value: 0.0485\n0.0542 - true value: 0.075\n0.498 - true value: 0.589","category":"page"},{"location":"ancestors/#Functions","page":"Ancestral Reconstruction","title":"Functions","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"marginal_state_dict\ncascading_max_state_dict\nendpoint_conditioned_sample_state_dict","category":"page"},{"location":"ancestors/#MolecularEvolution.marginal_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.cascading_max_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.endpoint_conditioned_sample_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Example-1:-Amino-acid-ancestral-reconstruction-and-visualization","page":"Examples","title":"Example 1: Amino acid ancestral reconstruction and visualization","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads amino acid sequences from this FASTA file, and a phylogeny from this Newick tree file. A WAG amino acid model, augmented to explicitly model gap (ie. '-') characters, and a global substitution rate is estimated by maximum likelihood. Under this optimized model, the distribution over ancestral amino acids is constructed for each node, and visualized in multiple ways.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, Phylo, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/MusAA_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusAA_IGHV.tre\")\n\n#Compute AA freqs, which become the equilibrium freqs of the model, and the initial root freqs\nAA_freqs = char_proportions(seqs,MolecularEvolution.gappyAAstring)\n#Build the Q matrix\nQ = gappy_Q_from_symmetric_rate_matrix(WAGmatrix,1.0,AA_freqs)\n#Build the model\nm = DiagonalizedCTMC(Q)\n#Set up the memory on the tree\ninitial_partition = GappyAminoAcidPartition(AA_freqs,length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#Set up a likelihood function to find the scaling constant that best fits the branch lengths of the imported tree\n#Note, calling LL will change the rate, so make sure you set it to what you want after this has been called\nll = function(rate; m = m)\n m.r = rate\n return 
log_likelihood!(tree,m)\nend\nopt_rate = golden_section_maximize(ll, 0.0, 10.0, identity, 1e-11);\nplot(opt_rate*0.87:0.001:opt_rate*1.15,ll,size = (500,250),\n xlabel = \"rate\",ylabel = \"log likelihood\", legend = :none)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then set the model parameters to the maximum likelihood estimate, and reconstruct the ancestral states.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"m.r = opt_rate\n#Reconstructing the marginal distributions of amino acids at internal nodes\nd = marginal_state_dict(tree,m)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"That's it! Everything else is for visualizing these ancestral states. We'll select a set of amino acid positions to visualize, corresponding to these two (red arrows) alignment columns:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#The alignment indices we want to pay attention to in our reconstructions\nmotif_inds = [52,53]\n\n#We'll compute a confidence score for the inferred marginal state\nconfidence(state,inds) = minimum([maximum(state[:,i]) for i in inds])\n\n#Map motifs to numbers, so we can work with more convenient continuous color scales\nall_motifs = sort(union([partition2obs(d[n][1])[motif_inds] for n in getnodelist(tree)]))\nmotif2num = Dict(zip(all_motifs,1:length(all_motifs)))\n\n#Populating the node_data dictionary to help with plotting\nfor n in getnodelist(tree)\n moti = partition2obs(d[n][1])[motif_inds]\n n.node_data = Dict([\n \"motif\"=>moti,\n \"motif_color\"=>motif2num[moti],\n \"uncertainty\"=>1-confidence(d[n][1].state,motif_inds)\n ])\nend\n\n#Transducing the MolecularEvolution FelNode tree to a 
Phylo.jl tree, which migrates node_data as well\nphylo_tree = get_phylo_tree(tree)\nnode_unc = values_from_phylo_tree(phylo_tree,\"uncertainty\")\n\nprintln(\"Greatest motif uncertainty: \",maximum([n.node_data[\"uncertainty\"] for n in getnodelist(tree)]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Greatest motif uncertainty: 0.6104376723068156","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (400, 800))\n\nsavefig_tweakSVG(\"anc_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree, treetype = :fan,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (800, 800))\n\nsavefig_tweakSVG(\"anc_circ_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting using continuous color scales, and using marker size to show uncertainty in reconstructions\ncolor_scale = :rainbow\npl = plot(phylo_tree, showtips = true, tipfont = 6, marker_z = \"motif_color\", line_z = \"motif_color\",\n markersize = 10 .* sqrt.(node_unc), linecolor = color_scale, markercolor = color_scale, markeralpha = 0.75,\n markerstrokewidth = 0,margins = 2Plots.cm, colorbar = :none, linewidth = 2.5, size = (400, 800))\n\n#Feeble attempt 
at a manual legend\nmotif_ys = collect(1:length(all_motifs)) .+ (length(seqs) - length(all_motifs))\nscatter!(zeros(length(all_motifs)) , motif_ys , marker = 8, markeralpha = 0.75,\n marker_z = 1:length(all_motifs), markercolor = color_scale, markerstrokewidth = 0.0)\nfor i in 1:length(all_motifs)\n annotate!(0.1, motif_ys[i], all_motifs[i],7)\nend\n\nsavefig_tweakSVG(\"anc_tree_continuous.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/#Example-2:-GTRGamma","page":"Examples","title":"Example 2: GTR+Gamma","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"For site-to-site \"random effects\" rate variation, such as under the GTR+Gamma model, we need to use a \"Site-Wise Mixture\" model, or SWMModel with its SWMPartition.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Set up a function that will return a set of rates that will, when equally weighted, VERY coarsely approx a Gamma distribution\nfunction equiprobable_gamma_grids(s,k)\n grids = quantile(Gamma(s,1/s),1/2k:1/k:(1-1/2k))\n grids ./ mean(grids)\nend\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\n#Set up the Partition that will be replicated in the SWMModel\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\n#To be able to use unconstrained optimization, we use `ParameterHandling.jl`\ninitial_params = (\n rates=positive(ones(6)),\n gam_shape=positive(1.0),\n pi=zeros(3)\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\n#Setting up the Site-Wise Mixture Partition:\n#Note: this constructor sets the weights of all categories to 1/rate_cats\n#That is fine for our equi-probable category model, but this will need 
to be different for other models.\nrate_cats = 5\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,rate_cats)\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = rate_cats)\n r_vals = equiprobable_gamma_grids(params.gam_shape,cats)\n pi = unc2probvec(params.pi)\n return MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),r_vals)\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n #Root freqs need to be set over all component partitions\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3728.4761606135307","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Other functions also work with these kinds of random-effects site-wise mixture models:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"tree_polish!(tree,optimized_model)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL: -3728.4761606135307\nLL: -3728.1316616075173\nLL: -3728.121005993758\nLL: -3728.1202243978914\nLL: -3728.1201348447107","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Sometimes we might want the rate values for each category to stay fixed, but optimize their 
weights:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Using rate categories with fixed values\nfixed_cats = [0.00001,0.33,1.0,3.0,9.0]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n cat_weights=zeros(length(fixed_cats)-1), #Category weights\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n cat_weights = unc2probvec(params.cat_weights)\n pi = unc2probvec(params.pi)\n m = MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n m.weights .= cat_weights\n return m\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3719.6290948420706","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"When you have a Site-Wise Mixture (ie. 
REL) model, the category weights can be handled \"outside\" of the main likelihood calculations. This means that they can be optimized very quickly, within an objective function that is optimizing over the other parameters. The following example uses an EM approach to do this:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using Distributions, FASTX, ParameterHandling, NLopt\n\n#Using rate categories with fixed values\nfixed_cats = [(i/5)^2 for i in 1:12]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n pi = unc2probvec(params.pi)\n m = SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n return m\nend\n\n#LL for a mixture when the grid of probabilities is pre-computed\ngrid_ll(v,g) = sum(log.(sum((v./sum(v)) .* g,dims = 1)))\n\n#Note: we can get away with relatively few EM iterations within the optimization cycle (in this example at least)\nfunction opt_weights_and_LL(temp_part::SWMPartition{PType}; iters = 25) where {PType <: MolecularEvolution.MultiSitePartition} \n g,scals = SWM_prob_grid(temp_part) \n l = size(g)[1]\n #We can optimize the category weights without re-computing felsenstein\n #So it can make sense to do so within the optimization function\n #Which means you don't need to optimize over as many parameters\n θ = weightEM(g,ones(l)./l, iters = iters)\n LL_optimizing_over_weights = grid_ll(θ,g) + sum(scals)\n return θ,LL_optimizing_over_weights\nend\n\nfunction 
objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n felsenstein!(tree,build_model_vec(params))\n #Optim inside optim\n #We first need to handle the merge of the parent and root partitions - usually handled for us magically!\n #Be careful: this example is hard-coded for a single partition\n temp_part = copy_partition(tree.parent_message[1])\n combine!(temp_part, tree.message[1])\n θ,LL = opt_weights_and_LL(temp_part)\n return -LL\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time score,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\n\nfelsenstein!(tree,optimized_model)\ntemp_part = copy_partition(tree.parent_message[1])\ncombine!(temp_part, tree.message[1])\nθ,_ = opt_weights_and_LL(temp_part, iters = 1000) #polish weights for final pass - quick\noptimized_model.weights .= θ\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work, \":\", score)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"3.932150 seconds (2.38 M allocations: 2.378 GiB, 10.78% gc time, 3.28% compilation time: 7% of which was recompilation)\nSUCCESS:3720.1347720900067\nOpt LL:-3719.4808937732614","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"This can be dramatically faster than trying to directly optimize over category weights when the number of categories grows. 
The above example took 140s with the direct approach.","category":"page"},{"location":"examples/#Example-3:-FUBAR","page":"Examples","title":"Example 3: FUBAR","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads codon sequences from this FASTA file, and a phylogeny from this Newick tree file, and implements FUBAR.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/Flu.fasta\")\ntree = read_newick_tree(\"Data/Flu.tre\")\n\n#Count F3x4 frequencies from the seqs, and estimate codon freqs from this\nf3x4 = MolecularEvolution.count_F3x4(seqs);\neq_freqs = MolecularEvolution.F3x4_eq_freqs(f3x4);\n\n#Set up a codon partition (will default to Universal genetic code)\ninitial_partition = CodonPartition(Int64(length(seqs[1])/3))\ninitial_partition.state .= eq_freqs\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#We'll use the empirical F3x4 freqs, fixed MG94 alpha=1, and optimize the nuc parameters and MG94 beta\n#Note: the nuc rates are confounded with alpha\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n beta = positive(1.0)\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\nfunction build_model_vec(p; F3x4 = f3x4, alpha = 1.0)\n #If you run into numerical issues with DiagonalizedCTMC, switch to GeneralCTMC instead\n return DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, p.beta, reversibleQ(p.rates,ones(4)), F3x4))\nend\n\nfunction objective(params::NamedTuple; tree = tree, eq_freqs = eq_freqs)\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 
1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time _,mini,_ = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\nnucmat = reversibleQ(final_params.rates,ones(4))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":" 10.596546 seconds (840.87 k allocations: 5.221 GiB, 7.45% gc time, 0.35% compilation time: 25% of which was recompilation)\n4×4 Matrix{Float64}:\n -9.41346 1.77048 6.85997 0.783008\n 1.77048 -7.24162 0.280525 5.19061\n 6.85997 0.280525 -8.651 1.5105\n 0.783008 5.19061 1.5105 -7.48412","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The scaling of that nuc matrix reflects the fact that we're using a tree that was estimated under a nuc model, but here we're optimizing a codon model. No issue: the nuc rates have absorbed this scaling difference.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Now we set up a 20-by-20 grid, slicing the MG94 α and β parameters at the following values:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"grid_values = 10 .^ (-1.35:0.152:1.6) .- 0.0423174293933042","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"20-element Vector{Float64}:\n 0.0023509298217921012\n 0.021069541732388508\n 0.047632328759699305\n 0.08532645148783018\n 0.13881657986865603\n 0.2147221488835822\n 0.3224365175323036\n 0.4752894025572635\n 0.6921964387638108\n 1.0\n 1.4367909587749033\n 2.05662245423022\n 2.9361990000358853\n 4.184368713262725\n 5.95559333316179\n 8.469062952630463\n 12.0358209216745\n 17.09725564569095\n 24.27972266134484\n 34.47205650419232","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we calculate the conditional likelihoods for each site. 
Note the 20-by-20 grid is stretched out into a length 400 vector to keep things simple. I'm avoiding reshape tricks to keep the grid structure clear.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL_matrix = zeros(length(grid_values)^2,initial_partition.sites);\nalpha_vec = zeros(length(grid_values)^2);\nalpha_ind_vec = zeros(Int64,length(grid_values)^2);\nbeta_vec = zeros(length(grid_values)^2);\nbeta_ind_vec = zeros(Int64,length(grid_values)^2);\n\ni = 1\n@time for (a,alpha) in enumerate(grid_values)\n for (b,beta) in enumerate(grid_values)\n alpha_vec[i],beta_vec[i] = alpha, beta\n alpha_ind_vec[i], beta_ind_vec[i] = a,b\n m = DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, beta, nucmat, f3x4))\n felsenstein!(tree,m)\n #This is because we need to include the eq freqs in the site LLs:\n combine!(tree.message[1],tree.parent_message[1])\n LL_matrix[i,:] .= MolecularEvolution.site_LLs(tree.message[1])\n i += 1\n end\nend\nprob_matrix = exp.(LL_matrix .- maximum(LL_matrix,dims = 1))\nprob_matrix ./= sum(prob_matrix,dims = 1);","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we use an EM-like MAP algorithm to find the posterior grid weights, and visualize this surface:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LDAθ = weightEM(prob_matrix, ones(length(alpha_vec))./length(alpha_vec), conc = 0.4, iters = 5000);\n\n#A function to viz the grid surface\nfunction gridplot(alpha_ind_vec,beta_ind_vec,grid_values,θ; title = \"\")\n scatter(alpha_ind_vec,beta_ind_vec, zcolor = θ, c = :darktest,\n markersize = sqrt(length(alpha_ind_vec))/2, markershape=:square, markerstrokewidth=0.0, size=(550,500),\n label = :none, xticks = (1:length(grid_values), round.(grid_values,digits = 3)), xrotation = 90,\n yticks = (1:length(grid_values), round.(grid_values,digits = 3)), margin=6Plots.mm,\n xlabel = \"α\", ylabel = \"β\", title = title)\n 
plot!(1:length(grid_values),1:length(grid_values),color = \"grey\", style = :dash, label = :none)\nend\n\ngridplot(alpha_ind_vec,beta_ind_vec,grid_values,LDAθ)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"We can see that the posterior distribution over sites is heavily concentrated at β<α. But are there any sites where β>α?","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"weighted_mat = prob_matrix .* LDAθ\nfor site in 1:size(prob_matrix)[2]\n pos = sum(weighted_mat[beta_vec .> alpha_vec,site])/sum(weighted_mat[:,site])\n if pos > 0.9\n println(\"Site $(site): P(β>α)=$(round(pos,digits = 4))\")\n end\nend","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Site 153: P(β>α)=0.9074\nSite 158: P(β>α)=0.9266\nSite 160: P(β>α)=0.9547","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"And let's visualize one of those sites:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"viz/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. 
The Compose, Plots, and Phylo dependencies are optional.","category":"page"},{"location":"viz/#Example-1","page":"Visualization","title":"Example 1","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Plots, Phylo\n\n#First simulate a tree, and then Brownian motion:\ntree = sim_tree(n=20)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\n#We'll add the Gaussian means to the node_data dictionaries\nfor n in getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[1].mean])\nend\n\n#Transducing the mol ev tree to a Phylo.jl tree\nphylo_tree = get_phylo_tree(tree)\n\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_z = \"mu\", markeralpha = 0.5, line_z = \"mu\", linecolor = :darkrainbow, \n markersize = 4.0, markerstrokewidth = 0,margins = 1Plots.cm,\n linewidth = 1.5, markercolor = :darkrainbow, size = (500, 500))","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We also offer savefig_tweakSVG(\"simple_plot_example.svg\", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,\"mu\") which can extract stored quantities in the right order for passing into eg. markersize options when plotting.","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.","category":"page"},{"location":"viz/#Drawing-trees-with-Compose.jl.","page":"Visualization","title":"Drawing trees with Compose.jl.","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"The Compose.jl in-house tree drawing offers extensive flexibility. 
Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Compose\n\ntree = sim_tree(40,1000.0,0.005,mutation_rate = 0.001)\nmodel = DiagonalizedCTMC(reversibleQ(ones(6),ones(4)./4))\ninternal_message_init!(tree, NucleotidePartition(ones(4)./4,1))\nsample_down!(tree,model)\nd = marginal_state_dict(tree,model);\n\ncompose_dict = Dict()\nfor n in getnodelist(tree)\n compose_dict[n] = (x,y)->pie_chart(x,y,d[n][1].state[:,1],size = 0.02, opacity = 0.75)\nend\nimg = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"This can then be exported with:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"savefig_tweakSVG(\"piechart_tree.svg\",img)","category":"page"},{"location":"viz/#Functions","page":"Visualization","title":"Functions","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"get_phylo_tree\nvalues_from_phylo_tree\nsavefig_tweakSVG\ntree_draw","category":"page"},{"location":"viz/#MolecularEvolution.get_phylo_tree","page":"Visualization","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.values_from_phylo_tree","page":"Visualization","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.savefig_tweakSVG","page":"Visualization","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\nsavefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.tree_draw","page":"Visualization","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"function"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = 
MolecularEvolution","category":"page"},{"location":"#MolecularEvolution","page":"Home","title":"MolecularEvolution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Documentation for MolecularEvolution.","category":"page"},{"location":"#A-Julia-package-for-the-flexible-development-of-phylogenetic-models.","page":"Home","title":"A Julia package for the flexible development of phylogenetic models.","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the behavior you need is not already available in MolecularEvolution.jl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have a new data type:\nA Partition type that represents the uncertainty over your state. 
\ncombine!() that merges evidence from two Partitions.\nIf you have a new model:\nA BranchModel type that stores your model parameters.\nforward!() that evolves state distributions over branches, in the root-to-tip direction.\nbackward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.","category":"page"},{"location":"","page":"Home","title":"Home","text":"And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.","category":"page"},{"location":"#Design-principles","page":"Home","title":"Design principles","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order of importance, we aim for the following:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Flexibility and generality\nWhere possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.\nWe do not sacrifice flexibility for performance.\nScalability\nAnalyses implemented using MolecularEvolution.jl should scale to large, real-world datasets.\nPerformance\nWhile the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competitive runtimes.","category":"page"},{"location":"#Authors:","page":"Home","title":"Authors:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.","category":"page"},{"location":"#Quick-example:-Likelihood-calculations-under-phylogenetic-Brownian-motion:","page":"Home","title":"Quick example: Likelihood calculations under phylogenetic Brownian motion:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using MolecularEvolution, Plots\n\n#First simulate a tree, 
using a coalescent process\ntree = sim_tree(n=200)\ninternal_message_init!(tree, GaussianPartition())\n#Simulate brownian motion over the tree\nbm_model = BrownianMotion(0.0,1.0)\nsample_down!(tree, bm_model)\n#And plot the log likelihood as a function of the parameter value\nll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))\nplot(0.7:0.001:1.6,ll, xlabel = \"variance per unit time\", ylabel = \"log likelihood\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: )","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"","page":"Home","title":"Home","text":"Modules = [MolecularEvolution]","category":"page"},{"location":"#MolecularEvolution.LazyDown","page":"Home","title":"MolecularEvolution.LazyDown","text":"Constructors\n\nLazyDown(stores_obs)\nLazyDown() = LazyDown(x::FelNode -> true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyPartition","page":"Home","title":"MolecularEvolution.LazyPartition","text":"Constructor\n\nLazyPartition{PType}()\n\nInitialize an empty LazyPartition that is meant for wrapping a partition of type PType.\n\nDescription\n\nWith this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:\n\nlog_likelihood!\nfelsenstein!\nsample_down!\n\nnote: Note\nFor successive felsenstein! calls, we need to extract the information at the root somehow after each call. 
This can be done with e.g. total_LL or site_LLs.\n\nFurther requirements\n\nSuppose you want to wrap a partition of PType with LazyPartition:\n\nIf you're calling log_likelihood! and felsenstein!:\nobs2partition!(partition::PType, obs) that transforms an observation to a partition.\nIf you're calling sample_down!:\npartition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyUp","page":"Home","title":"MolecularEvolution.LazyUp","text":"Constructor\n\nLazyUp()\n\nDescription\n\nIndicate that we want to do an upward pass, e.g. felsenstein!.\n\n\n\n\n\n","category":"type"},{"location":"#Base.:==-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"Base.:==","text":"==(t1, t2)\nDefaults to pointer equality\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.SWM_prob_grid-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:MultiSitePartition","page":"Home","title":"MolecularEvolution.SWM_prob_grid","text":"SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}\n\nReturns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution._mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any, Any}} where T<:Function","page":"Home","title":"MolecularEvolution._mapreduce","text":"Internal function. Helper for bfsmapreduce and dfsmapreduce\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.backward!-Tuple{DiscretePartition, DiscretePartition, GeneralCTMC, FelNode}","page":"Home","title":"MolecularEvolution.backward!","text":"backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition backwards along the branch to the destination partition, under the model. 
Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.bfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.bfs_mapreduce","text":"Performs a BFS map-reduce over the tree, starting at a given node For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator) where prev_node is the previous node visited on the path from the start node to the current node It is expected to update the aggregator, and not return anything.\n\nNot exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.branchlength_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5, bl_optimizer::UnivariateOpt = GoldenSectionOpt())\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models). 
tol is the absolute tolerance for the bl_optimizer which defaults to golden section search, and has Brent's method as an option by setting bl_optimizer=BrentsMethodOpt().\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.brents_method_minimize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.brents_method_minimize","text":"brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))\n\nBrent's method for minimization.\n\nGiven a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.\n\nThe method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...\n\nExamples\n\njulia> f(x) = exp(-x) - cos(x)\nf (generic function with 1 method)\n\njulia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)\n0.5885327257940255\n\nFrom: Richard P. Brent, \"Algorithms for Minimization without Derivatives\" (1973). 
Chapter 5.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.cascading_max_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.char_proportions-Tuple{Any, String}","page":"Home","title":"MolecularEvolution.char_proportions","text":"char_proportions(seqs, alphabet::String)\n\nTakes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.colored_seq_draw-Tuple{Any, Any, AbstractString}","page":"Home","title":"MolecularEvolution.colored_seq_draw","text":"colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)\n\nDraw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). 
Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.combine!-Tuple{DiscretePartition, DiscretePartition}","page":"Home","title":"MolecularEvolution.combine!","text":"combine!(dest::P, src::P) where P<:Partition\n\nCombines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.deepequals-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.deepequals","text":"deepequals(t1, t2)\n\nChecks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.dfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.dfs_mapreduce","text":"Performs a DFS map-reduce over the tree, starting at a given node. See bfs_mapreduce for more details.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.discrete_name_color_dict-Tuple{AbstractTreeNode, Any}","page":"Home","title":"MolecularEvolution.discrete_name_color_dict","text":"discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)\n\nTakes a tree and a tag_func, which converts the leaf label into a category (ie. 
there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.\n\nExample tagfunc: function tagfunc(nam::String) return split(nam,\"_\")[1] end\n\nFor prettier colors, but less discrimination: rainbow = true To randomize the rainbow color assignment: scramble = true col_seed is currently set to white, and excluded from the list of colors, to make them more visible.\n\nConsider making your own version of this function to customize colors as you see fit.\n\nExample use: numleaves = 50 Nefunc(t) = 1*(e^-t).+5.0 newt = simtree(numleaves,Nefunc,1.0,nstart = rand(1:numleaves)); newt = ladderize(newt) tagfunc(nam) = mod(sum(Int.(collect(nam))),7) dic = discretenamecolordict(newt,tagfunc,rainbow = true); treedraw(newt,linewidth = 0.5mm,labelcolor_dict = dic)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.draw_example_tree-Tuple{}","page":"Home","title":"MolecularEvolution.draw_example_tree","text":"draw_example_tree(num_leaves = 50)\n\nDraws a tree and shows the code that draws it.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.endpoint_conditioned_sample_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditions on the leaf observations. These samples are stored in the nodemessagedict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.expected_subs_per_site-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.expected_subs_per_site","text":"expected_subs_per_site(Q,mu)\n\nTakes a rate matrix Q and an equilibrium frequency vector, and calculates the expected number of substitutions per site.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein!","text":"felsenstein!(node::FelNode, models; partition_list = nothing)\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein_down!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein_down!","text":"felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.forward!-Tuple{DiscretePartition, DiscretePartition, GeneralCTMC, FelNode}","page":"Home","title":"MolecularEvolution.forward!","text":"forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.gappy_Q_from_symmetric_rate_matrix-Tuple{Any, Any, Any}","page":"Home","title":"MolecularEvolution.gappy_Q_from_symmetric_rate_matrix","text":"gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)\n\nTakes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_highlighter_legend-Tuple{Any}","page":"Home","title":"MolecularEvolution.get_highlighter_legend","text":"get_highlighter_legend(legend_colors)\n\nReturns a Compose object given an input dictionary or pairs mapping characters to colors.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_max_depth-Tuple{Any, Real}","page":"Home","title":"MolecularEvolution.get_max_depth","text":"get_max_depth(node,depth::Real)\n\nReturn the maximum depth of all children starting from the indicated node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_phylo_tree-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.golden_section_maximize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.golden_section_maximize","text":"Golden section search.\n\nGiven a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)\n2.0000000000051843\n\nFrom: https://en.wikipedia.org/wiki/Golden-section_search\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlight_seq_draw-Tuple{Any, Any, AbstractString, Any, Any, Any}","page":"Home","title":"MolecularEvolution.highlight_seq_draw","text":"highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)\n\nDraw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlighter_tree_draw-NTuple{4, Any}","page":"Home","title":"MolecularEvolution.highlighter_tree_draw","text":"highlighter_tree_draw(tree, ali_seqs, seqnames, master;\n highlighter_start = 1.1, highlighter_width = 1,\n coord_width = highlighter_start + highlighter_width + 0.1,\n scale_length = nothing, major_breaks = 1000, minor_breaks = 500,\n tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)\n\nDraws a combined tree and highlighter plot. 
The vector of seqnames must match the node names in tree.\n\nkwargs:\n\ntreeargs: kwargs to pass to `treedraw()`\nlegendcolors: Mapping of characters to highlighter colors (default NTcolors)\nscale_length: Length of the scale bar\nhighlighter_start: Canvas start for the highlighter panel\nhighlighter_width: Canvas width for the highlighter panel\ncoord_width: Total width of the canvas\nmajor_breaks: Numbered breaks for sequence axis\nminor_breaks: Ticks for sequence axis\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Partition}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, partition::Partition)\n\nInitializes the message template for each node in the tree, as an array of the partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})\n\nInitializes the message template for each node in the tree, allocating space for each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.istreeconsistent-Tuple{T} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.istreeconsistent","text":"istreeconsistent(root)\n\nChecks whether the :parent field is set to be consistent with the :child field for all nodes in the subtree. 
\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazyprep!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.lazyprep!","text":"lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())\n\nExtra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.\n\nPerform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.\nFix tree.parent_message to an initial message.\nPreallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.\nSpecialized preparations based on the direction of the operations (forward!, backward!). LazyDown or LazyUp.\n\nSee also LazyDown, LazyUp.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazysort!-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.lazysort!","text":"Should be run on a tree containing LazyPartitions before running felsenstein!. Sorts for a minimal count of active partitions during a felsenstein!\nReturns the minimum length of memoryblocks (-1) required for a felsenstein! prop. We need a temporary memoryblock during backward!, hence the '-1'.\n\nnote: Note\nSince felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... 
chains\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.linear_scale-NTuple{5, Any}","page":"Home","title":"MolecularEvolution.linear_scale","text":"linear_scale(val,in_min,in_max,out_min,out_max)\n\nLinearly maps val which lives in [inmin,inmax] to a value in [outmin,outmax]\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.log_likelihood!","text":"log_likelihood!(tree::FelNode, models; partition_list = nothing)\n\nFirst re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood-Tuple{FelNode, BranchModel}","page":"Home","title":"MolecularEvolution.log_likelihood","text":"log_likelihood(tree::FelNode, models; partition_list = nothing)\n\nComputed the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.longest_path-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.longest_path","text":"Returns the longest path in a tree For convenience, this is returned as two lists of form: [leafnode, parentnode, .... 
root] Where the leaf_node nodes are selected to be the furthest away\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.marginal_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.matrix_for_display-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.matrix_for_display","text":"matrix_for_display(Q,labels)\n\nTakes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.midpoint-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.midpoint","text":"Returns a midpoint as a node and a distance above it where the midpoint is\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.mix-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:DiscretePartition","page":"Home","title":"MolecularEvolution.mix","text":"mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}\n\nmix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. 
You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.name2node_dict-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.name2node_dict","text":"name2node_dict(root)\n\nReturns a dictionary of leaf nodes, indexed by node.name. Can be used to associate sequences with leaf nodes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.newick-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nni_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; partition_list = nothing, tol = 1e-5)\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models). 
accrule allows you to specify a function that takes the current and proposed log likelihoods, and if true is returned the move is accepted.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.node_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.node_distances","text":"Compute the distance to all other nodes from a given node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nonreversibleQ-Tuple{Any}","page":"Home","title":"MolecularEvolution.nonreversibleQ","text":"nonreversibleQ(param_vec)\n\nTakes a vector of parameters and returns a nonreversible rate matrix.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.parent_list-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.parent_list","text":"Provides a list of parent nodes nodes from this node up to the root node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.partition2obs-Tuple{DiscretePartition, String}","page":"Home","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partititon types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.populate_tree!-Tuple{FelNode, Partition, Any, Any}","page":"Home","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree) will still be populated with a length-1 vector of Partitions. 
Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.promote_internal-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.promote_internal","text":"promote_internal(tree::FelNode)\n\nCreates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels internal nodes).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Function, Vector, Int64}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)\n\nTakes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Vector, Vector}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(xvec,yvec; rate_conf_level = 0.99)\n\nTakes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). 
Returns the confidence intervals computed by a quadratic approximation to the LL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_fasta-Tuple{String}","page":"Home","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_newick_tree-Tuple{String}","page":"Home","title":"MolecularEvolution.read_newick_tree","text":"readnewicktree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.reversibleQ-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.root2tip_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.root2tip_distances","text":"root2tips(root::AbstractTreeNode)\n\nReturns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_down!-Tuple{FelNode, Any, Any}","page":"Home","title":"MolecularEvolution.sample_down!","text":"sampledown!(root::FelNode,models,partitionlist)\n\nGenerates samples under the model. The root.parentmessage is taken as the starting distribution, and node.message contains the sampled messages. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_from_message!-Tuple{Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.sample_from_message!","text":"sample_from_message!(message::Vector{<:Partition})\n\n#Replaces an uncertain message with a sample from the distribution represented by each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Context}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Plots.Plot}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.shortest_path_between_nodes-Tuple{FelNode, FelNode}","page":"Home","title":"MolecularEvolution.shortest_path_between_nodes","text":"Shortest path between nodes, returned as two lists, each starting with one of the two nodes, and ending with the common ancestor\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sibling_inds-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.sibling_inds","text":"sibling_inds(node)\n\nReturns logical indices of the siblings in the parent's child's vector.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.siblings-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.siblings","text":"siblings(node)\n\nReturns a vector of siblings of node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{Int64, Any, Any}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Nefunc), as well as a sample rate function (samplerate_func), which can also just be constants.\n\nNefunc(t) = (sin(t/10)+1)*100.0 + 10.0 root = simtree(600,Nefunc,1.0) simpletree_draw(ladderize(root))\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(;n = 10)\n\nSimulates tree with constant population size.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_radial_tree_plot-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_radial_tree_plot","text":"simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = \"black\", line_width = 0.1mm)\n\nDraws a radial tree. No frills. No labels. 
Canvas height is automatically determined to avoid distorting the tree.\n\nnewt = betternewickimport(\"((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);\", FelNode{Float64}); simpleradialtreeplot(newt,linewidth = 0.5mm,root_angle = 7/10)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_tree_draw","text":"img = simpletreedraw(tree::FelNode; canvaswidth = 15cm, canvasheight = 15cm, linecolor = \"black\", linewidth = 0.1mm)\n\nA line drawing of a tree with very few options.\n\nimg = simple_tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.total_LL-Tuple{Partition}","page":"Home","title":"MolecularEvolution.total_LL","text":"total_LL(p::Partition)\n\nIf called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2distances","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2shared_branch_lengths-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2shared_branch_lengths","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. 
anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_polish!-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 
10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.unc2probvec-Tuple{Any}","page":"Home","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, BrentsMethodOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))\n\nMaximizes f(x) using Brent's method. See ?brents_method_minimize.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, GoldenSectionOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)\n\nMaximizes f(x) using a Golden Section Search. 
See ?golden_section_maximize.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)\n2.0000000000051843\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.values_from_phylo_tree-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.weightEM-Tuple{Matrix{Float64}, Any}","page":"Home","title":"MolecularEvolution.weightEM","text":"weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)\n\nTakes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_fasta-Tuple{String, Vector{String}}","page":"Home","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_nexus-Tuple{String, FelNode}","page":"Home","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. 
Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"method"},{"location":"IO/#Input/Output","page":"Input/Output","title":"Input/Output","text":"","category":"section"},{"location":"IO/","page":"Input/Output","title":"Input/Output","text":"write_nexus\nnewick\nread_newick_tree\npopulate_tree!\nread_fasta\nwrite_fasta","category":"page"},{"location":"IO/#MolecularEvolution.write_nexus","page":"Input/Output","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.newick","page":"Input/Output","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_newick_tree","page":"Input/Output","title":"MolecularEvolution.read_newick_tree","text":"readnewicktree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.populate_tree!","page":"Input/Output","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree) will still be populated with a length-1 vector of Partitions. Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. 
When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_fasta","page":"Input/Output","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.write_fasta","page":"Input/Output","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"function"}] } diff --git a/dev/simulation/index.html b/dev/simulation/index.html index 988406a..8a98606 100644 --- a/dev/simulation/index.html +++ b/dev/simulation/index.html @@ -58,4 +58,4 @@ df.names = [n.name for n in getleaflist(tree)] df.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)] df.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)] -CSV.write("flu_sim_seq_and_bm.csv",df)

Or we could export just the sequences as .fasta

write_fasta("flu_sim_seq_and_bm.fasta",df.seqs,seq_names = df.names)

Which will look something like this, when opened in AliView

Functions

MolecularEvolution.sim_treeFunction
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)

Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.

Ne_func(t) = (sin(t/10)+1)*100.0 + 10.0
root = sim_tree(600,Ne_func,1.0)
simple_tree_draw(ladderize(root))

source
sim_tree(;n = 10)

Simulates tree with constant population size.

source
MolecularEvolution.sample_down!Function

sample_down!(root::FelNode,models,partition_list)

Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can be a single model (if the messages on the tree contain just one Partition), an array of models (if the messages have >1 Partition), or a function that takes a node and returns a Vector{<:BranchModel}, if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.partition2obsFunction
partition2obs(part::Partition)

Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.

source
+CSV.write("flu_sim_seq_and_bm.csv",df)

Or we could export just the sequences as .fasta

write_fasta("flu_sim_seq_and_bm.fasta",df.seqs,seq_names = df.names)

Which will look something like this, when opened in AliView

Functions

MolecularEvolution.sim_treeFunction
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)

Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.

Ne_func(t) = (sin(t/10)+1)*100.0 + 10.0
root = sim_tree(600,Ne_func,1.0)
simple_tree_draw(ladderize(root))

source
sim_tree(;n = 10)

Simulates tree with constant population size.

source
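As a minimal sketch of the two sim_tree methods documented above, the snippet below simulates one tree with the keyword form and one with constant values in place of Ne_func and sample_rate_func. The specific numbers (50, 500.0, 1.0) are illustrative assumptions, not recommended settings.

```julia
using MolecularEvolution

# Keyword form: constant population size.
small_tree = sim_tree(n = 10)

# Positional form: add_limit, then constants standing in for
# Ne_func and sample_rate_func (values chosen only for illustration).
tree = sim_tree(50, 500.0, 1.0)

# Inspect the result: leaf names and a Newick string for the whole tree.
leaf_names = [n.name for n in getleaflist(tree)]
nwk = newick(tree)
println(length(leaf_names), " leaves")
println(first(nwk, 80))
```

The Newick string can be written to a file with ordinary Julia IO if the simulated tree needs to be reused elsewhere.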
MolecularEvolution.sample_down!Function

sample_down!(root::FelNode,models,partition_list)

Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can be a single model (if the messages on the tree contain just one Partition), an array of models (if the messages have >1 Partition), or a function that takes a node and returns a Vector{<:BranchModel}, if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.

source
MolecularEvolution.partition2obsFunction
partition2obs(part::Partition)

Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.

source
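The functions above can be combined into a short simulation, sketched here under some assumptions: a single nucleotide Partition per message, a reversible model built with reversibleQ, and made-up rates and frequencies. DiagonalizedCTMC, NucleotidePartition and internal_message_init! are not described on this page, so treat their use here as an assumption about the package API rather than a reference.

```julia
using MolecularEvolution

# Simulate a tree with constant population size.
tree = sim_tree(n = 10)

# Illustrative (made-up) equilibrium frequencies and the six
# upper-triangle rates expected by reversibleQ.
nuc_freqs = [0.2, 0.3, 0.3, 0.2]
nuc_rates = [1.0, 2.0, 1.0, 1.0, 1.6, 0.5]
model = DiagonalizedCTMC(reversibleQ(nuc_rates, nuc_freqs))

# Allocate a 300-site nucleotide message on every node (assumed helper).
internal_message_init!(tree, NucleotidePartition(nuc_freqs, 300))

# Sample sequences from the root down to the leaves under the model.
sample_down!(tree, model)

# Convert the sampled leaf messages into plain strings.
leaves = getleaflist(tree)
leaf_names = [n.name for n in leaves]
seqs = [partition2obs(n.message[1]) for n in leaves]
```

From here, leaf_names and seqs can be passed to write_fasta as in the export step shown earlier on this page.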
diff --git a/dev/viz/index.html b/dev/viz/index.html index d753c69..b8c3ac1 100644 --- a/dev/viz/index.html +++ b/dev/viz/index.html @@ -30,9 +30,9 @@ for n in getnodelist(tree) compose_dict[n] = (x,y)->pie_chart(x,y,d[n][1].state[:,1],size = 0.02, opacity = 0.75) end -img = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)

This can then be exported with:

savefig_tweakSVG("piechart_tree.svg",img)

Functions

MolecularEvolution.get_phylo_treeFunction
get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))

Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.

source
MolecularEvolution.values_from_phylo_treeFunction
values_from_phylo_tree(phylo_tree, key)
+img = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)

This can then be exported with:

savefig_tweakSVG("piechart_tree.svg",img)

Functions

MolecularEvolution.get_phylo_treeFunction
get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))

Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.

source
MolecularEvolution.values_from_phylo_treeFunction
values_from_phylo_tree(phylo_tree, key)
 
-Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
source
MolecularEvolution.savefig_tweakSVGFunction
savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)

Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.

eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])

source
savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)

Saves a figure created using the Compose approach, but tweaks the SVG after export.

eg. savefig_tweakSVG("export.svg",pl)

source
MolecularEvolution.tree_drawFunction
tree_draw(tree::FelNode;
+Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
source
MolecularEvolution.savefig_tweakSVGFunction
savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)

Note: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.

eg. savefig_tweakSVG("export.svg",pl, new_viewbox = [-100, -100, 3000, 4500])

source
savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)

Saves a figure created using the Compose approach, but tweaks the SVG after export.

eg. savefig_tweakSVG("export.svg",pl)

source
MolecularEvolution.tree_drawFunction
tree_draw(tree::FelNode;
     canvas_width = 15cm, canvas_height = 15cm,
     stretch_for_labels = 2.0, draw_labels = true,
     line_width = 0.1mm, font_size = 4pt,
@@ -61,4 +61,4 @@
 img |> SVG("imgout.svg",10cm, 10cm)
 OR
 using Cairo
-img |> PDF("imgout.pdf",10cm, 10cm)
source
+img |> PDF("imgout.pdf",10cm, 10cm)source