diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index db9d9a3..6cab848 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-01T18:50:41","documenter_version":"1.6.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-22T10:57:00","documenter_version":"1.8.0"}}
\ No newline at end of file
diff --git a/dev/IO/index.html b/dev/IO/index.html
index 5b7fb7a..0777ca1 100644
--- a/dev/IO/index.html
+++ b/dev/IO/index.html
@@ -1,2 +1,2 @@
-
Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.
Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if
tolerate_missing = 0, an error will be thrown
tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
A renaming function that can eg. strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform
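The three tolerate_missing levels and the leaf_name_transform hook can be illustrated with a minimal sketch. This is not the library's implementation: toy_lookup is a hypothetical helper, and returning nothing is only a stand-in for the uninformative message.

```julia
# Hypothetical helper illustrating the three tolerance levels described above.
# It is NOT part of MolecularEvolution.jl.
function toy_lookup(leaf_name, names, data; tolerate_missing, leaf_name_transform = identity)
    key = leaf_name_transform(leaf_name)
    idx = findfirst(==(key), names)
    if idx !== nothing
        return data[idx]                      # matching name found
    elseif tolerate_missing == 0
        error("No data for leaf $(leaf_name)")
    elseif tolerate_missing == 1
        @warn "No data for leaf $(leaf_name); using the uninformative message"
    end
    return nothing                            # stand-in for the uninformative message
end

names = ["tax1"]
data = ["ACGT"]
# Strip a (made-up) "|tag" suffix before matching leaf names against names.
strip_tag(s) = first(split(s, "|"))
found = toy_lookup("tax1|2020", names, data; tolerate_missing = 0, leaf_name_transform = strip_tag)
missing_quiet = toy_lookup("tax9", names, data; tolerate_missing = 2)
```

In this sketch, level 0 throws, level 1 warns and falls through, and level 2 falls through silently.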
Given a phylogeny, and observations on some set of leaf nodes, "ancestral reconstruction" describes a family of approaches for inferring the state of the ancestors, or the distribution over possible states of ancestors.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendants. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the nodemessagedict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
This document was generated with Documenter.jl version 1.8.0 on Friday 22 November 2024. Using Julia version 1.11.1.
diff --git a/dev/assets/documenter.js b/dev/assets/documenter.js
index 82252a1..7d68cd8 100644
--- a/dev/assets/documenter.js
+++ b/dev/assets/documenter.js
@@ -612,176 +612,194 @@ function worker_function(documenterSearchIndex, documenterBaseURL, filters) {
};
}
-// `worker = Threads.@spawn worker_function(documenterSearchIndex)`, but in JavaScript!
-const filters = [
- ...new Set(documenterSearchIndex["docs"].map((x) => x.category)),
-];
-const worker_str =
- "(" +
- worker_function.toString() +
- ")(" +
- JSON.stringify(documenterSearchIndex["docs"]) +
- "," +
- JSON.stringify(documenterBaseURL) +
- "," +
- JSON.stringify(filters) +
- ")";
-const worker_blob = new Blob([worker_str], { type: "text/javascript" });
-const worker = new Worker(URL.createObjectURL(worker_blob));
-
/////// SEARCH MAIN ///////
-// Whether the worker is currently handling a search. This is a boolean
-// as the worker only ever handles 1 or 0 searches at a time.
-var worker_is_running = false;
-
-// The last search text that was sent to the worker. This is used to determine
-// if the worker should be launched again when it reports back results.
-var last_search_text = "";
-
-// The results of the last search. This, in combination with the state of the filters
-// in the DOM, is used to compute the results to display on calls to update_search.
-var unfiltered_results = [];
-
-// Which filter is currently selected
-var selected_filter = "";
-
-$(document).on("input", ".documenter-search-input", function (event) {
- if (!worker_is_running) {
- launch_search();
- }
-});
-
-function launch_search() {
- worker_is_running = true;
- last_search_text = $(".documenter-search-input").val();
- worker.postMessage(last_search_text);
-}
-
-worker.onmessage = function (e) {
- if (last_search_text !== $(".documenter-search-input").val()) {
- launch_search();
- } else {
- worker_is_running = false;
- }
-
- unfiltered_results = e.data;
- update_search();
-};
+function runSearchMainCode() {
+ // `worker = Threads.@spawn worker_function(documenterSearchIndex)`, but in JavaScript!
+ const filters = [
+ ...new Set(documenterSearchIndex["docs"].map((x) => x.category)),
+ ];
+ const worker_str =
+ "(" +
+ worker_function.toString() +
+ ")(" +
+ JSON.stringify(documenterSearchIndex["docs"]) +
+ "," +
+ JSON.stringify(documenterBaseURL) +
+ "," +
+ JSON.stringify(filters) +
+ ")";
+ const worker_blob = new Blob([worker_str], { type: "text/javascript" });
+ const worker = new Worker(URL.createObjectURL(worker_blob));
+
+ // Whether the worker is currently handling a search. This is a boolean
+ // as the worker only ever handles 1 or 0 searches at a time.
+ var worker_is_running = false;
+
+ // The last search text that was sent to the worker. This is used to determine
+ // if the worker should be launched again when it reports back results.
+ var last_search_text = "";
+
+ // The results of the last search. This, in combination with the state of the filters
+ // in the DOM, is used to compute the results to display on calls to update_search.
+ var unfiltered_results = [];
+
+ // Which filter is currently selected
+ var selected_filter = "";
+
+ $(document).on("input", ".documenter-search-input", function (event) {
+ if (!worker_is_running) {
+ launch_search();
+ }
+ });
-$(document).on("click", ".search-filter", function () {
- if ($(this).hasClass("search-filter-selected")) {
- selected_filter = "";
- } else {
- selected_filter = $(this).text().toLowerCase();
+ function launch_search() {
+ worker_is_running = true;
+ last_search_text = $(".documenter-search-input").val();
+ worker.postMessage(last_search_text);
}
- // This updates search results and toggles classes for UI:
- update_search();
-});
+ worker.onmessage = function (e) {
+ if (last_search_text !== $(".documenter-search-input").val()) {
+ launch_search();
+ } else {
+ worker_is_running = false;
+ }
-/**
- * Make/Update the search component
- */
-function update_search() {
- let querystring = $(".documenter-search-input").val();
+ unfiltered_results = e.data;
+ update_search();
+ };
- if (querystring.trim()) {
- if (selected_filter == "") {
- results = unfiltered_results;
+ $(document).on("click", ".search-filter", function () {
+ if ($(this).hasClass("search-filter-selected")) {
+ selected_filter = "";
} else {
- results = unfiltered_results.filter((result) => {
- return selected_filter == result.category.toLowerCase();
- });
+ selected_filter = $(this).text().toLowerCase();
}
- let search_result_container = ``;
- let modal_filters = make_modal_body_filters();
- let search_divider = ``;
+ // This updates search results and toggles classes for UI:
+ update_search();
+ });
- if (results.length) {
- let links = [];
- let count = 0;
- let search_results = "";
-
- for (var i = 0, n = results.length; i < n && count < 200; ++i) {
- let result = results[i];
- if (result.location && !links.includes(result.location)) {
- search_results += result.div;
- count++;
- links.push(result.location);
- }
- }
+ /**
+ * Make/Update the search component
+ */
+ function update_search() {
+ let querystring = $(".documenter-search-input").val();
- if (count == 1) {
- count_str = "1 result";
- } else if (count == 200) {
- count_str = "200+ results";
+ if (querystring.trim()) {
+ if (selected_filter == "") {
+ results = unfiltered_results;
} else {
- count_str = count + " results";
+ results = unfiltered_results.filter((result) => {
+ return selected_filter == result.category.toLowerCase();
+ });
}
- let result_count = `
${count_str}
`;
- search_result_container = `
+ let search_result_container = ``;
+ let modal_filters = make_modal_body_filters();
+ let search_divider = ``;
+
+ if (results.length) {
+ let links = [];
+ let count = 0;
+ let search_results = "";
+
+ for (var i = 0, n = results.length; i < n && count < 200; ++i) {
+ let result = results[i];
+ if (result.location && !links.includes(result.location)) {
+ search_results += result.div;
+ count++;
+ links.push(result.location);
+ }
+ }
+
+ if (count == 1) {
+ count_str = "1 result";
+ } else if (count == 200) {
+ count_str = "200+ results";
+ } else {
+ count_str = count + " results";
+ }
+ let result_count = `
+Examples · MolecularEvolution.jl
This example reads amino acid sequences from this FASTA file, and a phylogeny from this Newick tree file. A WAG amino acid model, augmented to explicitly model gap (ie. '-') characters, and a global substitution rate is estimated by maximum likelihood. Under this optimized model, the distribution over ancestral amino acids is constructed for each node, and visualized in multiple ways.
using MolecularEvolution, FASTX, Phylo, Plots
#Read in seqs and tree
seqnames, seqs = read_fasta("Data/MusAA_IGHV.fasta")
@@ -371,4 +371,4 @@
end
end
Site 153: P(β>α)=0.9074
Site 158: P(β>α)=0.9266
-Site 160: P(β>α)=0.9547
The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.
A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.
For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.
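For the discrete 4-category case, the in-place combination can be sketched in plain Julia as follows. ToyPartition and toy_combine! are hypothetical names used only for illustration; they are not the library's types or methods.

```julia
# Hypothetical toy type, NOT part of MolecularEvolution.jl.
struct ToyPartition
    state::Vector{Float64}   # unnormalized P(obs | state), one entry per category
end

# In-place combination: afterwards dest holds P(obsA, obsB | state),
# assuming obsA and obsB are conditionally independent given state.
function toy_combine!(dest::ToyPartition, src::ToyPartition)
    dest.state .*= src.state   # element-wise product, no new allocation
    return dest
end

a = ToyPartition([0.9, 0.1, 0.0, 0.0])
b = ToyPartition([0.5, 0.5, 0.2, 0.1])
toy_combine!(a, b)
println(a.state)
```

The result is deliberately left unnormalized, matching the description of a Partition above.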
A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.
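As a rough illustration for a discrete state, both directions can be written as matrix-vector products with a transition matrix. This sketch assumes a fixed, made-up two-state matrix P with P[i, j] = P(state at bottom = j | state at top = i). The functions toy_backward! and toy_forward! are hypothetical and take the matrix directly, rather than a BranchModel and a FelNode.

```julia
using LinearAlgebra

# Assumed transition matrix for one branch: P[i, j] = P(bottom = j | top = i).
P = [0.9 0.1;
     0.2 0.8]

# Tip-to-root: P(obs-below | top = i) = sum_j P[i, j] * P(obs-below | bottom = j)
toy_backward!(dest, src, P) = mul!(dest, P, src)

# Root-to-tip: P(obs-above, bottom = j) = sum_i P(obs-above, top = i) * P[i, j]
toy_forward!(dest, src, P) = mul!(dest, transpose(P), src)

below = [1.0, 0.0]          # state 1 observed at the bottom of the branch
up = zeros(2)
toy_backward!(up, below, P)

above = [0.5, 0.5]          # joint distribution at the top of the branch
down = zeros(2)
toy_forward!(down, above, P)
```

Both functions write into a preallocated dest, in line with the non-allocating convention described for combine!.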
Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!
Each node in our tree is a FelNode ("Fel" for "Felsenstein"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.
The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that associates a FelNode->Vector{<:BranchModel}. In the simpler cases where the model does not vary from branch to branch, or where there is only a single Partition, and thus a single model, the core algorithms have been overloaded to allow you to pass in a single model vector or a single model.
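One way to picture this overloading is to normalize all three accepted inputs to a node-to-models function via dispatch. The sketch below uses hypothetical stand-in types (ToyBranchModel, ToyNode) and a hypothetical helper as_model_fn; it is not how the library is known to implement it.

```julia
# Hypothetical stand-ins for BranchModel and FelNode.
abstract type ToyBranchModel end
struct SlowModel <: ToyBranchModel end
struct FastModel <: ToyBranchModel end

struct ToyNode
    name::String
    branchlength::Float64
end

# Normalize each accepted input form to a function node -> vector of models.
as_model_fn(m::ToyBranchModel) = n -> ToyBranchModel[m]        # a single model
as_model_fn(ms::Vector{<:ToyBranchModel}) = n -> ms            # one model per partition
as_model_fn(f::Function) = f                                   # already branch-specific

# A branch-specific choice: long branches get a different model.
per_branch(n::ToyNode) = n.branchlength > 1.0 ? ToyBranchModel[FastModel()] : ToyBranchModel[SlowModel()]

node = ToyNode("tax1", 2.5)
single = as_model_fn(SlowModel())(node)
varying = as_model_fn(per_branch)(node)
```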
Felsenstein's algorithm recursively computes, for each node, the probability of all observations below that node, given the state at that node. Felsenstein's algorithm can be decomposed into the following combination of backward! and combine! operations:
At the root node, we wind up with $P(O_{all}|R)$, where $R$ is the state at the root, and we can compute $P(O_{all}) = \sum_{R} P(O_{all}|R) P(R)$.
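The root computation can be checked by hand on the smallest possible case: a root with two leaf children and a two-state model. The sketch below assumes a made-up transition matrix shared by both branches and a uniform root prior; the backward step is a matrix-vector product and the combine step is an element-wise product.

```julia
# Assumed transition matrix on both branches: P[i, j] = P(child = j | parent = i).
P = [0.9 0.1;
     0.2 0.8]

leafA = [1.0, 0.0]   # leaf A observed in state 1
leafB = [0.0, 1.0]   # leaf B observed in state 2

# Backward along each branch, then combine at the root: P(O_all | R).
root_cond = (P * leafA) .* (P * leafB)

# Sum over root states, weighted by the assumed prior P(R): P(O_all).
prior = [0.5, 0.5]
total = sum(prior .* root_cond)
println(total)
```

On larger trees the same two operations are applied recursively from the tips towards the root.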
Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. The Compose, Plots, and Phylo dependencies are optional.
using MolecularEvolution, Plots, Phylo
+
+#First simulate a tree, and then Brownian motion:
+tree = sim_tree(n = 20)
+internal_message_init!(tree, GaussianPartition())
+bm_model = BrownianMotion(0.0, 0.1)
+sample_down!(tree, bm_model)
+
+#We'll add the Gaussian means to the node_data dictionaries
+for n in getnodelist(tree)
+ n.node_data = Dict(["mu" => n.message[1].mean])
+end
+
+#Transducing the mol ev tree to a Phylo.jl tree
+phylo_tree = get_phylo_tree(tree)
+
+pl = plot(
+ phylo_tree,
+ showtips = true,
+ tipfont = 6,
+ marker_z = "mu",
+ markeralpha = 0.5,
+ line_z = "mu",
+ linecolor = :darkrainbow,
+ markersize = 4.0,
+ markerstrokewidth = 0,
+ margins = 1Plots.cm,
+ linewidth = 1.5,
+ markercolor = :darkrainbow,
+ size = (500, 500),
+)
We also offer savefig_tweakSVG("simple_plot_example.svg", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,"mu") which can extract stored quantities in the right order for passing into eg. markersize options when plotting.
For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.
The Compose.jl in-house tree drawing offers extensive flexibility. Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:
Doesn't require Phylo.jl. Query trees can be plotted against a reference tree with plot_multiple_trees. This can be useful, for instance, when we've sampled trees with metropolis_sample.
using MolecularEvolution, Plots
+
+tree = sim_tree(10, 1, 1)
+nodelist = getnodelist(tree); mean = sum([n.branchlength for n in nodelist]) / length(nodelist)
+rparams(n::Int) = MolecularEvolution.sum2one(rand(n))
+model = DiagonalizedCTMC(reversibleQ(ones(6) ./ (6 * mean), rparams(4)))
+internal_message_init!(tree, NucleotidePartition(ones(4) ./ 4, 100))
+sample_down!(tree, model)
+@time trees, LLs = metropolis_sample(tree, [model], 300, collect_LLs=true);
+reference = trees[argmax(LLs)];
7.185392 seconds (22.06 M allocations: 1.569 GiB, 3.15% gc time, 51.89% compilation time)
We'll use the maximum a posteriori tree as reference
plot_multiple_trees(trees, reference)
We can pass in a weight function to fit query trees against reference in a weighted least squares fashion with a location and scale parameter.
Note
If we don't want to scale the query trees, we must disable it with opt_scale = false.
Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.
values_from_phylo_tree(phylo_tree, key)
+
+Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
Note: Might only work if you're using the GR backend!! Saves a figure created using the PhyloPlots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.
Draws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.
Example using compose_dict
str_tree = "(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0"
+newt = gettreefromnewick(str_tree, FelNode)
+ladderize!(newt)
+compose_dict = Dict()
+for n in getleaflist(newt)
+ #Replace the rand(4) with the frequencies you actually want.
+ compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)
+end
+tree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)
+
+
+img = tree_draw(tree)
+img |> SVG("imgout.svg",10cm, 10cm)
+OR
+using Cairo
+img |> PDF("imgout.pdf",10cm, 10cm)
Plots multiple phylogenetic trees against a reference tree, inf_tree. For each tree in trees, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn keyword) is solved for the x-positions of the matching nodes between inf_tree and tree.
Keyword Arguments
node_size=4: the size of the nodes in the plot.
line_width=0.5: the width of the branches from trees.
font_size=10: the font size for the leaf labels.
margin=1.5: the margin between a leaf node and its label.
line_alpha=0.05: the transparency level of the branches from trees.
y_jitter=0.0: the standard deviation of the noise in the y-coordinate.
weight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0): a function that assigns a weight to a node for the WLS problem.
opt_scale=true: whether to include a scaling parameter for the WLS problem.
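A generic weighted least squares fit with a location and a scale parameter can be sketched as below. This is only an illustration of the idea, not the routine used by plot_multiple_trees: toy_wls is a hypothetical helper that fits y ≈ a + b*x, where x stands for query-tree node positions, y for reference positions, and w for the node weights.

```julia
using LinearAlgebra

# Hypothetical helper: minimize sum_i w[i] * (a + b*x[i] - y[i])^2.
function toy_wls(x, y, w; opt_scale = true)
    if opt_scale
        A = hcat(ones(length(x)), x)          # columns: location, scale
        W = Diagonal(w)
        a, b = (A' * W * A) \ (A' * W * y)    # weighted normal equations
        return a, b
    else
        # Scale fixed to 1: only the weighted mean offset is fitted.
        a = sum(w .* (y .- x)) / sum(w)
        return a, 1.0
    end
end

x = [0.0, 1.0, 2.0, 3.0]
y = 0.5 .+ 2.0 .* x
w = [1.0, 2.0, 1.0, 0.5]
a, b = toy_wls(x, y, w)
a_fixed, b_fixed = toy_wls(x, x .+ 0.3, w; opt_scale = false)
```

Note that the normal equations are only solvable when at least two distinct positions carry non-zero weight, so this sketch does not cover a weighting that selects a single node.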
MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.
If the behavior you need is not already available in MolecularEvolution.jl:
If you have a new data type:
A Partition type that represents the uncertainty over your state.
combine!() that merges evidence from two Partitions.
If you have a new model:
A BranchModel type that stores your model parameters.
forward!() that evolves state distributions over branches, in the root-to-tip direction.
backward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.
And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.
Where possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.
We do not sacrifice flexibility for performance.
Scalability
Analyses implemented using MolecularEvolution.jl should scale to large, real-world datasets.
Performance
While the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competitive runtimes.
using MolecularEvolution, Plots
#First simulate a tree, using a coalescent process
tree = sim_tree(n=200)
@@ -9,31 +9,31 @@
sample_down!(tree, bm_model)
#And plot the log likelihood as a function of the parameter value
ll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))
-plot(0.7:0.001:1.6,ll, xlabel = "variance per unit time", ylabel = "log likelihood")
BranchlengthSampler
-A type that allows you to specify an additive proposal function in the log domain and a prior distribution over the log of the branchlengths. It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).
Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.
Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, only a wave of partitions needs to be actualized at any one time, for instance a wave following a root-leaf path or a depth-first traversal. In that case, we can be more economical with our memory consumption. With a worst-case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
Note
For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.
Further requirements
Suppose you want to wrap a partition of PType with LazyPartition:
If you're calling log_likelihood! and felsenstein!:
obs2partition!(partition::PType, obs) that transforms an observation to a partition.
If you're calling sample_down!:
partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
Performs a BFS map-reduce over the tree, starting at a given node. For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator), where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.
Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.
Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).
tol=1e-5: absolute tolerance for the bl_modifier.
bl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.
The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...
Examples
julia> f(x) = exp(-x) - cos(x)
+A type that allows you to specify an additive proposal function in the log domain and a prior distribution over the log of the branchlengths. It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).
Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.
Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, only a wave of partitions needs to be actualized at any one time, for instance a wave following a root-leaf path or a depth-first traversal. In that case, we can be more economical with our memory consumption. With a worst-case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
Note
For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.
Further requirements
Suppose you want to wrap a partition of PType with LazyPartition:
If you're calling log_likelihood! and felsenstein!:
obs2partition!(partition::PType, obs) that transforms an observation to a partition.
If you're calling sample_down!:
partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
Propagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
Performs a BFS map-reduce over the tree, starting at a given node. For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator), where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.
Not exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.
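The calling pattern described above can be sketched on a toy tree. This is an illustration under stated assumptions, not the package's implementation: ToyNode and toy_bfs_mapreduce are hypothetical names, and the toy only walks from a node down to its children, whereas the real function operates on FelNode trees.

```julia
# Toy node type for illustration only; the package's FelNode has more fields.
mutable struct ToyNode
    name::String
    children::Vector{ToyNode}
end
ToyNode(name) = ToyNode(name, ToyNode[])

# Breadth-first traversal that calls visit!(curr, prev, aggregator) on every node.
# prev is the node from which curr was reached (nothing for the start node).
function toy_bfs_mapreduce(start::ToyNode, visit!, aggregator)
    queue = Tuple{ToyNode,Union{ToyNode,Nothing}}[(start, nothing)]
    while !isempty(queue)
        curr, prev = popfirst!(queue)
        visit!(curr, prev, aggregator)   # expected to mutate the aggregator
        for child in curr.children
            push!(queue, (child, curr))
        end
    end
    return aggregator
end

# Example: record each node's depth, relying on the entry that the visit to
# its parent added to the aggregator earlier in the traversal.
root = ToyNode("root", [ToyNode("a", [ToyNode("c")]), ToyNode("b")])
depths = Dict{String,Int}()
toy_bfs_mapreduce(root, (curr, prev, agg) -> begin
    agg[curr.name] = prev === nothing ? 0 : agg[prev.name] + 1
end, depths)
```

The depth example shows why this is not a conventional map-reduce: each call reads state that an earlier call wrote into the aggregator.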
Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).
tol=1e-5: absolute tolerance for the bl_modifier.
bl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Given a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.
The method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...
Examples
julia> f(x) = exp(-x) - cos(x)
f (generic function with 1 method)
julia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)
-0.5885327257940255
From: Richard P. Brent, "Algorithms for Minimization without Derivatives" (1973). Chapter 5.
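For intuition about the golden section component that Brent's method falls back on, a minimal plain-Julia sketch follows. It is not the package's implementation: golden_section_minimize and its tol keyword are hypothetical names, and the toy omits the parabolic interpolation step entirely.

```julia
# Golden-section search for the minimizer of a unimodal f on [a, b].
# Hypothetical helper for illustration; each iteration shrinks the bracket
# by a factor of 1/φ and reuses one of the two interior evaluations.
function golden_section_minimize(f, a::Real, b::Real; tol::Real = 1e-8)
    invphi = (sqrt(5) - 1) / 2            # 1/φ ≈ 0.618
    a, b = float(a), float(b)
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol
        if fc < fd
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        end
    end
    return (a + b) / 2
end

g(x) = (x - 2)^2
xmin = golden_section_minimize(g, 1, 5)
```

Golden section search converges linearly, which is the worst-case behavior Brent's method retains while usually doing better on smooth functions.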
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes a vector of sequences and returns a vector of the proportion of each character across all sequences. An example alphabet argument is MolecularEvolution.AAstring.
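The computation described here can be sketched in a few lines of plain Julia. The helper name char_proportions is hypothetical, and ignoring characters that are not in the alphabet is an assumption of this sketch rather than documented behavior.

```julia
# Proportion of each alphabet character across all sequences.
# Hypothetical helper; characters outside `alphabet` are ignored (an assumption).
function char_proportions(seqs::Vector{String}, alphabet::String)
    counts = Dict(c => 0 for c in alphabet)
    for s in seqs, c in s
        if haskey(counts, c)
            counts[c] += 1
        end
    end
    total = sum(values(counts))
    # Return the proportions in the same order as the alphabet.
    return [counts[c] / total for c in alphabet]
end

freqs = char_proportions(["ACGT", "AACG"], "ACGT")
```

The result is ordered like the alphabet argument, so it can serve directly as a vector of empirical equilibrium frequencies.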
collect_leaf_dists(trees::Vector{<:AbstractTreeNode})
-Returns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.
colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)
Draw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.
function copy_tree(root::FelNode, shallow_copy=false)
+Returns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.
colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)
Draw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.
function copy_tree(root::FelNode, shallow_copy=false)
-Returns an untangled copy of the tree. Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.
Checks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).
Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.
Example tagfunc: function tagfunc(nam::String) return split(nam,"_")[1] end
For prettier colors, but less discrimination: rainbow = true. To randomize the rainbow color assignment: scramble = true. col_seed is currently set to white, and excluded from the list of colors, to make them more visible.
Consider making your own version of this function to customize colors as you see fit.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the nodemessagedict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.
Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.
Given a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.
Examples
julia> f(x) = -(x-2)^2
+Returns an untangled copy of the tree. Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.
Checks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).
Takes a tree and a tag_func, which converts the leaf label into a category (ie. there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.
Example tagfunc: function tagfunc(nam::String) return split(nam,"_")[1] end
For prettier colors, but less discrimination: rainbow = true. To randomize the rainbow color assignment: scramble = true. col_seed is currently set to white, and excluded from the list of colors, to make them more visible.
Consider making your own version of this function to customize colors as you see fit.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model, conditioned on the leaf observations. These samples are stored in the nodemessagedict, which is returned. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Should usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Should usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Propagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.
Takes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.
Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.
highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)
Draw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.
highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)
Draw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.
internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})
+Initializes the message template for each node in the tree, as an array of the partition.
internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})
-Initializes the message template for each node in the tree, allocating space for each partition.
First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Computes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Returns the longest path in a tree. For convenience, this is returned as two lists of the form [leafnode, parentnode, ..., root], where the leaf nodes are selected to be the furthest away from each other.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.
First re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Computes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Returns the longest path in a tree. For convenience, this is returned as two lists of the form [leafnode, parentnode, ..., root], where the leaf nodes are selected to be the furthest away from each other.
Takes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.
Takes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. the REPL.
Samples tree topologies from a posterior distribution.
Arguments
initial_tree: An initial tree topology with the leaves populated with data, for the likelihood calculation.
models: A list of branch models.
num_of_samples: The number of tree samples drawn from the posterior.
bl_sampler: Sampler used to draw branchlengths from the posterior.
burn_in: The number of samples discarded at the start of the Markov Chain.
sample_interval: The distance between samples in the underlying Markov Chain (to reduce sample correlation).
collect_LLs: Specifies if the function should return the log-likelihoods of the trees.
midpoint_rooting: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).
Note
The leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.
Returns
samples: The trees drawn from the posterior. Returns shallow tree copies, which need to be repopulated before running felsenstein! etc.
sample_LLs: The associated log-likelihoods of the tree (optional).
mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}
mix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.
Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).
selection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects an NNI configuration. Note that the current log likelihood is stored at x[1].
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.
Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if
tolerate_missing = 0, an error will be thrown
tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
A renaming function that can eg. strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform
Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels of internal nodes).
Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.
Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). Returns the confidence intervals computed by a quadratic approximation to the LL.
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Note: Might only work if you're using the GR backend!! Saves a figure created using the PhyloPlots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)
Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.
Samples tree topologies from a posterior distribution.
Arguments
initial_tree: An initial tree topology with the leaves populated with data, for the likelihood calculation.
models: A list of branch models.
num_of_samples: The number of tree samples drawn from the posterior.
bl_sampler: Sampler used to draw branchlengths from the posterior.
burn_in: The number of samples discarded at the start of the Markov Chain.
sample_interval: The distance between samples in the underlying Markov Chain (to reduce sample correlation).
collect_LLs: Specifies if the function should return the log-likelihoods of the trees.
midpoint_rooting: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).
Note
The leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.
Returns
samples: The trees drawn from the posterior. Returns shallow tree copies, which need to be repopulated before running felsenstein! etc.
sample_LLs: The associated log-likelihoods of the tree (optional).
mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}
mix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.
Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).
selection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects an NNI configuration. Note that the current log likelihood is stored at x[1].
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
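To make the selection_rule keyword more concrete, the sketch below contrasts the greedy default with a stochastic alternative that picks a configuration with probability proportional to its likelihood. Both are plain functions from a vector of log likelihoods to an index. The name softmax_select is hypothetical, and whether a stochastic rule suits a particular analysis is an assumption to check against the package itself.

```julia
using Random

# Greedy rule, matching the documented default: pick the highest log likelihood.
greedy_select = x -> argmax(x)

# Hypothetical stochastic rule: sample index i with probability
# proportional to exp(x[i]), computed stably by subtracting the maximum.
function softmax_select(x; rng = Random.default_rng())
    w = exp.(x .- maximum(x))
    w ./= sum(w)
    u = rand(rng)
    cum = 0.0
    for (i, wi) in enumerate(w)
        cum += wi
        u <= cum && return i
    end
    return length(w)   # guard against rounding in the cumulative sum
end

lls = [-10.0, -9.5, -12.0]      # lls[1] plays the role of the current configuration
best = greedy_select(lls)
draw = softmax_select(lls)
```

A rule of this second kind would sometimes keep or choose a lower-likelihood configuration, which is the behavior a sampler needs and an optimizer does not.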
Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.
Plots multiple phylogenetic trees against a reference tree, inf_tree. For each tree in trees, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn keyword) is solved for the x-positions of the matching nodes between inf_tree and tree.
Keyword Arguments
node_size=4: the size of the nodes in the plot.
line_width=0.5: the width of the branches from trees.
font_size=10: the font size for the leaf labels.
margin=1.5: the margin between a leaf node and its label.
line_alpha=0.05: the transparency level of the branches from trees.
y_jitter=0.0: the standard deviation of the noise in the y-coordinate.
weight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0): a function that assigns a weight to a node for the WLS problem.
opt_scale=true: whether to include a scaling parameter for the WLS problem.
Takes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if
tolerate_missing = 0, an error will be thrown
tolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)
tolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)
A renaming function that can eg. strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform
Creates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels on internal nodes).
Takes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.
Takes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). Returns the confidence intervals computed by a quadratic approximation to the LL.
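As a rough illustration of what a quadratic approximation to the LL can look like, the sketch below fits a parabola to (xvec, yvec) by least squares and reports where the fitted log likelihood falls 1.92 units below its peak (about half the 95% quantile of a chi-squared distribution with one degree of freedom). The function name `quadratic_ci_sketch` and the `drop` keyword are hypothetical, and this is not necessarily how the library computes its intervals.

```julia
using LinearAlgebra

# Hypothetical helper for illustration; not part of MolecularEvolution.jl.
function quadratic_ci_sketch(xvec::AbstractVector{<:Real},
                             yvec::AbstractVector{<:Real}; drop::Real = 1.92)
    # Least-squares fit of y ≈ a*x^2 + b*x + c
    X = hcat(xvec .^ 2, xvec, ones(length(xvec)))
    a, b, c = X \ yvec
    a < 0 || error("fitted parabola has no maximum")
    peak = -b / (2a)              # location of the fitted maximum
    halfwidth = sqrt(-drop / a)   # distance at which the fit has dropped by `drop`
    return (peak - halfwidth, peak + halfwidth)
end

# Log likelihoods (not negated) evaluated on a grid around the optimum
xs = collect(1.0:0.5:5.0)
lls = @. -2 * (xs - 3)^2 + 5
lo, hi = quadratic_ci_sketch(xs, lls)
```

The sharper the curvature at the peak (the more negative `a`), the narrower the resulting interval.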
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
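The construction described above can be sketched in a few lines of plain Julia. This is a minimal illustration, not the library's source: the name `reversible_rate_matrix_sketch` is hypothetical, and it assumes the parameters fill the upper triangle in row-major order and that the diagonal is set so that each row sums to zero.

```julia
# Hypothetical re-implementation for illustration; not MolecularEvolution.reversibleQ itself.
function reversible_rate_matrix_sketch(rates::AbstractVector{<:Real},
                                       eq_freqs::AbstractVector{<:Real})
    n = length(eq_freqs)
    @assert length(rates) == n * (n - 1) ÷ 2
    Q = zeros(Float64, n, n)
    k = 1
    for i in 1:n, j in (i + 1):n
        # Symmetric exchangeability, multiplied column-wise by the target frequency
        Q[i, j] = rates[k] * eq_freqs[j]
        Q[j, i] = rates[k] * eq_freqs[i]
        k += 1
    end
    for i in 1:n
        Q[i, i] = -sum(Q[i, :])   # rows of a rate matrix sum to zero
    end
    return Q
end

freqs = [0.1, 0.2, 0.3, 0.4]
Q = reversible_rate_matrix_sketch(ones(6), freqs)
```

With this construction `freqs[i] * Q[i, j] == freqs[j] * Q[j, i]` for every pair, which is the detailed balance condition that makes the process reversible.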
Returns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
Generates samples under the model. The root.parentmessage is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Note: Might only work if you're using the GR backend!! Saves a figure created using the PhyloPlots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)
Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.
If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
If called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
Returns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.
Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.
Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
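One common way to get this kind of mapping is to append a fixed zero to the unbounded values and apply a softmax. The sketch below shows that idea; the name `probvec_sketch` is hypothetical, and the library's unc2probvec may use a different transform internally.

```julia
# Hypothetical illustration of an (N-1) -> N probability-vector transform;
# not necessarily the transform used by MolecularEvolution.unc2probvec.
function probvec_sketch(v::AbstractVector{<:Real})
    z = vcat(v, 0.0)       # pin the last logit at zero to remove the redundant degree of freedom
    z = z .- maximum(z)    # shift for numerical stability; does not change the result
    w = exp.(z)
    return w ./ sum(w)
end

p = probvec_sketch(zeros(3))
q = probvec_sketch([2.0, -1.0, 0.5])
```

Because every unconstrained input maps to a valid probability vector, an optimizer can search freely without needing explicit sum-to-one constraints.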
Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.
Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
Maximizes f(x) using a Golden Section Search. See ?golden_section_maximize.
Examples
julia> f(x) = -(x-2)^2
f (generic function with 1 method)
julia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)
-2.0000000000051843
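For readers unfamiliar with the technique, a golden section search for a maximum can be sketched as follows. This is a self-contained illustration under the assumption that f is unimodal on [a, b]; the name `golden_section_argmax` is hypothetical and this is not the library's golden_section_maximize.

```julia
# Hypothetical standalone sketch of golden section search (maximization).
function golden_section_argmax(f, a::Real, b::Real; tol::Real = 1e-8)
    invphi = (sqrt(5) - 1) / 2          # 1/φ ≈ 0.618
    a, b = float(a), float(b)
    c = b - invphi * (b - a)            # left interior point
    d = a + invphi * (b - a)            # right interior point
    fc, fd = f(c), f(d)
    while (b - a) > tol
        if fc > fd
            # Maximum lies in [a, d]; the old c becomes the new right interior point
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else
            # Maximum lies in [c, b]; the old d becomes the new left interior point
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        end
    end
    return (a + b) / 2
end

xmax = golden_section_argmax(x -> -(x - 2)^2, 1, 5)
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation of f is needed per step, which matters when f is an expensive likelihood computation.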
univariate_sampler(LL, modifier::BranchlengthPeturbation, curr_branchlength)
-An MCMC algorithm that draws the next sample of a Markov Chain that approximates the Posterior distribution over the branchlengths.
values_from_phylo_tree(phylo_tree, key)
+An MCMC algorithm that draws the next sample of a Markov Chain that approximates the Posterior distribution over the branchlengths.
values_from_phylo_tree(phylo_tree, key)
-Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.
Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.
This document was generated with Documenter.jl version 1.6.0 on Sunday 1 September 2024. Using Julia version 1.10.5.
+Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
Takes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. Maybe.
Writes the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.
Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, only a wave of partitions needs to be actualized at any one time, for instance a wave following a root-leaf path or a depth-first traversal. In that case, we can be more economical with our memory consumption. With a worst-case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
Note
For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.
Further requirements
Suppose you want to wrap a partition of PType with LazyPartition:
If you're calling log_likelihood! and felsenstein!:
obs2partition!(partition::PType, obs) that transforms an observation to a partition.
If you're calling sample_down!:
partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
Initialize an empty LazyPartition that is meant for wrapping a partition of type PType.
Description
With this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, only a wave of partitions needs to be actualized at any one time, for instance a wave following a root-leaf path or a depth-first traversal. In that case, we can be more economical with our memory consumption. With a worst-case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:
log_likelihood!
felsenstein!
sample_down!
Note
For successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.
Further requirements
Suppose you want to wrap a partition of PType with LazyPartition:
If you're calling log_likelihood! and felsenstein!:
obs2partition!(partition::PType, obs) that transforms an observation to a partition.
If you're calling sample_down!:
partition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.
By this slight modification, we go from initializing and using 554 partitions to 6 during the subsequent log_likelihood! and felsenstein! calls. There is no significant decrease in performance recorded from this switch.
Now, we provided a direction for lazyprep!. The direction is an instance of LazyDown, which was initialized with the isleafnode function. The function isleafnode dictates if a node saves its sampled observation after a down pass. If you use direction=LazyDown(), every node saves its observation.
Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.
Now, we provided a direction for lazyprep!. The direction is an instance of LazyDown, which was initialized with the isleafnode function. The function isleafnode dictates if a node saves its sampled observation after a down pass. If you use direction=LazyDown(), every node saves its observation.
Indicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.
There are two distinct kinds of optimization: "global" model parameters, and then tree branchlengths and topology. These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.
The example below will set up and optimize a "Generalized Time Reversible" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).
We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle (excluding the diagonal) of the rate matrix:
There are two distinct kinds of optimization: "global" model parameters, and then tree branchlengths and topology. These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.
The example below will set up and optimize a "Generalized Time Reversible" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).
We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle (excluding the diagonal) of the rate matrix:
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).
tol=1e-5: absolute tolerance for the bl_modifier.
bl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).
selection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects a nni configuration. Note that the current log likelihood is stored at x[1].
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.
Takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.
Takes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.
Uses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).
tol=1e-5: absolute tolerance for the bl_modifier.
bl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Considers local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.
Keyword Arguments
partition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).
selection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects a nni configuration. Note that the current log likelihood is stored at x[1].
sort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.
traversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.
shuffle=false: do a randomly shuffled traversal, overrides traversal.
Takes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.
This document was generated with Documenter.jl version 1.8.0 on Friday 22 November 2024. Using Julia version 1.11.1.
diff --git a/dev/search_index.js b/dev/search_index.js
index a350c70..46c83b6 100644
--- a/dev/search_index.js
+++ b/dev/search_index.js
@@ -1,3 +1,3 @@
var documenterSearchIndex = {"docs":
-[{"location":"optimization/#Optimization","page":"Optimization","title":"Optimization","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"There are two distinct kinds of optimization: \"global\" model parameters, and then tree branchlengths and topology. These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"The example below will set up and optimize a \"Generalized Time Reversible\" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).","category":"page"},{"location":"optimization/#Optimizing-model-parameters","page":"Optimization","title":"Optimizing model parameters","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. 
The parameters are the upper triangle (excluding the diagonal) of the rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution #hide\nreversibleQ(1:6,ones(4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"...and the equilibrium frequencies are multiplied column-wise:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ(ones(6),[0.1,0.2,0.3,0.4])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Another convenient trick is to be able to parameterize a vector of positive frequencies that sum to 1, using N-1 unconstrained parameters. unc2probvec can help:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"unc2probvec(zeros(3))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"ParameterHandling.jl provides a convenient framework for managing collections of parameters in a way that plays with much of the Julia optimization ecosystem, and we recommend its use. 
Here we'll use ParameterHandling and NLopt.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"First, we'll load in some example nucleotide data:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\ninitial_partition = NucleotidePartition(length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we set up the model parameters, and the objective function:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"#Named tuple of parameters, with initial values and constraints (from ParameterHandling.jl)\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n pi=zeros(3) #will be transformed into 4 eq freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\n#Set up a function that builds a model from these parameters\nfunction build_model_vec(params)\n pi = unc2probvec(params.pi)\n return DiagonalizedCTMC(reversibleQ(params.rates,pi))\nend\n\n#Set up the function to be *minimized*\nfunction objective(params::NamedTuple; tree = tree)\n #In this example, we are optimizing the nuc equilibrium freqs\n #We'll also assume that the starting frequencies (at the root of the tree) are the eq freqs\n tree.parent_message[1].state .= unc2probvec(params.pi)\n return -log_likelihood!(tree,build_model_vec(params)) #Note, negative of LL, because minimization\nend","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we'll set up an 
optimizer from NLOpt. See this discussion and this exploration of optimizers.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"opt = Opt(:LN_BOBYQA, num_params)\n#Note: NLopt requires a function that returns a gradient, even for gradient free methods, hence (x,y)->...\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x)) #See ParameterHandling.jl docs for objective ∘ unflatten explanation\n#Some bounds (which will be in the transformed domain) to prevent searching numerically silly bits of parameter space:\nlower_bounds!(opt, [-10.0 for i in 1:num_params])\nupper_bounds!(opt, [10.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n_,mini,_ = NLopt.optimize(opt, flat_initial_params)\nfinal_params = unflatten(mini)\n\noptimized_model = build_model_vec(final_params)\nprintln(\"Opt LL:\",log_likelihood!(tree,optimized_model))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We can view the optimized parameter values:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"println(\"Rates: \", round.(final_params.rates,sigdigits = 4))\nprintln(\"Pi:\", round.(unc2probvec(final_params.pi),sigdigits = 4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Rates: [1.124, 2.102, 1.075, 0.9802, 1.605, 0.5536]\nPi:[0.2796, 0.2192, 0.235, 0.2662]","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Or the entire optimized rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"matrix_for_display(optimized_model.Q,['A','C','G','T'])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt 
LL:-3783.226756522292\n5×5 Matrix{Any}:\n \"\" 'A' 'C' 'G' 'T'\n 'A' -1.02672 0.246386 0.494024 0.286309\n 'C' 0.314289 -0.971998 0.23034 0.427368\n 'G' 0.587774 0.214842 -0.950007 0.147391\n 'T' 0.300663 0.35183 0.130093 -0.782586","category":"page"},{"location":"optimization/#Optimizing-the-tree-topology-and-branch-lengths","page":"Optimization","title":"Optimizing the tree topology and branch lengths","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"With a tree and a model, we can also optimize the branch lengths and search, by nearest neighbour interchange for changes to the tree that improve the likelihood. Individually, these are performed by nni_optim! and branchlength_optim!, which need to have felsenstein! and felsenstein_down! called beforehand, but this is all bundled into:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"tree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3783.226756522292\nLL: -3782.345818028071\nLL: -3782.3231632207567\nLL: -3782.3211724011044\nLL: -3782.321068684831\nLL: -3782.3210622627776","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"And just to convince you this works, we can perturb the branch lengths, and see how the likelihood improves:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"for n in getnodelist(tree)\n n.branchlength *= (rand()+0.5)\nend\ntree_polish!(tree, optimzed_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3805.4140940138795\nLL: -3782.884883999107\nLL: -3782.351780962518\nLL: -3782.322906364547\nLL: -3782.321183009534\nLL: -3782.3210398963506\nLL: 
-3782.3210271696703","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"warning: Warning\ntree_polish! probably won't find a good tree from a completely start. Different tree search heuristics are required for that.","category":"page"},{"location":"optimization/#Functions","page":"Optimization","title":"Functions","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ\nunc2probvec\nbranchlength_optim!\nnni_optim!\ntree_polish!","category":"page"},{"location":"optimization/#MolecularEvolution.reversibleQ","page":"Optimization","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.unc2probvec","page":"Optimization","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.branchlength_optim!","page":"Optimization","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; )\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).\ntol=1e-5: absolute tolerance for the bl_modifier.\nbl_modifier=GoldenSectionOpt(): can either be a optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the amount of temporary messages that has to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.nni_optim!","page":"Optimization","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; )\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).\nselection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects a nni configuration. Note that the current log likelihood is stored at x[1].\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the amount of temporary messages that has to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.tree_polish!","page":"Optimization","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#Simulation","page":"Simulation","title":"Simulation","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"The two key steps in phylogenetic simulation are 1) simulating the phylogeny itself, and 2) simulating data that evolves over the phylogeny.","category":"page"},{"location":"simulation/#Simulating-phylogenies","page":"Simulation","title":"Simulating phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"warning: Warning\nWhile our sim_tree function seems to produce trees with the right shape, and is good enough for eg. 
generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If you just need a simple tree for testing things, then you can just use:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"tree = sim_tree(n=100)\ntree_draw(tree, draw_labels = false, canvas_height = 5cm)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"This has the characteristic \"coalescent under constant population size\" look.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"However, sim_tree is a bit more powerful than this: it aims to simulate branching under a coalescent process with flexible options for how the effective population size, as well as the sampling rate, might change over time. This is important, because the \"constant population size\" model is quite extreme, and most of the divergence happens in the early internal branches.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"A coalescent process runs backwards in time, starting from the most recent tip, and sampling backwards toward the root, coalescing nodes as it goes, and sometimes adding additional sampled tips. 
With sim_tree, if nstart = add_limit, then all the tips will be sampled at the same time, and the tree will be ultrametric.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree has two arguments driving its flexibility. We'll start with sampling_rate, which controls the rate at which samples are added to the tree. Even under constant effective population size, this can produce interesting behavior.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for sampling_rate in [5.0, 0.5, 0.05, 0.005]\n tree = sim_tree(100,1000.0,sampling_rate)\n display(tree_draw(tree, draw_labels = false, canvas_height = 5cm))\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Above, this rate was just a fixed constant value, but we can also let this be a function. 
In this example, we'll plot the tree alongside the sampling rate function, as well as the cumulative number of samples through time.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"s(t) = ifelse(0 sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Note how the x-axis of these plots is flipped, since the leaf furthest from the root begins at time=0, and the coalescent runs backwards, from tip to root.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can also vary the effective population size over time, which adds a different dimension of control. 
Here is an example showing the shape of a tree under exponential growth:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 100000*exp(-t/10)\ntree = sim_tree(100,n,100.0, nstart = 100)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\nplot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Logistic growth, with a relatively low sampling rate, provides a reasonable model of an emerging virus that was only sampled later in its growth trajectory, such as HIV.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 10000/(1+exp(t-10))\ntree = sim_tree(100,n,20.0)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"How about a virus with a seasonally varying effective population size, where sampling is proportional to case counts? Between seasons, the effective population size gets so low that the next season's clade arises from one or two lineages in the previous season.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s)\ndisplay(tree_draw(tree, draw_labels = false))\n\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Finally, the mutation_rate argument multiplicatively scales the branch lengths.","category":"page"},{"location":"simulation/#Simulating-evolution-over-phylogenies","page":"Simulation","title":"Simulating evolution over phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We'll begin by simulating a tree, like the last example:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"using MolecularEvolution, FASTX, Phylo, Plots, CSV, DataFrames\n\nn(t) = exp(sin(t/10) * 2.0 + 
4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s, mutation_rate = 0.005)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If we need to open this tree in an external program, we can extract the Newick string representing this tree, and write it to a file:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"newick_string = newick(tree)\nopen(\"flu_sim.tre\",\"w\") do io\n println(io,newick_string)\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we can set up a model. In this case, it'll be a combination of a nucleotide model of sequence evolution and Brownian motion over a continuous character.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"nuc_freqs = [0.2,0.3,0.3,0.2]\nnuc_rates = [1.0,2.0,1.0,1.0,1.6,0.5]\nnuc_model = DiagonalizedCTMC(reversibleQ(nuc_rates,nuc_freqs))\nbm_model = BrownianMotion(0.0,1.0)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"As usual, we set up the Partition structure, and load this onto our tree:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"message_template = [NucleotidePartition(nuc_freqs,300),GaussianPartition()]\ninternal_message_init!(tree, message_template)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we sample data under our model:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sample_down!(tree, [nuc_model,bm_model])","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can visualize the Brownian component of the simulation by loading it into the node_data dictionary, and converting to a Phylo.jl tree.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for n in 
getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[2].mean])\nend\nphylo_tree = get_phylo_tree(tree)\nplot(phylo_tree, showtips = false, line_z = \"mu\", colorbar = :none,\n linecolor = :darkrainbow, linewidth = 1.0, size = (600, 600))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can write the simulated data, including sequences and continuous characters, to a CSV:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"df = DataFrame()\ndf.names = [n.name for n in getleaflist(tree)]\ndf.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)]\ndf.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)]\nCSV.write(\"flu_sim_seq_and_bm.csv\",df)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Or we could export just the sequences as .fasta","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"write_fasta(\"flu_sim_seq_and_bm.fasta\",df.seqs,seq_names = df.names)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Which will look something like this, when opened in AliView","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/#Functions","page":"Simulation","title":"Functions","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree\nsample_down!\npartition2obs","category":"page"},{"location":"simulation/#MolecularEvolution.sim_tree","page":"Simulation","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. 
Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\nsim_tree(;n = 10)\n\nSimulates tree with constant population size.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.sample_down!","page":"Simulation","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.partition2obs","page":"Simulation","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. 
Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"function"},{"location":"framework/#The-MolecularEvolution.jl-Framework","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.","category":"page"},{"location":"framework/#Partitions-and-BranchModels","page":"The MolecularEvolution.jl Framework","title":"Partitions and BranchModels","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. 
For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"(Image: )","category":"page"},{"location":"framework/#Messages","page":"The MolecularEvolution.jl Framework","title":"Messages","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. 
Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!","category":"page"},{"location":"framework/#Trees","page":"The MolecularEvolution.jl Framework","title":"Trees","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Each node in our tree is a FelNode (\"Fel\" for \"Felsenstein\"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that associates a FelNode->Vector{: true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. 
The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"ancestors/#Ancestral-Reconstruction","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Given a phylogeny, and observations on some set of leaf nodes, \"ancestral reconstruction\" describes a family of approaches for inferring the state of the ancestors, or the distribution over possible states of ancestors.","category":"page"},{"location":"ancestors/#Examples","page":"Ancestral Reconstruction","title":"Examples","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"using MolecularEvolution\n\n#Simulate a small tree, with Brownian motion over it\ntree = sim_tree(n=10)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\nr(x) = round(x,sigdigits = 3)\nprintln(\"Leaf values:\")\nfor n in getleaflist(tree)\n println(n.name,\" : \",r(n.message[1].mean))\nend\n\nd = marginal_state_dict(tree,bm_model)\nprintln(\"Inferred internal means (±95% intervals):\")\nfor n in getnonleaflist(tree)\n m,s = d[n][1].mean,sqrt(d[n][1].var)\n println(r(m), \"±\", r(1.96*s), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Leaf values:\ntax8 : -1.03\ntax1 : -1.15\ntax9 : -1.67\ntax10 : -0.112\ntax6 : -0.0183\ntax2 : -0.0574\ntax3 : 0.207\ntax5 : 0.0021\ntax4 : 0.634\ntax7 : 0.544\nInferred internal means (±95% intervals):\n-0.485±0.815 - true value: -0.587\n-1.17±0.556 - true value: -1.37\n-1.1±0.256 - true value: -1.09\n0.116±0.45 - true value: 0.21\n0.0275±0.35 - true value: -0.035\n0.0216±0.283 - 
true value: 0.0177\n0.0459±0.13 - true value: 0.0485\n0.0532±0.122 - true value: 0.075\n0.571±0.147 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"We can also find the values of the state for each node under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the (maximized) state of the parent node and the observations of all descendents. This ensures that the combination of ancestral states is, jointly, high likelihood. In the case of Brownian motion, these just happen to be the same as the marginal means, but that isn't necessarily the case for other models:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = cascading_max_state_dict(tree,bm_model)\nprintln(\"Inferred internal values:\")\nfor n in getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Inferred most likely (jointly) internal values:\n-0.485 - true value: -0.587\n-1.17 - true value: -1.37\n-1.1 - true value: -1.09\n0.116 - true value: 0.21\n0.0275 - true value: -0.035\n0.0216 - true value: 0.0177\n0.0459 - true value: 0.0485\n0.0532 - true value: 0.075\n0.571 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"And we can sample internal states under our model, but conditioned on the leaf observations:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = endpoint_conditioned_sample_state_dict(tree,bm_model)\nprintln(\"Sampled states, conditioned on observed leaves:\")\nfor n in 
getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Sampled states, conditioned on observed leaves:\n-0.784 - true value: -0.587\n-1.3 - true value: -1.37\n-1.13 - true value: -1.09\n-0.155 - true value: 0.21\n0.0118 - true value: -0.035\n0.0305 - true value: 0.0177\n0.0913 - true value: 0.0485\n0.0542 - true value: 0.075\n0.498 - true value: 0.589","category":"page"},{"location":"ancestors/#Functions","page":"Ancestral Reconstruction","title":"Functions","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"marginal_state_dict\ncascading_max_state_dict\nendpoint_conditioned_sample_state_dict","category":"page"},{"location":"ancestors/#MolecularEvolution.marginal_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.cascading_max_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendants. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.endpoint_conditioned_sample_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Example-1:-Amino-acid-ancestral-reconstruction-and-visualization","page":"Examples","title":"Example 1: Amino acid ancestral reconstruction and visualization","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads amino acid sequences from this FASTA file, and a phylogeny from this Newick tree file. A WAG amino acid model, augmented to explicitly model gap (ie. '-') characters, and a global substitution rate is estimated by maximum likelihood. Under this optimized model, the distribution over ancestral amino acids is constructed for each node, and visualized in multiple ways.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, Phylo, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/MusAA_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusAA_IGHV.tre\")\n\n#Compute AA freqs, which become the equilibrium freqs of the model, and the initial root freqs\nAA_freqs = char_proportions(seqs,MolecularEvolution.gappyAAstring)\n#Build the Q matrix\nQ = gappy_Q_from_symmetric_rate_matrix(WAGmatrix,1.0,AA_freqs)\n#Build the model\nm = DiagonalizedCTMC(Q)\n#Set up the memory on the tree\ninitial_partition = GappyAminoAcidPartition(AA_freqs,length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#Set up a likelihood function to find the scaling constant that best fits the branch lengths of the imported tree\n#Note, calling LL will change the rate, so make sure you set it to what you want after this has been called\nll = function(rate; m = m)\n m.r = rate\n return 
log_likelihood!(tree,m)\nend\nopt_rate = golden_section_maximize(ll, 0.0, 10.0, identity, 1e-11);\nplot(opt_rate*0.87:0.001:opt_rate*1.15,ll,size = (500,250),\n xlabel = \"rate\",ylabel = \"log likelihood\", legend = :none)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then set the model parameters to the maximum likelihood estimate, and reconstruct the ancestral states.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"m.r = opt_rate\n#Reconstructing the marginal distributions of amino acids at internal nodes\nd = marginal_state_dict(tree,m)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"That's it! Everything else is for visualizing these ancestral states. We'll select a set of amino acid positions to visualize, corresponding to these two (red arrows) alignment columns:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#The alignment indices we want to pay attention to in our reconstructions\nmotif_inds = [52,53]\n\n#We'll compute a confidence score for the inferred marginal state\nconfidence(state,inds) = minimum([maximum(state[:,i]) for i in inds])\n\n#Map motifs to numbers, so we can work with more convenient continuous color scales\nall_motifs = sort(union([partition2obs(d[n][1])[motif_inds] for n in getnodelist(tree)]))\nmotif2num = Dict(zip(all_motifs,1:length(all_motifs)))\n\n#Populating the node_data dictionary to help with plotting\nfor n in getnodelist(tree)\n moti = partition2obs(d[n][1])[motif_inds]\n n.node_data = Dict([\n \"motif\"=>moti,\n \"motif_color\"=>motif2num[moti],\n \"uncertainty\"=>1-confidence(d[n][1].state,motif_inds)\n ])\nend\n\n#Transducing the MolecularEvolution FelNode tree to a 
Phylo.jl tree, which migrates node_data as well\nphylo_tree = get_phylo_tree(tree)\nnode_unc = values_from_phylo_tree(phylo_tree,\"uncertainty\")\n\nprintln(\"Greatest motif uncertainty: \",maximum([n.node_data[\"uncertainty\"] for n in getnodelist(tree)]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Greatest motif uncertainty: 0.6104376723068156","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (400, 800))\n\nsavefig_tweakSVG(\"anc_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree, treetype = :fan,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (800, 800))\n\nsavefig_tweakSVG(\"anc_circ_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting using continuous color scales, and using marker size to show uncertainty in reconstructions\ncolor_scale = :rainbow\npl = plot(phylo_tree, showtips = true, tipfont = 6, marker_z = \"motif_color\", line_z = \"motif_color\",\n markersize = 10 .* sqrt.(node_unc), linecolor = color_scale, markercolor = color_scale, markeralpha = 0.75,\n markerstrokewidth = 0,margins = 2Plots.cm, colorbar = :none, linewidth = 2.5, size = (400, 800))\n\n#Feeble attempt 
at a manual legend\nmotif_ys = collect(1:length(all_motifs)) .+ (length(seqs) - length(all_motifs))\nscatter!(zeros(length(all_motifs)) , motif_ys , marker = 8, markeralpha = 0.75,\n marker_z = 1:length(all_motifs), markercolor = color_scale, markerstrokewidth = 0.0)\nfor i in 1:length(all_motifs)\n annotate!(0.1, motif_ys[i], all_motifs[i],7)\nend\n\nsavefig_tweakSVG(\"anc_tree_continuous.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/#Example-2:-GTRGamma","page":"Examples","title":"Example 2: GTR+Gamma","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"For site-to-site \"random effects\" rate variation, such as under the GTR+Gamma model, we need to use a \"Site-Wise Mixture\" model, or SWMModel with its SWMPartition.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Set up a function that will return a set of rates that will, when equally weighted, VERY coarsely approx a Gamma distribution\nfunction equiprobable_gamma_grids(s,k)\n grids = quantile(Gamma(s,1/s),1/2k:1/k:(1-1/2k))\n grids ./ mean(grids)\nend\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\n#Set up the Partition that will be replicated in the SWMModel\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\n#To be able to use unconstrained optimization, we use `ParameterHandling.jl`\ninitial_params = (\n rates=positive(ones(6)),\n gam_shape=positive(1.0),\n pi=zeros(3)\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\n#Setting up the Site-Wise Mixture Partition:\n#Note: this constructor sets the weights of all categories to 1/rate_cats\n#That is fine for our equi-probable category model, but this will need 
to be different for other models.\nrate_cats = 5\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,rate_cats)\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = rate_cats)\n r_vals = equiprobable_gamma_grids(params.gam_shape,cats)\n pi = unc2probvec(params.pi)\n return MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),r_vals)\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n #Root freqs need to be set over all component partitions\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3728.4761606135307","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Other functions also work with these kinds of random-effects site-wise mixture models:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"tree_polish!(tree,optimized_model)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL: -3728.4761606135307\nLL: -3728.1316616075173\nLL: -3728.121005993758\nLL: -3728.1202243978914\nLL: -3728.1201348447107","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Sometimes we might want the rate values for each category to stay fixed, but optimize their 
weights:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Using rate categories with fixed values\nfixed_cats = [0.00001,0.33,1.0,3.0,9.0]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n cat_weights=zeros(length(fixed_cats)-1), #Category weights\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n cat_weights = unc2probvec(params.cat_weights)\n pi = unc2probvec(params.pi)\n m = MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n m.weights .= cat_weights\n return m\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3719.6290948420706","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"When you have a Site-Wise Mixture (ie. 
REL) model, the category weights can be handled \"outside\" of the main likelihood calculations. This means that they can be optimized very quickly, within an objective function that is optimizing over the other parameters. The following example uses an EM approach to do this:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using Distributions, FASTX, ParameterHandling, NLopt\n\n#Using rate categories with fixed values\nfixed_cats = [(i/5)^2 for i in 1:12]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n pi = unc2probvec(params.pi)\n m = SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n return m\nend\n\n#LL for a mixture when the grid of probabilities is pre-computed\ngrid_ll(v,g) = sum(log.(sum((v./sum(v)) .* g,dims = 1)))\n\n#Note: we can get away with relatively few EM iterations within the optimization cycle (in this example at least)\nfunction opt_weights_and_LL(temp_part::SWMPartition{PType}; iters = 25) where {PType <: MolecularEvolution.MultiSitePartition} \n g,scals = SWM_prob_grid(temp_part) \n l = size(g)[1]\n #We can optimize the category weights without re-computing felsenstein\n #So it can make sense to do so within the optimization function\n #Which means you don't need to optimize over as many parameters\n θ = weightEM(g,ones(l)./l, iters = iters)\n LL_optimizing_over_weights = grid_ll(θ,g) + sum(scals)\n return θ,LL_optimizing_over_weights\nend\n\nfunction 
objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n felsenstein!(tree,build_model_vec(params))\n #Optim inside optim\n #We first need to handle the merge of the parent and root partitions - usually handled for us magically!\n #Be careful: this example is hard-coded for a single partition\n temp_part = copy_partition(tree.parent_message[1])\n combine!(temp_part, tree.message[1])\n θ,LL = opt_weights_and_LL(temp_part)\n return -LL\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time score,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\n\nfelsenstein!(tree,optimized_model)\ntemp_part = copy_partition(tree.parent_message[1])\ncombine!(temp_part, tree.message[1])\nθ,_ = opt_weights_and_LL(temp_part, iters = 1000) #polish weights for final pass - quick\noptimized_model.weights .= θ\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work, \":\", score)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"3.932150 seconds (2.38 M allocations: 2.378 GiB, 10.78% gc time, 3.28% compilation time: 7% of which was recompilation)\nSUCCESS:3720.1347720900067\nOpt LL:-3719.4808937732614","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"This can be dramatically faster than trying to directly optimize over category weights when the number of categories grows. 
The above example took 140s with the direct approach.","category":"page"},{"location":"examples/#Example-3:-FUBAR","page":"Examples","title":"Example 3: FUBAR","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads codon sequences from this FASTA file, and a phylogeny from this Newick tree file, and implements FUBAR.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/Flu.fasta\")\ntree = read_newick_tree(\"Data/Flu.tre\")\n\n#Count F3x4 frequencies from the seqs, and estimate codon freqs from this\nf3x4 = MolecularEvolution.count_F3x4(seqs);\neq_freqs = MolecularEvolution.F3x4_eq_freqs(f3x4);\n\n#Set up a codon partition (will default to Universal genetic code)\ninitial_partition = CodonPartition(Int64(length(seqs[1])/3))\ninitial_partition.state .= eq_freqs\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#We'll use the empirical F3x4 freqs, fixed MG94 alpha=1, and optimize the nuc parameters and MG94 beta\n#Note: the nuc rates are confounded with alpha\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n beta = positive(1.0)\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\nfunction build_model_vec(p; F3x4 = f3x4, alpha = 1.0)\n #If you run into numerical issues with DiagonalizedCTMC, switch to GeneralCTMC instead\n return DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, p.beta, reversibleQ(p.rates,ones(4)), F3x4))\nend\n\nfunction objective(params::NamedTuple; tree = tree, eq_freqs = eq_freqs)\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 
1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time _,mini,_ = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\nnucmat = reversibleQ(final_params.rates,ones(4))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":" 10.596546 seconds (840.87 k allocations: 5.221 GiB, 7.45% gc time, 0.35% compilation time: 25% of which was recompilation)\n4×4 Matrix{Float64}:\n -9.41346 1.77048 6.85997 0.783008\n 1.77048 -7.24162 0.280525 5.19061\n 6.85997 0.280525 -8.651 1.5105\n 0.783008 5.19061 1.5105 -7.48412","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The scaling of that nuc matrix reflects the fact that we're using a tree that was estimated under a nuc model, but here we're optimizing a codon model. No issue: the nuc rates have absorbed this scaling difference.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Now we set up a 20-by-20 grid, slicing the MG94 α and β parameters at the following values:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"grid_values = 10 .^ (-1.35:0.152:1.6) .- 0.0423174293933042","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"20-element Vector{Float64}:\n 0.0023509298217921012\n 0.021069541732388508\n 0.047632328759699305\n 0.08532645148783018\n 0.13881657986865603\n 0.2147221488835822\n 0.3224365175323036\n 0.4752894025572635\n 0.6921964387638108\n 1.0\n 1.4367909587749033\n 2.05662245423022\n 2.9361990000358853\n 4.184368713262725\n 5.95559333316179\n 8.469062952630463\n 12.0358209216745\n 17.09725564569095\n 24.27972266134484\n 34.47205650419232","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we calculate the conditional likelihoods for each site. 
Note the 20-by-20 grid is stretched out into a length 400 vector to keep things simple. I'm avoiding reshape tricks to keep the grid structure clear.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL_matrix = zeros(length(grid_values)^2,initial_partition.sites);\nalpha_vec = zeros(length(grid_values)^2);\nalpha_ind_vec = zeros(Int64,length(grid_values)^2);\nbeta_vec = zeros(length(grid_values)^2);\nbeta_ind_vec = zeros(Int64,length(grid_values)^2);\n\ni = 1\n@time for (a,alpha) in enumerate(grid_values)\n for (b,beta) in enumerate(grid_values)\n alpha_vec[i],beta_vec[i] = alpha, beta\n alpha_ind_vec[i], beta_ind_vec[i] = a,b\n m = DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, beta, nucmat, f3x4))\n felsenstein!(tree,m)\n #This is because we need to include the eq freqs in the site LLs:\n combine!(tree.message[1],tree.parent_message[1])\n LL_matrix[i,:] .= MolecularEvolution.site_LLs(tree.message[1])\n i += 1\n end\nend\nprob_matrix = exp.(LL_matrix .- maximum(LL_matrix,dims = 1))\nprob_matrix ./= sum(prob_matrix,dims = 1);","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we use an EM-like MAP algorithm to find the posterior grid weights, and visualize this surface:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LDAθ = weightEM(prob_matrix, ones(length(alpha_vec))./length(alpha_vec), conc = 0.4, iters = 5000);\n\n#A function to viz the grid surface\nfunction gridplot(alpha_ind_vec,beta_ind_vec,grid_values,θ; title = \"\")\n scatter(alpha_ind_vec,beta_ind_vec, zcolor = θ, c = :darktest,\n markersize = sqrt(length(alpha_ind_vec))/2, markershape=:square, markerstrokewidth=0.0, size=(550,500),\n label = :none, xticks = (1:length(grid_values), round.(grid_values,digits = 3)), xrotation = 90,\n yticks = (1:length(grid_values), round.(grid_values,digits = 3)), margin=6Plots.mm,\n xlabel = \"α\", ylabel = \"β\", title = title)\n 
plot!(1:length(grid_values),1:length(grid_values),color = \"grey\", style = :dash, label = :none)\nend\n\ngridplot(alpha_ind_vec,beta_ind_vec,grid_values,LDAθ)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"We can see that the posterior distribution over sites is heavily concentrated at β<α. But are there any sites where β>α?","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"weighted_mat = prob_matrix .* LDAθ\nfor site in 1:size(prob_matrix)[2]\n pos = sum(weighted_mat[beta_vec .> alpha_vec,site])/sum(weighted_mat[:,site])\n if pos > 0.9\n println(\"Site $(site): P(β>α)=$(round(pos,digits = 4))\")\n end\nend","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Site 153: P(β>α)=0.9074\nSite 158: P(β>α)=0.9266\nSite 160: P(β>α)=0.9547","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"And let's visualize one of those sites:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"viz/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. 
The Compose, Plots, and Phylo dependencies are optional.","category":"page"},{"location":"viz/#Example-1","page":"Visualization","title":"Example 1","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Plots, Phylo\n\n#First simulate a tree, and then Brownian motion:\ntree = sim_tree(n=20)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\n#We'll add the Gaussian means to the node_data dictionaries\nfor n in getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[1].mean])\nend\n\n#Transducing the mol ev tree to a Phylo.jl tree\nphylo_tree = get_phylo_tree(tree)\n\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_z = \"mu\", markeralpha = 0.5, line_z = \"mu\", linecolor = :darkrainbow, \n markersize = 4.0, markerstrokewidth = 0,margins = 1Plots.cm,\n linewidth = 1.5, markercolor = :darkrainbow, size = (500, 500))","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"We also offer savefig_tweakSVG(\"simple_plot_example.svg\", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,\"mu\") which can extract stored quantities in the right order for passing into eg. markersize options when plotting.","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.","category":"page"},{"location":"viz/#Drawing-trees-with-Compose.jl.","page":"Visualization","title":"Drawing trees with Compose.jl.","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"The Compose.jl in-house tree drawing offers extensive flexibility. 
Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Compose\n\ntree = sim_tree(40,1000.0,0.005,mutation_rate = 0.001)\nmodel = DiagonalizedCTMC(reversibleQ(ones(6),ones(4)./4))\ninternal_message_init!(tree, NucleotidePartition(ones(4)./4,1))\nsample_down!(tree,model)\nd = marginal_state_dict(tree,model);\n\ncompose_dict = Dict()\nfor n in getnodelist(tree)\n compose_dict[n] = (x,y)->pie_chart(x,y,d[n][1].state[:,1],size = 0.02, opacity = 0.75)\nend\nimg = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"(Image: )","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"This can then be exported with:","category":"page"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"savefig_tweakSVG(\"piechart_tree.svg\",img)","category":"page"},{"location":"viz/#Functions","page":"Visualization","title":"Functions","text":"","category":"section"},{"location":"viz/","page":"Visualization","title":"Visualization","text":"get_phylo_tree\nvalues_from_phylo_tree\nsavefig_tweakSVG\ntree_draw","category":"page"},{"location":"viz/#MolecularEvolution.get_phylo_tree","page":"Visualization","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.values_from_phylo_tree","page":"Visualization","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.savefig_tweakSVG","page":"Visualization","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\nsavefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"function"},{"location":"viz/#MolecularEvolution.tree_draw","page":"Visualization","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"function"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = 
MolecularEvolution","category":"page"},{"location":"#MolecularEvolution","page":"Home","title":"MolecularEvolution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Documentation for MolecularEvolution.","category":"page"},{"location":"#A-Julia-package-for-the-flexible-development-of-phylogenetic-models.","page":"Home","title":"A Julia package for the flexible development of phylogenetic models.","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the behavior you need is not already available in MolecularEvolution.jl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have a new data type:\nA Partition type that represents the uncertainty over your state. 
\ncombine!() that merges evidence from two Partitions.\nIf you have a new model:\nA BranchModel type that stores your model parameters.\nforward!() that evolves state distributions over branches, in the root-to-tip direction.\nbackward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.","category":"page"},{"location":"","page":"Home","title":"Home","text":"And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.","category":"page"},{"location":"#Design-principles","page":"Home","title":"Design principles","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order of importance, we aim for the following:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Flexibility and generality\nWhere possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.\nWe do not sacrifice flexibility for performance.\nScalability\nAnalyses implemented using MolecularEvolution.jl should scale to large, real-world datasets.\nPerformance\nWhile the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competitive runtimes.","category":"page"},{"location":"#Authors:","page":"Home","title":"Authors:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.","category":"page"},{"location":"#Quick-example:-Likelihood-calculations-under-phylogenetic-Brownian-motion:","page":"Home","title":"Quick example: Likelihood calculations under phylogenetic Brownian motion:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using MolecularEvolution, Plots\n\n#First simulate a tree, 
using a coalescent process\ntree = sim_tree(n=200)\ninternal_message_init!(tree, GaussianPartition())\n#Simulate Brownian motion over the tree\nbm_model = BrownianMotion(0.0,1.0)\nsample_down!(tree, bm_model)\n#And plot the log likelihood as a function of the parameter value\nll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))\nplot(0.7:0.001:1.6,ll, xlabel = \"variance per unit time\", ylabel = \"log likelihood\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: )","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"","page":"Home","title":"Home","text":"Modules = [MolecularEvolution]","category":"page"},{"location":"#MolecularEvolution.BranchlengthSampler","page":"Home","title":"MolecularEvolution.BranchlengthSampler","text":"BranchlengthSampler\n\nA type that allows you to specify an additive proposal function in the log domain and a prior distribution over the log of the branchlengths. It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyDown","page":"Home","title":"MolecularEvolution.LazyDown","text":"Constructors\n\nLazyDown(stores_obs)\nLazyDown() = LazyDown(x::FelNode -> true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyPartition","page":"Home","title":"MolecularEvolution.LazyPartition","text":"Constructor\n\nLazyPartition{PType}()\n\nInitialize an empty LazyPartition that is meant for wrapping a partition of type PType.\n\nDescription\n\nWith this data structure, you can wrap a partition of choice. 
The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:\n\nlog_likelihood!\nfelsenstein!\nsample_down!\n\nnote: Note\nFor successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.\n\nFurther requirements\n\nSuppose you want to wrap a partition of PType with LazyPartition:\n\nIf you're calling log_likelihood! and felsenstein!:\nobs2partition!(partition::PType, obs) that transforms an observation to a partition.\nIf you're calling sample_down!:\npartition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyUp","page":"Home","title":"MolecularEvolution.LazyUp","text":"Constructor\n\nLazyUp()\n\nDescription\n\nIndicate that we want to do an upward pass, e.g. felsenstein!.\n\n\n\n\n\n","category":"type"},{"location":"#Base.:==-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"Base.:==","text":"==(t1, t2)\nDefaults to pointer equality\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.SWM_prob_grid-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:MultiSitePartition","page":"Home","title":"MolecularEvolution.SWM_prob_grid","text":"SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}\n\nReturns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) 
as well as the log probability offsets\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution._mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any, Any}} where T<:Function","page":"Home","title":"MolecularEvolution._mapreduce","text":"Internal function. Helper for bfsmapreduce and dfsmapreduce\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.backward!-Tuple{DiscretePartition, DiscretePartition, MolecularEvolution.PMatrixModel, FelNode}","page":"Home","title":"MolecularEvolution.backward!","text":"backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition backwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.bfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.bfs_mapreduce","text":"Performs a BFS map-reduce over the tree, starting at a given node. For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator) where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.\n\nNot exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.branchlength_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; )\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).\ntol=1e-5: absolute tolerance for the bl_modifier.\nbl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.brents_method_minimize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.brents_method_minimize","text":"brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))\n\nBrent's method for minimization.\n\nGiven a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. 
t should be positive.\n\nThe method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...\n\nExamples\n\njulia> f(x) = exp(-x) - cos(x)\nf (generic function with 1 method)\n\njulia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)\n0.5885327257940255\n\nFrom: Richard P. Brent, \"Algorithms for Minimization without Derivatives\" (1973). Chapter 5.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.cascading_max_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.char_proportions-Tuple{Any, String}","page":"Home","title":"MolecularEvolution.char_proportions","text":"char_proportions(seqs, alphabet::String)\n\nTakes a vector of sequences and returns a vector of the proportion of each character across all sequences. 
An example alphabet argument is MolecularEvolution.AAstring.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.collect_leaf_dists-Tuple{Vector{<:AbstractTreeNode}}","page":"Home","title":"MolecularEvolution.collect_leaf_dists","text":"collect_leaf_dists(trees::Vector{<:AbstractTreeNode})\n\nReturns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.colored_seq_draw-Tuple{Any, Any, AbstractString}","page":"Home","title":"MolecularEvolution.colored_seq_draw","text":"colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)\n\nDraw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.combine!-Tuple{DiscretePartition, DiscretePartition}","page":"Home","title":"MolecularEvolution.combine!","text":"combine!(dest::P, src::P) where P<:Partition\n\nCombines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.copy_tree","page":"Home","title":"MolecularEvolution.copy_tree","text":"function copy_tree(root::FelNode, shallow_copy=false)\n\nReturns an untangled copy of the tree. 
Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.\n\n\n\n\n\n","category":"function"},{"location":"#MolecularEvolution.deepequals-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.deepequals","text":"deepequals(t1, t2)\n\nChecks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.dfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.dfs_mapreduce","text":"Performs a DFS map-reduce over the tree, starting at a given node See bfs_mapreduce for more details.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.discrete_name_color_dict-Tuple{AbstractTreeNode, Any}","page":"Home","title":"MolecularEvolution.discrete_name_color_dict","text":"discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)\n\nTakes a tree and a tag_func, which converts the leaf label into a category (ie. 
there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.\n\nExample tagfunc: function tagfunc(nam::String) return split(nam,\"_\")[1] end\n\nFor prettier colors, but less discrimination: rainbow = true To randomize the rainbow color assignment: scramble = true col_seed is currently set to white, and excluded from the list of colors, to make them more visible.\n\nConsider making your own version of this function to customize colors as you see fit.\n\nExample use: numleaves = 50 Nefunc(t) = 1*(e^-t).+5.0 newt = simtree(numleaves,Nefunc,1.0,nstart = rand(1:numleaves)); newt = ladderize(newt) tagfunc(nam) = mod(sum(Int.(collect(nam))),7) dic = discretenamecolordict(newt,tagfunc,rainbow = true); treedraw(newt,linewidth = 0.5mm,labelcolor_dict = dic)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.draw_example_tree-Tuple{}","page":"Home","title":"MolecularEvolution.draw_example_tree","text":"draw_example_tree(num_leaves = 50)\n\nDraws a tree and shows the code that draws it.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.endpoint_conditioned_sample_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditions on the leaf observations. These samples are stored in the nodemessagedict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.expected_subs_per_site-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.expected_subs_per_site","text":"expected_subs_per_site(Q,mu)\n\nTakes a rate matrix Q and an equilibrium frequency vector, and calculates the expected number of substitutions per site.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein!","text":"felsenstein!(node::FelNode, models; partition_list = nothing)\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein_down!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein_down!","text":"felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.forward!-Tuple{DiscretePartition, DiscretePartition, MolecularEvolution.PMatrixModel, FelNode}","page":"Home","title":"MolecularEvolution.forward!","text":"forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.gappy_Q_from_symmetric_rate_matrix-Tuple{Any, Any, Any}","page":"Home","title":"MolecularEvolution.gappy_Q_from_symmetric_rate_matrix","text":"gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)\n\nTakes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_highlighter_legend-Tuple{Any}","page":"Home","title":"MolecularEvolution.get_highlighter_legend","text":"get_highlighter_legend(legend_colors)\n\nReturns a Compose object given an input dictionary or pairs mapping characters to colors.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_max_depth-Tuple{Any, Real}","page":"Home","title":"MolecularEvolution.get_max_depth","text":"get_max_depth(node,depth::Real)\n\nReturn the maximum depth of all children starting from the indicated node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_phylo_tree-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.golden_section_maximize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.golden_section_maximize","text":"Golden section search.\n\nGiven a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)\n2.0000000000051843\n\nFrom: https://en.wikipedia.org/wiki/Golden-section_search\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlight_seq_draw-Tuple{Any, Any, AbstractString, Any, Any, Any}","page":"Home","title":"MolecularEvolution.highlight_seq_draw","text":"highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)\n\nDraw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlighter_tree_draw-NTuple{4, Any}","page":"Home","title":"MolecularEvolution.highlighter_tree_draw","text":"highlighter_tree_draw(tree, ali_seqs, seqnames, master;\n highlighter_start = 1.1, highlighter_width = 1,\n coord_width = highlighter_start + highlighter_width + 0.1,\n scale_length = nothing, major_breaks = 1000, minor_breaks = 500,\n tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)\n\nDraws a combined tree and highlighter plot. 
The vector of seqnames must match the node names in tree.\n\nkwargs:\n\ntreeargs: kwargs to pass to `treedraw()`\nlegendcolors: Mapping of characters to highlighter colors (default NTcolors)\nscale_length: Length of the scale bar\nhighlighter_start: Canvas start for the highlighter panel\nhighlighter_width: Canvas width for the highlighter panel\ncoord_width: Total width of the canvas\nmajor_breaks: Numbered breaks for sequence axis\nminor_breaks: Ticks for sequence axis\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Partition}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, partition::Partition)\n\nInitializes the message template for each node in the tree, as an array of the partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})\n\nInitializes the message template for each node in the tree, allocating space for each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.istreeconsistent-Tuple{T} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.istreeconsistent","text":"istreeconsistent(root)\n\nChecks whether the :parent field is set to be consistent with the :child field for all nodes in the subtree. 
\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazyprep!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.lazyprep!","text":"lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())\n\nExtra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.\n\nPerform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.\nFix tree.parent_message to an initial message.\nPreallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.\nSpecialized preparations based on the direction of the operations (forward!, backward!). LazyDown or LazyUp.\n\nSee also LazyDown, LazyUp.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazysort!-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.lazysort!","text":"Should be run on a tree containing LazyPartitions before running felsenstein!. Sorts for a minimal count of active partitions during a felsenstein!\nReturns the minimum length of memoryblocks (-1) required for a felsenstein! prop. We need a temporary memoryblock during backward!, hence the '-1'.\n\nnote: Note\nSince felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... 
chains\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.leaf_distmat-Tuple{Any}","page":"Home","title":"MolecularEvolution.leaf_distmat","text":"leaf_distmat(tree)\n\nReturns a matrix of the distances between the leaf nodes, where the row and column indices are sorted by leaf name.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.linear_scale-NTuple{5, Any}","page":"Home","title":"MolecularEvolution.linear_scale","text":"linear_scale(val,in_min,in_max,out_min,out_max)\n\nLinearly maps val which lives in [in_min,in_max] to a value in [out_min,out_max]\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.log_likelihood!","text":"log_likelihood!(tree::FelNode, models; partition_list = nothing)\n\nFirst re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood-Tuple{FelNode, BranchModel}","page":"Home","title":"MolecularEvolution.log_likelihood","text":"log_likelihood(tree::FelNode, models; partition_list = nothing)\n\nComputes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.longest_path-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.longest_path","text":"Returns the longest path in a tree For convenience, this is returned as two lists of form: [leafnode, parentnode, .... root] Where the leaf_node nodes are selected to be the furthest away\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.marginal_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.matrix_for_display-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.matrix_for_display","text":"matrix_for_display(Q,labels)\n\nTakes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. 
the REPL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.metropolis_sample-Tuple{FelNode, Vector{<:BranchModel}, Any}","page":"Home","title":"MolecularEvolution.metropolis_sample","text":"function metropolis_sample(\n initial_tree::FelNode,\n models::Vector{<:BranchModel},\n num_of_samples;\n bl_modifier::UnivariateSampler = BranchlengthSampler(Normal(0,2), Normal(-1,1)),\n burn_in=1000, \n sample_interval=10,\n collect_LLs = false,\n midpoint_rooting=false,\n)\n\nSamples tree topologies from a posterior distribution. \n\nArguments\n\ninitial_tree: An initial tree topology with the leaves populated with data, for the likelihood calculation.\nmodels: A list of branch models.\nnum_of_samples: The number of tree samples drawn from the posterior.\nbl_modifier: Sampler used to draw branchlengths from the posterior. \nburn_in: The number of samples discarded at the start of the Markov Chain.\nsample_interval: The distance between samples in the underlying Markov Chain (to reduce sample correlation).\ncollect_LLs: Specifies if the function should return the log-likelihoods of the trees.\nmidpoint_rooting: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).\n\nnote: Note\nThe leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.\n\nReturns\n\nsamples: The trees drawn from the posterior. Returns shallow tree copies, which need to be repopulated before running felsenstein! etc. 
\nsample_LLs: The associated log-likelihoods of the tree (optional).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.midpoint-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.midpoint","text":"Returns a midpoint as a node and a distance above it where the midpoint is\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.mix-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:DiscretePartition","page":"Home","title":"MolecularEvolution.mix","text":"mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}\n\nmix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.name2node_dict-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.name2node_dict","text":"name2node_dict(root)\n\nReturns a dictionary of leaf nodes, indexed by node.name. Can be used to associate sequences with leaf nodes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.newick-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nni_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; )\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).\nselection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects an NNI configuration. Note that the current log likelihood is stored at x[1].\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.node_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.node_distances","text":"Compute the distance to all other nodes from a given node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nonreversibleQ-Tuple{Any}","page":"Home","title":"MolecularEvolution.nonreversibleQ","text":"nonreversibleQ(param_vec)\n\nTakes a vector of parameters and returns a nonreversible rate matrix.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.parent_list-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.parent_list","text":"Provides a list of parent nodes from this node up to the root node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.partition2obs-Tuple{DiscretePartition, String}","page":"Home","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. 
For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.populate_tree!-Tuple{FelNode, Partition, Any, Any}","page":"Home","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\nA renaming function that can eg. 
strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.promote_internal-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.promote_internal","text":"promote_internal(tree::FelNode)\n\nCreates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels of internal nodes).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Function, Vector, Int64}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)\n\nTakes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Vector, Vector}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(xvec,yvec; rate_conf_level = 0.99)\n\nTakes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). 
Returns the confidence intervals computed by a quadratic approximation to the LL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_fasta-Tuple{String}","page":"Home","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_newick_tree-Tuple{String}","page":"Home","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.reversibleQ-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.root2tip_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.root2tip_distances","text":"root2tips(root::AbstractTreeNode)\n\nReturns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_down!-Tuple{FelNode, Any, Any}","page":"Home","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partitionlist (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_from_message!-Tuple{Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.sample_from_message!","text":"sample_from_message!(message::Vector{<:Partition})\n\n#Replaces an uncertain message with a sample from the distribution represented by each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Context}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Plots.Plot}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.shortest_path_between_nodes-Tuple{FelNode, FelNode}","page":"Home","title":"MolecularEvolution.shortest_path_between_nodes","text":"Shortest path between nodes, returned as two lists, each starting with one of the two nodes, and ending with the common ancestor.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sibling_inds-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.sibling_inds","text":"sibling_inds(node)\n\nReturns logical indices of the siblings in the parent's children vector.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.siblings-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.siblings","text":"siblings(node)\n\nReturns a vector of siblings of node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{Int64, Any, Any}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(;n = 10)\n\nSimulates a tree with constant population size.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_radial_tree_plot-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_radial_tree_plot","text":"simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = \"black\", line_width = 0.1mm)\n\nDraws a radial tree. No frills. No labels. 
Canvas height is automatically determined to avoid distorting the tree.\n\nnewt = better_newick_import(\"((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);\", FelNode{Float64}); simple_radial_tree_plot(newt,line_width = 0.5mm,root_angle = 7/10)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_tree_draw","text":"img = simple_tree_draw(tree::FelNode; canvas_width = 15cm, canvas_height = 15cm, line_color = \"black\", line_width = 0.1mm)\n\nA line drawing of a tree with very few options.\n\nimg = simple_tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.total_LL-Tuple{Partition}","page":"Home","title":"MolecularEvolution.total_LL","text":"total_LL(p::Partition)\n\nIf called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2distances","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2shared_branch_lengths-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2shared_branch_lengths","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. 
anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_polish!-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 
10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.unc2probvec-Tuple{Any}","page":"Home","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, BrentsMethodOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))\n\nMaximizes f(x) using Brent's method. See ?brents_method_minimize.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, GoldenSectionOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)\n\nMaximizes f(x) using a Golden Section Search. 
See ?golden_section_maximize.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)\n2.0000000000051843\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_sampler-Tuple{Any, BranchlengthSampler, Any}","page":"Home","title":"MolecularEvolution.univariate_sampler","text":"univariate_sampler(LL, modifier::BranchlengthPeturbation, curr_branchlength)\n\nAn MCMC algorithm that draws the next sample of a Markov chain approximating the posterior distribution over the branch lengths.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.values_from_phylo_tree-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.weightEM-Tuple{Matrix{Float64}, Any}","page":"Home","title":"MolecularEvolution.weightEM","text":"weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)\n\nTakes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational bayes behavior for LDA. 
Maybe.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_fasta-Tuple{String, Vector{String}}","page":"Home","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_nexus-Tuple{String, FelNode}","page":"Home","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"method"},{"location":"IO/#Input/Output","page":"Input/Output","title":"Input/Output","text":"","category":"section"},{"location":"IO/","page":"Input/Output","title":"Input/Output","text":"write_nexus\nnewick\nread_newick_tree\npopulate_tree!\nread_fasta\nwrite_fasta","category":"page"},{"location":"IO/#MolecularEvolution.write_nexus","page":"Input/Output","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. 
Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.newick","page":"Input/Output","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_newick_tree","page":"Input/Output","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree from a file, of type FelNode\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.populate_tree!","page":"Input/Output","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\nA renaming function that can eg. 
strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_fasta","page":"Input/Output","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.write_fasta","page":"Input/Output","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"function"}]
+[{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"EditURL = \"../../../examples/viz.jl\"","category":"page"},{"location":"generated/viz/#Visualization","page":"Visualization","title":"Visualization","text":"","category":"section"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. The Compose, Plots, and Phylo dependencies are optional.","category":"page"},{"location":"generated/viz/#Example-1","page":"Visualization","title":"Example 1","text":"","category":"section"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Plots, Phylo\n\n#First simulate a tree, and then Brownian motion:\ntree = sim_tree(n = 20)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0, 0.1)\nsample_down!(tree, bm_model)\n\n#We'll add the Gaussian means to the node_data dictionaries\nfor n in getnodelist(tree)\n n.node_data = Dict([\"mu\" => n.message[1].mean])\nend\n\n#Transducing the mol ev tree to a Phylo.jl tree\nphylo_tree = get_phylo_tree(tree)\n\npl = plot(\n phylo_tree,\n showtips = true,\n tipfont = 6,\n marker_z = \"mu\",\n markeralpha = 0.5,\n line_z = \"mu\",\n linecolor = :darkrainbow,\n markersize = 4.0,\n markerstrokewidth = 0,\n margins = 1Plots.cm,\n linewidth = 1.5,\n markercolor = :darkrainbow,\n size = (500, 500),\n)","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"We also offer savefig_tweakSVG(\"simple_plot_example.svg\", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,\"mu\") which can extract stored quantities in the right order for passing into eg. 
markersize options when plotting.","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.","category":"page"},{"location":"generated/viz/#Drawing-trees-with-Compose.jl.","page":"Visualization","title":"Drawing trees with Compose.jl.","text":"","category":"section"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"The Compose.jl in-house tree drawing offers extensive flexibility. Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Compose\n\ntree = sim_tree(40, 1000.0, 0.005, mutation_rate = 0.001)\nmodel = DiagonalizedCTMC(reversibleQ(ones(6), ones(4) ./ 4))\ninternal_message_init!(tree, NucleotidePartition(ones(4) ./ 4, 1))\nsample_down!(tree, model)\nd = marginal_state_dict(tree, model);\nnothing #hide","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"compose_dict = Dict()\nfor n in getnodelist(tree)\n compose_dict[n] =\n (x, y) -> pie_chart(x, y, d[n][1].state[:, 1], size = 0.02, opacity = 0.75)\nend\nimg = tree_draw(tree,draw_labels = false, line_width = 0.5mm, compose_dict = compose_dict)","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"This can then be exported with:","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"savefig_tweakSVG(\"piechart_tree.svg\",img);\nnothing #hide","category":"page"},{"location":"generated/viz/#Multiple-trees","page":"Visualization","title":"Multiple 
trees","text":"","category":"section"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"Doesn't require Phylo.jl. Query trees can be plotted against a reference tree with plot_multiple_trees. This can be useful, for instance, when we've sampled trees with metropolis_sample.","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"using MolecularEvolution, Plots\n\ntree = sim_tree(10, 1, 1)\nnodelist = getnodelist(tree); mean = sum([n.branchlength for n in nodelist]) / length(nodelist)\nrparams(n::Int) = MolecularEvolution.sum2one(rand(n))\nmodel = DiagonalizedCTMC(reversibleQ(ones(6) ./ (6 * mean), rparams(4)))\ninternal_message_init!(tree, NucleotidePartition(ones(4) ./ 4, 100))\nsample_down!(tree, model)\n@time trees, LLs = metropolis_sample(tree, [model], 300, collect_LLs=true);\nreference = trees[argmax(LLs)];\nnothing #hide","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"We'll use the maximum a posteriori tree as reference","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"plot_multiple_trees(trees, reference)","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"We can pass in a weight function to fit query trees against reference in a weighted least squares fashion with a location and scale parameter.","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"note: Note\nIf we don't want to scale the query trees, we must disable it with opt_scale = false.","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"plot_multiple_trees(\n trees,\n reference,\n y_jitter = 0.05,\n weight_fn = n::FelNode ->\n ifelse(MolecularEvolution.isroot(n) || isleafnode(n), 1.0, 
0.0)\n)","category":"page"},{"location":"generated/viz/#Functions","page":"Visualization","title":"Functions","text":"","category":"section"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"get_phylo_tree\nvalues_from_phylo_tree\nsavefig_tweakSVG\ntree_draw\nplot_multiple_trees","category":"page"},{"location":"generated/viz/#MolecularEvolution.get_phylo_tree","page":"Visualization","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"function"},{"location":"generated/viz/#MolecularEvolution.values_from_phylo_tree","page":"Visualization","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"function"},{"location":"generated/viz/#MolecularEvolution.savefig_tweakSVG","page":"Visualization","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\nsavefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"function"},{"location":"generated/viz/#MolecularEvolution.tree_draw","page":"Visualization","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. 
compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = \"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 
10cm)\n\n\n\n\n\n","category":"function"},{"location":"generated/viz/#MolecularEvolution.plot_multiple_trees","page":"Visualization","title":"MolecularEvolution.plot_multiple_trees","text":"plot_multiple_trees(trees, inf_tree; )\n\nPlots multiple phylogenetic trees against a reference tree, inf_tree. For each tree in trees, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn keyword) is solved for the x-positions of the matching nodes between inf_tree and tree.\n\nKeyword Arguments\n\nnode_size=4: the size of the nodes in the plot.\nline_width=0.5: the width of the branches from trees.\nfont_size=10: the font size for the leaf labels.\nmargin=1.5: the margin between a leaf node and its label.\nline_alpha=0.05: the transparency level of the branches from trees.\ny_jitter=0.0: the standard deviation of the noise in the y-coordinate.\nweight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0): a function that assigns a weight to a node for the WLS problem.\nopt_scale=true: whether to include a scaling parameter for the WLS problem.\n\n\n\n\n\n","category":"function"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"","category":"page"},{"location":"generated/viz/","page":"Visualization","title":"Visualization","text":"This page was generated using Literate.jl.","category":"page"},{"location":"optimization/#Optimization","page":"Optimization","title":"Optimization","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"There are two distinct kinds of optimization: \"global\" model parameters, and then tree branchlengths and topology. 
These are kept distinct because we can use algorithmic tricks to dramatically improve the performance of the latter.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"The example below will set up and optimize a \"Generalized Time Reversible\" nucleotide substitution model, where there are 6 rate parameters that govern the symmetric part of a rate matrix, and 4 nucleotide frequencies (that sum to 1, so only 3 underlying parameters).","category":"page"},{"location":"optimization/#Optimizing-model-parameters","page":"Optimization","title":"Optimizing model parameters","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We first need to construct an objective function. A very common use case involves parameterizing a rate matrix (along with all the constraints this entails) from a flat parameter vector. reversibleQ can be convenient here, which takes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle (excluding the diagonal) of the rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution #hide\nreversibleQ(1:6,ones(4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"...and the equilibrium frequencies are multiplied column-wise:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ(ones(6),[0.1,0.2,0.3,0.4])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Another convenient trick is to be able to parameterize a vector of positive frequencies that sum to 1, using N-1 unconstrained parameters. 
unc2probvec can help:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"unc2probvec(zeros(3))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"ParameterHandling.jl provides a convenient framework for managing collections of parameters in a way that plays with much of the Julia optimization ecosystem, and we recommend its use. Here we'll use ParameterHandling and NLopt.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"First, we'll load in some example nucleotide data:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\ninitial_partition = NucleotidePartition(length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we set up the model parameters, and the objective function:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"#Named tuple of parameters, with initial values and constraints (from ParameterHandling.jl)\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n pi=zeros(3) #will be transformed into 4 eq freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\n#Set up a function that builds a model from these parameters\nfunction build_model_vec(params)\n pi = unc2probvec(params.pi)\n return DiagonalizedCTMC(reversibleQ(params.rates,pi))\nend\n\n#Set up the function to be *minimized*\nfunction objective(params::NamedTuple; tree = tree)\n #In this 
example, we are optimizing the nuc equilibrium freqs\n #We'll also assume that the starting frequencies (at the root of the tree) are the eq freqs\n tree.parent_message[1].state .= unc2probvec(params.pi)\n return -log_likelihood!(tree,build_model_vec(params)) #Note, negative of LL, because minimization\nend","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Then we'll set up an optimizer from NLopt. See this discussion and this exploration of optimizers.","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"opt = Opt(:LN_BOBYQA, num_params)\n#Note: NLopt expects an objective that also takes a gradient argument, even for gradient-free methods, hence (x,y)->...\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x)) #See ParameterHandling.jl docs for objective ∘ unflatten explanation\n#Some bounds (which will be in the transformed domain) to prevent searching numerically silly bits of parameter space:\nlower_bounds!(opt, [-10.0 for i in 1:num_params])\nupper_bounds!(opt, [10.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n_,mini,_ = NLopt.optimize(opt, flat_initial_params)\nfinal_params = unflatten(mini)\n\noptimized_model = build_model_vec(final_params)\nprintln(\"Opt LL:\",log_likelihood!(tree,optimized_model))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"We can view the optimized parameter values:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"println(\"Rates: \", round.(final_params.rates,sigdigits = 4))\nprintln(\"Pi:\", round.(unc2probvec(final_params.pi),sigdigits = 4))","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Rates: [1.124, 2.102, 1.075, 0.9802, 1.605, 0.5536]\nPi:[0.2796, 0.2192, 
0.235, 0.2662]","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Or the entire optimized rate matrix:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"matrix_for_display(optimized_model.Q,['A','C','G','T'])","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"Opt LL:-3783.226756522292\n5×5 Matrix{Any}:\n \"\" 'A' 'C' 'G' 'T'\n 'A' -1.02672 0.246386 0.494024 0.286309\n 'C' 0.314289 -0.971998 0.23034 0.427368\n 'G' 0.587774 0.214842 -0.950007 0.147391\n 'T' 0.300663 0.35183 0.130093 -0.782586","category":"page"},{"location":"optimization/#Optimizing-the-tree-topology-and-branch-lengths","page":"Optimization","title":"Optimizing the tree topology and branch lengths","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"With a tree and a model, we can also optimize the branch lengths and search, by nearest neighbour interchange, for changes to the tree topology that improve the likelihood. Individually, these are performed by nni_optim! and branchlength_optim!, which need to have felsenstein! and felsenstein_down! 
called beforehand, but this is all bundled into:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"tree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3783.226756522292\nLL: -3782.345818028071\nLL: -3782.3231632207567\nLL: -3782.3211724011044\nLL: -3782.321068684831\nLL: -3782.3210622627776","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"And just to convince you this works, we can perturb the branch lengths, and see how the likelihood improves:","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"for n in getnodelist(tree)\n n.branchlength *= (rand()+0.5)\nend\ntree_polish!(tree, optimized_model)","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"LL: -3805.4140940138795\nLL: -3782.884883999107\nLL: -3782.351780962518\nLL: -3782.322906364547\nLL: -3782.321183009534\nLL: -3782.3210398963506\nLL: -3782.3210271696703","category":"page"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"warning: Warning\ntree_polish! probably won't find a good tree from a completely random start. Different tree search heuristics are required for that.","category":"page"},{"location":"optimization/#Functions","page":"Optimization","title":"Functions","text":"","category":"section"},{"location":"optimization/","page":"Optimization","title":"Optimization","text":"reversibleQ\nunc2probvec\nbranchlength_optim!\nnni_optim!\ntree_polish!","category":"page"},{"location":"optimization/#MolecularEvolution.reversibleQ","page":"Optimization","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. 
The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.unc2probvec","page":"Optimization","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sum to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.branchlength_optim!","page":"Optimization","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; )\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).\ntol=1e-5: absolute tolerance for the bl_modifier.\nbl_modifier=GoldenSectionOpt(): can be either an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().\nsort_tree=false: determines if a lazysort! 
will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.nni_optim!","page":"Optimization","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; )\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).\nselection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects an NNI configuration. Note that the current log likelihood is stored at x[1].\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"function"},{"location":"optimization/#MolecularEvolution.tree_polish!","page":"Optimization","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. 
Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#Simulation","page":"Simulation","title":"Simulation","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"The two key steps in phylogenetic simulation are 1) simulating the phylogeny itself, and 2) simulating data that evolves over the phylogeny.","category":"page"},{"location":"simulation/#Simulating-phylogenies","page":"Simulation","title":"Simulating phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"warning: Warning\nWhile our sim_tree function seems to produce trees with the right shape, and is good enough for eg. generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. 
So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If you just need a simple tree for testing things, then you can just use:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"tree = sim_tree(n=100)\ntree_draw(tree, draw_labels = false, canvas_height = 5cm)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"This has the characteristic \"coalescent under constant population size\" look.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"However, sim_tree is a bit more powerful than this: it aims to simulate branching under a coalescent process with flexible options for how the effective population size, as well as the sampling rate, might change over time. This is important, because the \"constant population size\" model is quite extreme, and most of the divergence happens in the early internal branches.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"A coalescent process runs backwards in time, starting from the most recent tip, and sampling backwards toward the root, coalescing nodes as it goes, and sometimes adding additional sampled tips. With sim_tree, if nstart = add_limit, then all the tips will be sampled at the same time, and the tree will be ultrametric.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree has two arguments driving its flexibility. We'll start with sampling_rate, which controls the rate at which samples are added to the tree. 
Even under constant effective population size, this can produce interesting behavior.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for sampling_rate in [5.0, 0.5, 0.05, 0.005]\n tree = sim_tree(100,1000.0,sampling_rate)\n display(tree_draw(tree, draw_labels = false, canvas_height = 5cm))\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Above, this rate was just a fixed constant value, but we can also let this be a function. In this example, we'll plot the tree alongside the sampling rate function, as well as the cumulative number of samples through time.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"s(t) = ifelse(0 sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Note how the x axis of these plots is flipped, since the leaf furthest from the root begins at time=0, and the coalescent runs backwards, from tip to root.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can also vary 
the effective population size over time, which adds a different dimension of control. Here is an example showing the shape of a tree under exponential growth:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 100000*exp(-t/10)\ntree = sim_tree(100,n,100.0, nstart = 100)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\nplot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Logistic growth, with a relatively low sampling rate, provides a reasonable model of an emerging virus that was only sampled later in its growth trajectory, such as HIV.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = 10000/(1+exp(t-10))\ntree = sim_tree(100,n,20.0)\ndisplay(tree_draw(tree, draw_labels = false, canvas_height = 7cm, canvas_width = 14cm))\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"How about a virus with a seasonally varying effective population size, where sampling is proportional to case counts? Between seasons, the effective population size gets so low that the next season's clade arises from one or two lineages in the previous season.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"n(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s)\ndisplay(tree_draw(tree, draw_labels = false))\n\n\nroot_dists,_ = MolecularEvolution.root2tip_distances(tree)\ndisplay(plot(0.0:0.1:maximum(root_dists),n, xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"effective population size\", legend = :none))\n\nmrd = maximum(root_dists)\nsample_times = mrd .- root_dists\nplot(0.0:0.1:mrd,x -> sum(x .> sample_times), xflip = true, size = (500,250), xlabel = \"time\",ylabel = \"cumulative samples\", legend = :none)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Finally, the mutation_rate argument multiplicatively scales the branch lengths.","category":"page"},{"location":"simulation/#Simulating-evolution-over-phylogenies","page":"Simulation","title":"Simulating evolution over phylogenies","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We'll begin by simulating a tree, like the last 
example:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"using MolecularEvolution, FASTX, Phylo, Plots, CSV, DataFrames\n\nn(t) = exp(sin(t/10) * 2.0 + 4)\ns(t) = n(t)/100\ntree = sim_tree(500,n,s, mutation_rate = 0.005)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"If we need to open this tree in an external program, we can extract the Newick string representing this tree, and write it to a file:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"newick_string = newick(tree)\nopen(\"flu_sim.tre\",\"w\") do io\n println(io,newick_string)\nend","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we can set up a model. In this case, it'll be a combination of a nucleotide model of sequence evolution and Brownian motion over a continuous character.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"nuc_freqs = [0.2,0.3,0.3,0.2]\nnuc_rates = [1.0,2.0,1.0,1.0,1.6,0.5]\nnuc_model = DiagonalizedCTMC(reversibleQ(nuc_rates,nuc_freqs))\nbm_model = BrownianMotion(0.0,1.0)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"As usual, we set up the Partition structure, and load this onto our tree:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"message_template = [NucleotidePartition(nuc_freqs,300),GaussianPartition()]\ninternal_message_init!(tree, message_template)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Then we sample data under our model:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sample_down!(tree, [nuc_model,bm_model])","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can visualize the Brownian component 
of the simulation by loading it into the node_dict, and converting to a Phylo.jl tree.","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"for n in getnodelist(tree)\n n.node_data = Dict([\"mu\"=>n.message[2].mean])\nend\nphylo_tree = get_phylo_tree(tree)\nplot(phylo_tree, showtips = false, line_z = \"mu\", colorbar = :none,\n linecolor = :darkrainbow, linewidth = 1.0, size = (600, 600))","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: )","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"We can write the simulated data, including sequences and continuous characters, to a CSV:","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"df = DataFrame()\ndf.names = [n.name for n in getleaflist(tree)]\ndf.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)]\ndf.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)]\nCSV.write(\"flu_sim_seq_and_bm.csv\",df)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Or we could export just the sequences as .fasta","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"write_fasta(\"flu_sim_seq_and_bm.fasta\",df.seqs,seq_names = df.names)","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"Which will look something like this, when opened in AliView","category":"page"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"(Image: 
)","category":"page"},{"location":"simulation/#Functions","page":"Simulation","title":"Functions","text":"","category":"section"},{"location":"simulation/","page":"Simulation","title":"Simulation","text":"sim_tree\nsample_down!\npartition2obs","category":"page"},{"location":"simulation/#MolecularEvolution.sim_tree","page":"Simulation","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\nsim_tree(;n = 10)\n\nSimulates tree with constant population size.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.sample_down!","page":"Simulation","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"function"},{"location":"simulation/#MolecularEvolution.partition2obs","page":"Simulation","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. 
Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"function"},{"location":"framework/#The-MolecularEvolution.jl-Framework","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The organizing principle is that the core algorithms, including Felsenstein's algorithm, but also a related family of message passing algorithms and inference machinery, are implemented in a way that does not refer to any specific model or even to any particular data type.","category":"page"},{"location":"framework/#Partitions-and-BranchModels","page":"The MolecularEvolution.jl Framework","title":"Partitions and BranchModels","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A Partition is a probabilistic representation of some kind of state. Specifically, it needs to be able to represent P(obs|state) and P(obs,state) when considered as functions of state. So it will typically be able to assign a probability to any possible value of state, and is unnormalized - not required to sum or integrate to 1 over all values of state. As an example, for a discrete state with 4 categories, this could just be a vector of 4 numbers.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"For a Partition type to be usable by MolecularEvolution.jl, the combine! function needs to be implemented. If you have P(obsA|state) and P(obsB|state), then combine! calculates P(obsA,obsB|state) under the assumption that obsA and obsB are conditionally independent given state. MolecularEvolution.jl tries to avoid allocating memory, so combine!(dest,src) places the combined Partition in dest. 
For a discrete state with 4 categories, this is simply element-wise multiplication of two state vectors.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"A BranchModel defines how Partition distributions evolve along branches. Two functions need to be implemented: backward! and forward!. We imagine our trees with the root at the top, and forward! moves from root to tip, and backward! moves from tip to root. backward!(dest::P,src::P,m::BranchModel,n::FelNode) takes a src Partition, representing P(obs-below|state-at-bottom-of-branch), and modifies the dest Partition to be P(obs-below|state-at-top-of-branch), where the branch in question is the branch above the FelNode n. forward! goes in the opposite direction, from P(obs-above,state-at-top-of-branch) to P(obs-above,state-at-bottom-of-branch), with the Partitions now, confusingly, representing joint distributions.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"(Image: )","category":"page"},{"location":"framework/#Messages","page":"The MolecularEvolution.jl Framework","title":"Messages","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Nodes on our trees work with messages, where a message is a vector of Partition structs. This is in case you wish to model multiple different data types on the same tree. 
Often, all the messages on the tree will just be arrays containing a single Partition, but if you're accessing them you need to remember that they're in an array!","category":"page"},{"location":"framework/#Trees","page":"The MolecularEvolution.jl Framework","title":"Trees","text":"","category":"section"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"Each node in our tree is a FelNode (\"Fel\" for \"Felsenstein\"). They point to their parent nodes, and an array of their children, and they store their main vector of Partitions, but also cached versions of those from their parents and children, to allow certain message passing schemes. They also have a branchlength field, which tells eg. forward! and backward! how much evolution occurs along the branch above (ie. closer to the root) that node. They also allow for an arbitrary dictionary of node_data, in case a model needs any other branch-specific parameters.","category":"page"},{"location":"framework/","page":"The MolecularEvolution.jl Framework","title":"The MolecularEvolution.jl Framework","text":"The set of algorithms needs to know which model to use for which partition, so the assumption made is that they'll see an array of models whose order will match the partition array. In general, we might want the models to vary from one branch to another, so the central algorithms take a function that associates a FelNode->Vector{: true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. 
The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"ancestors/#Ancestral-Reconstruction","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Given a phylogeny, and observations on some set of leaf nodes, \"ancestral reconstruction\" describes a family of approaches for inferring the state of the ancestors, or the distribution over possible states of ancestors.","category":"page"},{"location":"ancestors/#Examples","page":"Ancestral Reconstruction","title":"Examples","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"using MolecularEvolution\n\n#Simulate a small tree, with Brownian motion over it\ntree = sim_tree(n=10)\ninternal_message_init!(tree, GaussianPartition())\nbm_model = BrownianMotion(0.0,0.1)\nsample_down!(tree, bm_model)\n\nr(x) = round(x,sigdigits = 3)\nprintln(\"Leaf values:\")\nfor n in getleaflist(tree)\n println(n.name,\" : \",r(n.message[1].mean))\nend\n\nd = marginal_state_dict(tree,bm_model)\nprintln(\"Inferred internal means (±95% intervals):\")\nfor n in getnonleaflist(tree)\n m,s = d[n][1].mean,sqrt(d[n][1].var)\n println(r(m), \"±\", r(1.96*s), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Leaf values:\ntax8 : -1.03\ntax1 : -1.15\ntax9 : -1.67\ntax10 : -0.112\ntax6 : -0.0183\ntax2 : -0.0574\ntax3 : 0.207\ntax5 : 0.0021\ntax4 : 0.634\ntax7 : 0.544\nInferred internal means (±95% intervals):\n-0.485±0.815 - true value: -0.587\n-1.17±0.556 - true value: -1.37\n-1.1±0.256 - true value: -1.09\n0.116±0.45 - true value: 0.21\n0.0275±0.35 - true value: -0.035\n0.0216±0.283 - 
true value: 0.0177\n0.0459±0.13 - true value: 0.0485\n0.0532±0.122 - true value: 0.075\n0.571±0.147 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"We can also find the values of the state for each node under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the (maximized) state of the parent node and the observations of all descendents. This ensures that the combination of ancestral states is, jointly, high likelihood. In the case of Brownian motion, these just happen to be the same as the marginal means, but that isn't necessarily the case for other models:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = cascading_max_state_dict(tree,bm_model)\nprintln(\"Inferred internal values:\")\nfor n in getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Inferred most likely (jointly) internal values:\n-0.485 - true value: -0.587\n-1.17 - true value: -1.37\n-1.1 - true value: -1.09\n0.116 - true value: 0.21\n0.0275 - true value: -0.035\n0.0216 - true value: 0.0177\n0.0459 - true value: 0.0485\n0.0532 - true value: 0.075\n0.571 - true value: 0.589","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"And we can sample internal states under our model, but conditioned on the leaf observations:","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"d = endpoint_conditioned_sample_state_dict(tree,bm_model)\nprintln(\"Sampled states, conditioned on observed leaves:\")\nfor n in 
getnonleaflist(tree)\n m = d[n][1].mean\n println(r(m), \" - true value: \",r(n.message[1].mean))\nend","category":"page"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"Sampled states, conditioned on observed leaves:\n-0.784 - true value: -0.587\n-1.3 - true value: -1.37\n-1.13 - true value: -1.09\n-0.155 - true value: 0.21\n0.0118 - true value: -0.035\n0.0305 - true value: 0.0177\n0.0913 - true value: 0.0485\n0.0542 - true value: 0.075\n0.498 - true value: 0.589","category":"page"},{"location":"ancestors/#Functions","page":"Ancestral Reconstruction","title":"Functions","text":"","category":"section"},{"location":"ancestors/","page":"Ancestral Reconstruction","title":"Ancestral Reconstruction","text":"marginal_state_dict\ncascading_max_state_dict\nendpoint_conditioned_sample_state_dict","category":"page"},{"location":"ancestors/#MolecularEvolution.marginal_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.cascading_max_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"ancestors/#MolecularEvolution.endpoint_conditioned_sample_state_dict","page":"Ancestral Reconstruction","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditioned on the leaf observations. These samples are stored in the node_message_dict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"function"},{"location":"examples/#Examples","page":"Examples","title":"Examples","text":"","category":"section"},{"location":"examples/#Example-1:-Amino-acid-ancestral-reconstruction-and-visualization","page":"Examples","title":"Example 1: Amino acid ancestral reconstruction and visualization","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads amino acid sequences from this FASTA file, and a phylogeny from this Newick tree file. A WAG amino acid model, augmented to explicitly model gap (ie. '-') characters, and a global substitution rate is estimated by maximum likelihood. Under this optimized model, the distribution over ancestral amino acids is constructed for each node, and visualized in multiple ways.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, Phylo, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/MusAA_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusAA_IGHV.tre\")\n\n#Compute AA freqs, which become the equilibrium freqs of the model, and the initial root freqs\nAA_freqs = char_proportions(seqs,MolecularEvolution.gappyAAstring)\n#Build the Q matrix\nQ = gappy_Q_from_symmetric_rate_matrix(WAGmatrix,1.0,AA_freqs)\n#Build the model\nm = DiagonalizedCTMC(Q)\n#Set up the memory on the tree\ninitial_partition = GappyAminoAcidPartition(AA_freqs,length(seqs[1]))\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#Set up a likelihood function to find the scaling constant that best fits the branch lengths of the imported tree\n#Note, calling LL will change the rate, so make sure you set it to what you want after this has been called\nll = function(rate; m = m)\n m.r = rate\n return 
log_likelihood!(tree,m)\nend\nopt_rate = golden_section_maximize(ll, 0.0, 10.0, identity, 1e-11);\nplot(opt_rate*0.87:0.001:opt_rate*1.15,ll,size = (500,250),\n xlabel = \"rate\",ylabel = \"log likelihood\", legend = :none)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then set the model parameters to the maximum likelihood estimate, and reconstruct the ancestral states.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"m.r = opt_rate\n#Reconstructing the marginal distributions of amino acids at internal nodes\nd = marginal_state_dict(tree,m)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"That's it! Everything else is for visualizing these ancestral states. We'll select a set of amino acid positions to visualize, corresponding to these two (red arrows) alignment columns:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#The alignment indices we want to pay attention to in our reconstructions\nmotif_inds = [52,53]\n\n#We'll compute a confidence score for the inferred marginal state\nconfidence(state,inds) = minimum([maximum(state[:,i]) for i in inds])\n\n#Map motifs to numbers, so we can work with more convenient continuous color scales\nall_motifs = sort(union([partition2obs(d[n][1])[motif_inds] for n in getnodelist(tree)]))\nmotif2num = Dict(zip(all_motifs,1:length(all_motifs)))\n\n#Populating the node_data dictionary to help with plotting\nfor n in getnodelist(tree)\n moti = partition2obs(d[n][1])[motif_inds]\n n.node_data = Dict([\n \"motif\"=>moti,\n \"motif_color\"=>motif2num[moti],\n \"uncertainty\"=>1-confidence(d[n][1].state,motif_inds)\n ])\nend\n\n#Transducing the MolecularEvolution FelNode tree to a 
Phylo.jl tree, which migrates node_data as well\nphylo_tree = get_phylo_tree(tree)\nnode_unc = values_from_phylo_tree(phylo_tree,\"uncertainty\")\n\nprintln(\"Greatest motif uncertainty: \",maximum([n.node_data[\"uncertainty\"] for n in getnodelist(tree)]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Greatest motif uncertainty: 0.6104376723068156","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (400, 800))\n\nsavefig_tweakSVG(\"anc_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting, using discrete marker colors\npl = plot(phylo_tree, treetype = :fan,\n showtips = true, tipfont = 6, marker_group = \"motif\", palette = :seaborn_bright,\n markeralpha = 0.75, markerstrokewidth = 0, margins = 2Plots.cm, legend = :topleft,\n linewidth = 1.5, size = (800, 800))\n\nsavefig_tweakSVG(\"anc_circ_tree_with_legend.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Plotting using continuous color scales, and using marker size to show uncertainty in reconstructions\ncolor_scale = :rainbow\npl = plot(phylo_tree, showtips = true, tipfont = 6, marker_z = \"motif_color\", line_z = \"motif_color\",\n markersize = 10 .* sqrt.(node_unc), linecolor = color_scale, markercolor = color_scale, markeralpha = 0.75,\n markerstrokewidth = 0,margins = 2Plots.cm, colorbar = :none, linewidth = 2.5, size = (400, 800))\n\n#Feeble attempt 
at a manual legend\nmotif_ys = collect(1:length(all_motifs)) .+ (length(seqs) - length(all_motifs))\nscatter!(zeros(length(all_motifs)) , motif_ys , marker = 8, markeralpha = 0.75,\n marker_z = 1:length(all_motifs), markercolor = color_scale, markerstrokewidth = 0.0)\nfor i in 1:length(all_motifs)\n annotate!(0.1, motif_ys[i], all_motifs[i],7)\nend\n\nsavefig_tweakSVG(\"anc_tree_continuous.svg\", pl)\npl","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/#Example-2:-GTRGamma","page":"Examples","title":"Example 2: GTR+Gamma","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"For site-to-site \"random effects\" rate variation, such as under the GTR+Gamma model, we need to use a \"Site-Wise Mixture\" model, or SWMModel with its SWMPartition.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Set up a function that will return a set of rates that will, when equally weighted, VERY coarsely approx a Gamma distribution\nfunction equiprobable_gamma_grids(s,k)\n grids = quantile(Gamma(s,1/s),1/2k:1/k:(1-1/2k))\n grids ./ mean(grids)\nend\n\n#Read in seqs and tree, and populate the three NucleotidePartitions\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\n#Set up the Partition that will be replicated in the SWMModel\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\n#To be able to use unconstrained optimization, we use `ParameterHandling.jl`\ninitial_params = (\n rates=positive(ones(6)),\n gam_shape=positive(1.0),\n pi=zeros(3)\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\n#Setting up the Site-Wise Mixture Partition:\n#Note: this constructor sets the weights of all categories to 1/rate_cats\n#That is fine for our equi-probable category model, but this will need 
to be different for other models.\nrate_cats = 5\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,rate_cats)\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = rate_cats)\n r_vals = equiprobable_gamma_grids(params.gam_shape,cats)\n pi = unc2probvec(params.pi)\n return MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),r_vals)\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n #Root freqs need to be set over all component partitions\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3728.4761606135307","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Other functions also work with these kinds of random-effects site-wise mixture models:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"tree_polish!(tree,optimized_model)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL: -3728.4761606135307\nLL: -3728.1316616075173\nLL: -3728.121005993758\nLL: -3728.1202243978914\nLL: -3728.1201348447107","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Sometimes we might want the rate values for each category to stay fixed, but optimize their 
weights:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"#Using rate categories with fixed values\nfixed_cats = [0.00001,0.33,1.0,3.0,9.0]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n cat_weights=zeros(length(fixed_cats)-1), #Category weights\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n cat_weights = unc2probvec(params.cat_weights)\n pi = unc2probvec(params.pi)\n m = MolecularEvolution.SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n m.weights .= cat_weights\n return m\nend\n\nfunction objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\nscore,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"SUCCESS\nOpt LL:-3719.6290948420706","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"When you have a Site-Wise Mixture (ie. 
REL) model, the category weights can be handled \"outside\" of the main likelihood calculations. This means that they can be optimized very quickly, within an objective function that is optimizing over the other parameters. The following example uses an EM approach to do this:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using Distributions, FASTX, ParameterHandling, NLopt\n\n#Using rate categories with fixed values\nfixed_cats = [(i/5)^2 for i in 1:12]\n\nseqnames, seqs = read_fasta(\"Data/MusNuc_IGHV.fasta\")\ntree = read_newick_tree(\"Data/MusNuc_IGHV.tre\")\n\ninitial_partition = NucleotidePartition(length(seqs[1]))\n\ninitial_params = (\n rates=positive(ones(6)),\n pi=zeros(3) #Nuc freqs\n)\nflat_initial_params, unflatten = value_flatten(initial_params)\nnum_params = length(flat_initial_params)\n\nREL_partition = MolecularEvolution.SWMPartition{NucleotidePartition}(initial_partition,length(fixed_cats))\npopulate_tree!(tree,REL_partition,seqnames,seqs)\n\nfunction build_model_vec(params; cats = fixed_cats)\n pi = unc2probvec(params.pi)\n m = SWMModel(DiagonalizedCTMC(reversibleQ(params.rates,pi)),cats)\n return m\nend\n\n#LL for a mixture when the grid of probabilities is pre-computed\ngrid_ll(v,g) = sum(log.(sum((v./sum(v)) .* g,dims = 1)))\n\n#Note: we can get away with relatively few EM iterations within the optimization cycle (in this example at least)\nfunction opt_weights_and_LL(temp_part::SWMPartition{PType}; iters = 25) where {PType <: MolecularEvolution.MultiSitePartition} \n g,scals = SWM_prob_grid(temp_part) \n l = size(g)[1]\n #We can optimize the category weights without re-computing felsenstein\n #So it can make sense to do so within the optimization function\n #Which means you don't need to optimize over as many parameters\n θ = weightEM(g,ones(l)./l, iters = iters)\n LL_optimizing_over_weights = grid_ll(θ,g) + sum(scals)\n return θ,LL_optimizing_over_weights\nend\n\nfunction 
objective(params::NamedTuple; tree = tree)\n v = unc2probvec(params.pi)\n for p in tree.parent_message[1].parts\n p.state .= v\n end\n felsenstein!(tree,build_model_vec(params))\n #Optim inside optim\n #We first need to handle the merge of the parent and root partitions - usually handled for us magically!\n #Be careful: this example is hard-coded for a single partition\n temp_part = copy_partition(tree.parent_message[1])\n combine!(temp_part, tree.message[1])\n θ,LL = opt_weights_and_LL(temp_part)\n return -LL\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\n\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time score,mini,did_it_work = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\noptimized_model = build_model_vec(final_params)\n\nfelsenstein!(tree,optimized_model)\ntemp_part = copy_partition(tree.parent_message[1])\ncombine!(temp_part, tree.message[1])\nθ,_ = opt_weights_and_LL(temp_part, iters = 1000) #polish weights for final pass - quick\noptimized_model.weights .= θ\nLL = log_likelihood!(tree,optimized_model)\n\nprintln(did_it_work, \":\", score)\nprintln(\"Opt LL:\",LL)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"3.932150 seconds (2.38 M allocations: 2.378 GiB, 10.78% gc time, 3.28% compilation time: 7% of which was recompilation)\nSUCCESS:3720.1347720900067\nOpt LL:-3719.4808937732614","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"This can be dramatically faster than trying to directly optimize over category weights when the number of categories grows. 
The above example took 140s with the direct approach.","category":"page"},{"location":"examples/#Example-3:-FUBAR","page":"Examples","title":"Example 3: FUBAR","text":"","category":"section"},{"location":"examples/","page":"Examples","title":"Examples","text":"This example reads codon sequences from this FASTA file, and a phylogeny from this Newick tree file, and implements FUBAR.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"using MolecularEvolution, FASTX, ParameterHandling, NLopt, Plots\n\n#Read in seqs and tree\nseqnames, seqs = read_fasta(\"Data/Flu.fasta\")\ntree = read_newick_tree(\"Data/Flu.tre\")\n\n#Count F3x4 frequencies from the seqs, and estimate codon freqs from this\nf3x4 = MolecularEvolution.count_F3x4(seqs);\neq_freqs = MolecularEvolution.F3x4_eq_freqs(f3x4);\n\n#Set up a codon partition (will default to Universal genetic code)\ninitial_partition = CodonPartition(Int64(length(seqs[1])/3))\ninitial_partition.state .= eq_freqs\npopulate_tree!(tree,initial_partition,seqnames,seqs)\n\n#We'll use the empirical F3x4 freqs, fixed MG94 alpha=1, and optimize the nuc parameters and MG94 beta\n#Note: the nuc rates are confounded with alpha\ninitial_params = (\n rates=positive(ones(6)), #rates must be non-negative\n beta = positive(1.0)\n)\nflat_initial_params, unflatten = value_flatten(initial_params) #See ParameterHandling.jl docs\nnum_params = length(flat_initial_params)\n\nfunction build_model_vec(p; F3x4 = f3x4, alpha = 1.0)\n #If you run into numerical issues with DiagonalizedCTMC, switch to GeneralCTMC instead\n return DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, p.beta, reversibleQ(p.rates,ones(4)), F3x4))\nend\n\nfunction objective(params::NamedTuple; tree = tree, eq_freqs = eq_freqs)\n return -log_likelihood!(tree,build_model_vec(params))\nend\n\nopt = Opt(:LN_BOBYQA, num_params)\nmin_objective!(opt, (x,y) -> (objective ∘ unflatten)(x))\nlower_bounds!(opt, [-5.0 for i in 
1:num_params])\nupper_bounds!(opt, [5.0 for i in 1:num_params])\nxtol_rel!(opt, 1e-12)\n@time _,mini,_ = NLopt.optimize(opt, flat_initial_params)\n\nfinal_params = unflatten(mini)\nnucmat = reversibleQ(final_params.rates,ones(4))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":" 10.596546 seconds (840.87 k allocations: 5.221 GiB, 7.45% gc time, 0.35% compilation time: 25% of which was recompilation)\n4×4 Matrix{Float64}:\n -9.41346 1.77048 6.85997 0.783008\n 1.77048 -7.24162 0.280525 5.19061\n 6.85997 0.280525 -8.651 1.5105\n 0.783008 5.19061 1.5105 -7.48412","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"The scaling of that nuc matrix reflects the fact that we're using a tree that was estimated under a nuc model, but here we're optimizing a codon model. No issue: the nuc rates have absorbed this scaling difference.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Now we set up a 20-by-20 grid, slicing the MG94 α and β parameters at the following values:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"grid_values = 10 .^ (-1.35:0.152:1.6) .- 0.0423174293933042","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"20-element Vector{Float64}:\n 0.0023509298217921012\n 0.021069541732388508\n 0.047632328759699305\n 0.08532645148783018\n 0.13881657986865603\n 0.2147221488835822\n 0.3224365175323036\n 0.4752894025572635\n 0.6921964387638108\n 1.0\n 1.4367909587749033\n 2.05662245423022\n 2.9361990000358853\n 4.184368713262725\n 5.95559333316179\n 8.469062952630463\n 12.0358209216745\n 17.09725564569095\n 24.27972266134484\n 34.47205650419232","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we calculate the conditional likelihoods for each site. 
Note the 20-by-20 grid is stretched out into a length 400 vector to keep things simple. I'm avoiding reshape tricks to keep the grid structure clear.","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LL_matrix = zeros(length(grid_values)^2,initial_partition.sites);\nalpha_vec = zeros(length(grid_values)^2);\nalpha_ind_vec = zeros(Int64,length(grid_values)^2);\nbeta_vec = zeros(length(grid_values)^2);\nbeta_ind_vec = zeros(Int64,length(grid_values)^2);\n\ni = 1\n@time for (a,alpha) in enumerate(grid_values)\n for (b,beta) in enumerate(grid_values)\n alpha_vec[i],beta_vec[i] = alpha, beta\n alpha_ind_vec[i], beta_ind_vec[i] = a,b\n m = DiagonalizedCTMC(MolecularEvolution.MG94_F3x4(alpha, beta, nucmat, f3x4))\n felsenstein!(tree,m)\n #This is because we need to include the eq freqs in the site LLs:\n combine!(tree.message[1],tree.parent_message[1])\n LL_matrix[i,:] .= MolecularEvolution.site_LLs(tree.message[1])\n i += 1\n end\nend\nprob_matrix = exp.(LL_matrix .- maximum(LL_matrix,dims = 1))\nprob_matrix ./= sum(prob_matrix,dims = 1);","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Then we use an EM-like MAP algorithm to find the posterior grid weights, and visualize this surface:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"LDAθ = weightEM(prob_matrix, ones(length(alpha_vec))./length(alpha_vec), conc = 0.4, iters = 5000);\n\n#A function to viz the grid surface\nfunction gridplot(alpha_ind_vec,beta_ind_vec,grid_values,θ; title = \"\")\n scatter(alpha_ind_vec,beta_ind_vec, zcolor = θ, c = :darktest,\n markersize = sqrt(length(alpha_ind_vec))/2, markershape=:square, markerstrokewidth=0.0, size=(550,500),\n label = :none, xticks = (1:length(grid_values), round.(grid_values,digits = 3)), xrotation = 90,\n yticks = (1:length(grid_values), round.(grid_values,digits = 3)), margin=6Plots.mm,\n xlabel = \"α\", ylabel = \"β\", title = title)\n 
plot!(1:length(grid_values),1:length(grid_values),color = \"grey\", style = :dash, label = :none)\nend\n\ngridplot(alpha_ind_vec,beta_ind_vec,grid_values,LDAθ)","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"We can see that the posterior distribution over sites is heavily concentrated at β<α. But are there any sites where β>α?","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"weighted_mat = prob_matrix .* LDAθ\nfor site in 1:size(prob_matrix)[2]\n pos = sum(weighted_mat[beta_vec .> alpha_vec,site])/sum(weighted_mat[:,site])\n if pos > 0.9\n println(\"Site $(site): P(β>α)=$(round(pos,digits = 4))\")\n end\nend","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"Site 153: P(β>α)=0.9074\nSite 158: P(β>α)=0.9266\nSite 160: P(β>α)=0.9547","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"And let's visualize one of those sites:","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"gridplot(alpha_ind_vec,beta_ind_vec,grid_values, weighted_mat[:,160]./sum(weighted_mat[:,160]))","category":"page"},{"location":"examples/","page":"Examples","title":"Examples","text":"(Image: )","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = MolecularEvolution","category":"page"},{"location":"#MolecularEvolution","page":"Home","title":"MolecularEvolution","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Documentation for MolecularEvolution.","category":"page"},{"location":"#A-Julia-package-for-the-flexible-development-of-phylogenetic-models.","page":"Home","title":"A Julia package for the flexible development of phylogenetic 
models.","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"MolecularEvolution.jl exploits Julia's multiple dispatch, implementing a fully generic suite of likelihood calculations, branchlength optimization, topology optimization, and ancestral inference. Users can construct trees using already-defined data types and models. But users can define probability distributions over their own data types, and specify the behavior of these under their own model types, and can mix and match different models on the same phylogeny.","category":"page"},{"location":"","page":"Home","title":"Home","text":"If the behavior you need is not already available in MolecularEvolution.jl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"If you have a new data type:\nA Partition type that represents the uncertainty over your state. \ncombine!() that merges evidence from two Partitions.\nIf you have a new model:\nA BranchModel type that stores your model parameters.\nforward!() that evolves state distributions over branches, in the root-to-tip direction.\nbackward!() that reverse-evolves state distributions over branches, in the tip-to-root direction.","category":"page"},{"location":"","page":"Home","title":"Home","text":"And then sampling, likelihood calculations, branch-length optimization, ancestral reconstruction, etc should be available for your new data or model.","category":"page"},{"location":"#Design-principles","page":"Home","title":"Design principles","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"In order of importance, we aim for the following:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Flexibility and generality\nWhere possible, we avoid design decisions that limit the development of new models, or make it harder to develop new models.\nWe do not sacrifice flexibility for performance.\nScalability\nAnalyses implemented using MolecularEvolution.jl should 
scale to large, real-world datasets.\nPerformance\nWhile the above take precedence over speed, it should be possible to optimize your Partition, combine!(), BranchModel, forward!() and backward!() functions to obtain competitive runtimes.","category":"page"},{"location":"#Authors:","page":"Home","title":"Authors:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Venkatesh Kumar and Ben Murrell, with additional contributions by Sanjay Mohan, Alec Pankow, Hassan Sadiq, and Kenta Sato.","category":"page"},{"location":"#Quick-example:-Likelihood-calculations-under-phylogenetic-Brownian-motion:","page":"Home","title":"Quick example: Likelihood calculations under phylogenetic Brownian motion:","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"using MolecularEvolution, Plots\n\n#First simulate a tree, using a coalescent process\ntree = sim_tree(n=200)\ninternal_message_init!(tree, GaussianPartition())\n#Simulate brownian motion over the tree\nbm_model = BrownianMotion(0.0,1.0)\nsample_down!(tree, bm_model)\n#And plot the log likelihood as a function of the parameter value\nll(x) = log_likelihood!(tree,BrownianMotion(0.0,x))\nplot(0.7:0.001:1.6,ll, xlabel = \"variance per unit time\", ylabel = \"log likelihood\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"(Image: )","category":"page"},{"location":"","page":"Home","title":"Home","text":"","category":"page"},{"location":"","page":"Home","title":"Home","text":"Modules = [MolecularEvolution]","category":"page"},{"location":"#MolecularEvolution.BranchlengthSampler","page":"Home","title":"MolecularEvolution.BranchlengthSampler","text":"BranchlengthSampler\n\nA type that allows you to specify an additive proposal function in the log domain and a prior distribution over the log of the branchlengths. 
It also holds the acceptance ratio acc_ratio (acc_ratio[1] stores the number of accepts, and acc_ratio[2] stores the number of rejects).\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyDown","page":"Home","title":"MolecularEvolution.LazyDown","text":"Constructors\n\nLazyDown(stores_obs)\nLazyDown() = LazyDown(x::FelNode -> true)\n\nDescription\n\nIndicate that we want to do a downward pass, e.g. sample_down!. The function passed to the constructor takes a node::FelNode as input and returns a Bool that decides if node stores its observations.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyPartition","page":"Home","title":"MolecularEvolution.LazyPartition","text":"Constructor\n\nLazyPartition{PType}()\n\nInitialize an empty LazyPartition that is meant for wrapping a partition of type PType.\n\nDescription\n\nWith this data structure, you can wrap a partition of choice. The idea is that in some message passing algorithms, there is only a wave of partitions which need to actualize. For instance, a wave following a root-leaf path, or a depth-first traversal. In which case, we can be more economical with our memory consumption. With a worst case memory complexity of O(log(n)), where n is the number of nodes, functionality is provided for:\n\nlog_likelihood!\nfelsenstein!\nsample_down!\n\nnote: Note\nFor successive felsenstein! calls, we need to extract the information at the root somehow after each call. This can be done with e.g. total_LL or site_LLs.\n\nFurther requirements\n\nSuppose you want to wrap a partition of PType with LazyPartition:\n\nIf you're calling log_likelihood! 
and felsenstein!:\nobs2partition!(partition::PType, obs) that transforms an observation to a partition.\nIf you're calling sample_down!:\npartition2obs(partition::PType) that returns the most likely state from a partition, inverts obs2partition!.\n\n\n\n\n\n","category":"type"},{"location":"#MolecularEvolution.LazyUp","page":"Home","title":"MolecularEvolution.LazyUp","text":"Constructor\n\nLazyUp()\n\nDescription\n\nIndicate that we want to do an upward pass, e.g. felsenstein!.\n\n\n\n\n\n","category":"type"},{"location":"#Base.:==-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"Base.:==","text":"==(t1, t2)\nDefaults to pointer equality\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.SWM_prob_grid-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:MultiSitePartition","page":"Home","title":"MolecularEvolution.SWM_prob_grid","text":"SWM_prob_grid(part::SWMPartition{PType}) where {PType <: MultiSitePartition}\n\nReturns a matrix of probabilities for each site, for each model (in the probability domain - not logged!) as well as the log probability offsets\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution._mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any, Any}} where T<:Function","page":"Home","title":"MolecularEvolution._mapreduce","text":"Internal function. Helper for bfsmapreduce and dfsmapreduce\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.backward!-Tuple{DiscretePartition, DiscretePartition, MolecularEvolution.PMatrixModel, FelNode}","page":"Home","title":"MolecularEvolution.backward!","text":"backward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition backwards along the branch to the destination partition, under the model. 
Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.bfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.bfs_mapreduce","text":"Performs a BFS map-reduce over the tree, starting at a given node. For each node, mapreduce is called as: mapreduce(currnode::FelNode, prevnode::FelNode, aggregator) where prev_node is the previous node visited on the path from the start node to the current node. It is expected to update the aggregator, and not return anything.\n\nNot exactly conventional map-reduce, as map-reduce calls may rely on state in the aggregator added by map-reduce calls on other nodes visited earlier.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.branchlength_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.branchlength_optim!","text":"branchlength_optim!(tree::FelNode, models; )\n\nUses golden section search, or optionally Brent's method, to optimize all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize branch lengths with all models, the default option).\ntol=1e-5: absolute tolerance for the bl_modifier.\nbl_modifier=GoldenSectionOpt(): can either be an optimizer or a sampler (subtype of UnivariateModifier). For optimization, in addition to golden section search, Brent's method can be used by setting bl_modifier=BrentsMethodOpt().\nsort_tree=false: determines if a lazysort! 
will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.brents_method_minimize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.brents_method_minimize","text":"brents_method_minimize(f, a::Real, b::Real, transform, t::Real; ε::Real=sqrt(eps()))\n\nBrent's method for minimization.\n\nGiven a function f with a single local minimum in the interval (a,b), Brent's method returns an approximation of the x-value that minimizes f to an accuracy between 2tol and 3tol, where tol is a combination of a relative and an absolute tolerance, tol := ε|x| + t. ε should be no smaller than 2*eps, and preferably not much less than sqrt(eps), which is also the default value. eps is defined here as the machine epsilon in double precision. t should be positive.\n\nThe method combines the stability of a Golden Section Search and the superlinear convergence Successive Parabolic Interpolation has under certain conditions. The method never converges much slower than a Fibonacci search and for a sufficiently well-behaved f, convergence can be expected to be superlinear, with an order that's usually at least 1.3247...\n\nExamples\n\njulia> f(x) = exp(-x) - cos(x)\nf (generic function with 1 method)\n\njulia> m = brents_method_minimize(f, -1, 2, identity, 1e-7)\n0.5885327257940255\n\nFrom: Richard P. Brent, \"Algorithms for Minimization without Derivatives\" (1973). 
Chapter 5.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.cascading_max_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.cascading_max_state_dict","text":"cascading_max_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their inferred ancestors under the following scheme: the state that maximizes the marginal likelihood is selected at the root, and then, for each node, the maximum likelihood state is selected conditioned on the maximized state of the parent node and the observations of all descendents. A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.char_proportions-Tuple{Any, String}","page":"Home","title":"MolecularEvolution.char_proportions","text":"char_proportions(seqs, alphabet::String)\n\nTakes a vector of sequences and returns a vector of the proportion of each character across all sequences. 
An example alphabet argument is MolecularEvolution.AAstring.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.collect_leaf_dists-Tuple{Vector{<:AbstractTreeNode}}","page":"Home","title":"MolecularEvolution.collect_leaf_dists","text":"collect_leaf_dists(trees::Vector{<:AbstractTreeNode})\n\nReturns a list of distance matrices containing the distance between the leaf nodes, which can be used to assess mixing.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.colored_seq_draw-Tuple{Any, Any, AbstractString}","page":"Home","title":"MolecularEvolution.colored_seq_draw","text":"colored_seq_draw(x, y, str::AbstractString; color_dict=Dict(), font_size=8pt, posx=hcenter, posy=vcenter)\n\nDraw an arbitrary sequence. color_dict gives a mapping from characters to colors (default black). Default options for nucleotide colorings and amino acid colorings are given in the constants NUC_COLORS and AA_COLORS. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.combine!-Tuple{DiscretePartition, DiscretePartition}","page":"Home","title":"MolecularEvolution.combine!","text":"combine!(dest::P, src::P) where P<:Partition\n\nCombines evidence from two partitions of the same type, storing the result in dest. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.copy_tree","page":"Home","title":"MolecularEvolution.copy_tree","text":"function copy_tree(root::FelNode, shallow_copy=false)\n\nReturns an untangled copy of the tree. 
Optionally, the flag `shallow_copy` can be used to obtain a copy of the tree with only the names and branchlengths.\n\n\n\n\n\n","category":"function"},{"location":"#MolecularEvolution.deepequals-Union{Tuple{T}, Tuple{T, T}} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.deepequals","text":"deepequals(t1, t2)\n\nChecks whether two trees are equal by recursively calling this on all fields, except :parent, in order to prevent cycles. In order to ensure that the :parent field is not hiding something different on both trees, ensure that each is consistent first (see: istreeconsistent).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.dfs_mapreduce-Union{Tuple{T}, Tuple{AbstractTreeNode, T, Any}} where T<:Function","page":"Home","title":"MolecularEvolution.dfs_mapreduce","text":"Performs a DFS map-reduce over the tree, starting at a given node. See bfs_mapreduce for more details.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.discrete_name_color_dict-Tuple{AbstractTreeNode, Any}","page":"Home","title":"MolecularEvolution.discrete_name_color_dict","text":"discrete_name_color_dict(newt::AbstractTreeNode,tag_func; rainbow = false, scramble = false, darken = true, col_seed = nothing)\n\nTakes a tree and a tag_func, which converts the leaf label into a category (ie. 
there should be <20 of these), and returns a color dictionary that can be used to color the leaves or bubbles.\n\nExample tagfunc: function tagfunc(nam::String) return split(nam,\"_\")[1] end\n\nFor prettier colors, but less discrimination: rainbow = true To randomize the rainbow color assignment: scramble = true col_seed is currently set to white, and excluded from the list of colors, to make them more visible.\n\nConsider making your own version of this function to customize colors as you see fit.\n\nExample use: numleaves = 50 Nefunc(t) = 1*(e^-t).+5.0 newt = simtree(numleaves,Nefunc,1.0,nstart = rand(1:numleaves)); newt = ladderize(newt) tagfunc(nam) = mod(sum(Int.(collect(nam))),7) dic = discretenamecolordict(newt,tagfunc,rainbow = true); treedraw(newt,linewidth = 0.5mm,labelcolor_dict = dic)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.draw_example_tree-Tuple{}","page":"Home","title":"MolecularEvolution.draw_example_tree","text":"draw_example_tree(num_leaves = 50)\n\nDraws a tree and shows the code that draws it.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.endpoint_conditioned_sample_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.endpoint_conditioned_sample_state_dict","text":"endpoint_conditioned_sample_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and draws samples under the model conditions on the leaf observations. These samples are stored in the nodemessagedict, which is returned. 
A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.expected_subs_per_site-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.expected_subs_per_site","text":"expected_subs_per_site(Q,mu)\n\nTakes a rate matrix Q and an equilibrium frequency vector, and calculates the expected number of substitutions per site.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein!","text":"felsenstein!(node::FelNode, models; partition_list = nothing)\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass up from the tips to the root. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.felsenstein_down!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.felsenstein_down!","text":"felsenstein_down!(node::FelNode, models; partition_list = 1:length(tree.message), temp_message = copy_message(tree.message))\n\nShould usually be called on the root of the tree. Propagates Felsenstein pass down from the root to the tips. felsenstein!() should usually be called first. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.forward!-Tuple{DiscretePartition, DiscretePartition, MolecularEvolution.PMatrixModel, FelNode}","page":"Home","title":"MolecularEvolution.forward!","text":"forward!(dest::Partition, source::Partition, model::BranchModel, node::FelNode)\n\nPropagate the source partition forwards along the branch to the destination partition, under the model. Note: You should overload this for your own BranchModel types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.gappy_Q_from_symmetric_rate_matrix-Tuple{Any, Any, Any}","page":"Home","title":"MolecularEvolution.gappy_Q_from_symmetric_rate_matrix","text":"gappy_Q_from_symmetric_rate_matrix(sym_mat, gap_rate, eq_freqs)\n\nTakes a symmetric rate matrix and gap rate (governing mutations to and from gaps) and returns a gappy rate matrix. The equilibrium frequencies are multiplied on column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_highlighter_legend-Tuple{Any}","page":"Home","title":"MolecularEvolution.get_highlighter_legend","text":"get_highlighter_legend(legend_colors)\n\nReturns a Compose object given an input dictionary or pairs mapping characters to colors.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_max_depth-Tuple{Any, Real}","page":"Home","title":"MolecularEvolution.get_max_depth","text":"get_max_depth(node,depth::Real)\n\nReturn the maximum depth of all children starting from the indicated node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.get_phylo_tree-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.get_phylo_tree","text":"get_phylo_tree(molev_root::FelNode; data_function = (x -> Tuple{String,Float64}[]))\n\nConverts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. 
Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.golden_section_maximize-Tuple{Any, Real, Real, Any, Real}","page":"Home","title":"MolecularEvolution.golden_section_maximize","text":"Golden section search.\n\nGiven a function f with a single local minimum in the interval [a,b], gss returns a subset interval [c,d] that contains the minimum with d-c <= tol.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = golden_section_maximize(f, 1, 5, identity, 1e-10)\n2.0000000000051843\n\nFrom: https://en.wikipedia.org/wiki/Golden-section_search\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlight_seq_draw-Tuple{Any, Any, AbstractString, Any, Any, Any}","page":"Home","title":"MolecularEvolution.highlight_seq_draw","text":"highlight_seq_draw(x, y, str::AbstractString, region, basecolor, hicolor; fontsize=8pt, posx=hcenter, posy=vcenter)\n\nDraw a sequence, highlighting the sites given in region. This can be used along with compose_dict for drawing sequences at nodes in a tree (see tree_draw). Returns a Compose container.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.highlighter_tree_draw-NTuple{4, Any}","page":"Home","title":"MolecularEvolution.highlighter_tree_draw","text":"highlighter_tree_draw(tree, ali_seqs, seqnames, master;\n highlighter_start = 1.1, highlighter_width = 1,\n coord_width = highlighter_start + highlighter_width + 0.1,\n scale_length = nothing, major_breaks = 1000, minor_breaks = 500,\n tree_args = NamedTuple[], legend_padding = 0.5cm, legend_colors = NUC_colors)\n\nDraws a combined tree and highlighter plot. 
The vector of seqnames must match the node names in tree.\n\nkwargs:\n\ntree_args: kwargs to pass to `tree_draw()`\nlegend_colors: Mapping of characters to highlighter colors (default NUC_colors)\nscale_length: Length of the scale bar\nhighlighter_start: Canvas start for the highlighter panel\nhighlighter_width: Canvas width for the highlighter panel\ncoord_width: Total width of the canvas\nmajor_breaks: Numbered breaks for sequence axis\nminor_breaks: Ticks for sequence axis\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Partition}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, partition::Partition)\n\nInitializes the message template for each node in the tree, as an array of the partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.internal_message_init!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.internal_message_init!","text":"internal_message_init!(tree::FelNode, empty_message::Vector{<:Partition})\n\nInitializes the message template for each node in the tree, allocating space for each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.istreeconsistent-Tuple{T} where T<:AbstractTreeNode","page":"Home","title":"MolecularEvolution.istreeconsistent","text":"istreeconsistent(root)\n\nChecks whether the :parent field is set to be consistent with the :child field for all nodes in the subtree. 
\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazyprep!-Tuple{FelNode, Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.lazyprep!","text":"lazyprep!(tree::FelNode, initial_message::Vector{<:Partition}; partition_list = 1:length(tree.message), direction::LazyDirection = LazyUp())\n\nExtra, intermediate step of tree preparations between initializing messages across the tree and calling message passing algorithms with LazyPartition.\n\nPerform a lazysort! on tree to obtain the optimal tree for a lazy felsenstein! prop, or a sample_down!.\nFix tree.parent_message to an initial message.\nPreallocate sufficiently many inner partitions needed for a felsenstein! prop, or a sample_down!.\nSpecialized preparations based on the direction of the operations (forward!, backward!). LazyDown or LazyUp.\n\nSee also LazyDown, LazyUp.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.lazysort!-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.lazysort!","text":"Should be run on a tree containing LazyPartitions before running felsenstein!. Sorts for a minimal count of active partitions during a felsenstein!\nReturns the minimum length of memoryblocks (-1) required for a felsenstein! prop. We need a temporary memoryblock during backward!, hence the '-1'.\n\nnote: Note\nSince felsenstein! uses a stack, we want to avoid having long node.children[1].children[1]... 
chains\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.leaf_distmat-Tuple{Any}","page":"Home","title":"MolecularEvolution.leaf_distmat","text":"leaf_distmat(tree)\n\nReturns a matrix of the distances between the leaf nodes, where the row and column indices are sorted by the leaf names.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.linear_scale-NTuple{5, Any}","page":"Home","title":"MolecularEvolution.linear_scale","text":"linear_scale(val,in_min,in_max,out_min,out_max)\n\nLinearly maps val which lives in [in_min,in_max] to a value in [out_min,out_max]\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.log_likelihood!","text":"log_likelihood!(tree::FelNode, models; partition_list = nothing)\n\nFirst re-computes the upward felsenstein pass, and then computes the log likelihood of this tree. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.log_likelihood-Tuple{FelNode, BranchModel}","page":"Home","title":"MolecularEvolution.log_likelihood","text":"log_likelihood(tree::FelNode, models; partition_list = nothing)\n\nComputes the log likelihood of this tree. Requires felsenstein!() to have been run. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 
1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.longest_path-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.longest_path","text":"Returns the longest path in a tree For convenience, this is returned as two lists of form: [leafnode, parentnode, .... root] Where the leaf_node nodes are selected to be the furthest away\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.marginal_state_dict-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.marginal_state_dict","text":"marginal_state_dict(tree::FelNode, model; partition_list = 1:length(tree.message), node_message_dict = Dict{FelNode,Vector{<:Partition}}())\n\nTakes in a tree and a model (which can be a single model, an array of models, or a function that maps FelNode->Array{<:BranchModel}), and returns a dictionary mapping nodes to their marginal reconstructions (ie. P(state|all observations,model)). A subset of partitions can be specified by partition_list, and a dictionary can be passed in to avoid re-allocating memory, in case you're running this over and over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.matrix_for_display-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.matrix_for_display","text":"matrix_for_display(Q,labels)\n\nTakes a numerical matrix and a vector of labels, and returns a typically mixed type matrix with the numerical values and the labels. This is to easily visualize rate matrices in eg. 
the REPL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.metropolis_sample-Tuple{FelNode, Vector{<:BranchModel}, Any}","page":"Home","title":"MolecularEvolution.metropolis_sample","text":"function metropolis_sample(\n initial_tree::FelNode,\n models::Vector{<:BranchModel},\n num_of_samples;\n bl_modifier::UnivariateSampler = BranchlengthSampler(Normal(0,2), Normal(-1,1)),\n burn_in=1000, \n sample_interval=10,\n collect_LLs = false,\n midpoint_rooting=false,\n)\n\nSamples tree topologies from a posterior distribution. \n\nArguments\n\ninitial_tree: An initial tree topology with the leaves populated with data, for the likelihood calculation.\nmodels: A list of branch models.\nnum_of_samples: The number of tree samples drawn from the posterior.\nbl_modifier: Sampler used to draw branchlengths from the posterior. \nburn_in: The number of samples discarded at the start of the Markov Chain.\nsample_interval: The distance between samples in the underlying Markov Chain (to reduce sample correlation).\ncollect_LLs: Specifies if the function should return the log-likelihoods of the trees.\nmidpoint_rooting: Specifies whether the drawn samples should be midpoint rerooted (Important! Should only be used for time-reversible branch models starting in equilibrium).\n\nnote: Note\nThe leaves of the initial tree should be populated with data and felsenstein! should be called on the initial tree before calling this function.\n\nReturns\n\nsamples: The trees drawn from the posterior. Returns shallow tree copies, which need to be repopulated before running felsenstein! etc. 
\nsample_LLs: The associated log-likelihoods of the tree (optional).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.midpoint-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.midpoint","text":"Returns a midpoint as a node and a distance above it where the midpoint is\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.mix-Union{Tuple{SWMPartition{PType}}, Tuple{PType}} where PType<:DiscretePartition","page":"Home","title":"MolecularEvolution.mix","text":"mix(swm_part::SWMPartition{PType} ) where {PType <: MultiSitePartition}\n\nmix collapses a Site-Wise Mixture partition to a single component partition, weighted by the site-wise likelihoods for each component, and the init weights. Specifically, it takes a SWMPartition{Ptype} and returns a PType. You'll need to have this implemented for certain helper functionality if you're playing with new kinds of SWMPartitions that aren't mixtures of DiscretePartitions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.name2node_dict-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.name2node_dict","text":"name2node_dict(root)\n\nReturns a dictionary of leaf nodes, indexed by node.name. Can be used to associate sequences with leaf nodes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.newick-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nni_optim!-Tuple{FelNode, Any}","page":"Home","title":"MolecularEvolution.nni_optim!","text":"nni_optim!(tree::FelNode, models; )\n\nConsiders local branch swaps for all branches recursively, maintaining the integrity of the messages. Requires felsenstein!() to have been run first. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another.\n\nKeyword Arguments\n\npartition_list=nothing: (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over (but you probably want to optimize tree topology with all models, the default option).\nselection_rule = x -> argmax(x): a function that takes the current and proposed log likelihoods and selects an NNI configuration. Note that the current log likelihood is stored at x[1].\nsort_tree=false: determines if a lazysort! will be performed, which can reduce the number of temporary messages that have to be initialized.\ntraversal=Iterators.reverse: a function that determines the traversal, permutes an iterable.\nshuffle=false: do a randomly shuffled traversal, overrides traversal.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.node_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.node_distances","text":"Compute the distance to all other nodes from a given node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.nonreversibleQ-Tuple{Any}","page":"Home","title":"MolecularEvolution.nonreversibleQ","text":"nonreversibleQ(param_vec)\n\nTakes a vector of parameters and returns a nonreversible rate matrix.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.parent_list-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.parent_list","text":"Provides a list of parent nodes from this node up to the root node\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.partition2obs-Tuple{DiscretePartition, String}","page":"Home","title":"MolecularEvolution.partition2obs","text":"partition2obs(part::Partition)\n\nExtracts the most likely state from a Partition, transforming it into a convenient type. 
For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.plot_multiple_trees-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.plot_multiple_trees","text":"plot_multiple_trees(trees, inf_tree; )\n\nPlots multiple phylogenetic trees against a reference tree, inf_tree. For each tree in trees, a linear Weighted Least Squares (WLS) problem (parameterized by the weight_fn keyword) is solved for the x-positions of the matching nodes between inf_tree and tree.\n\nKeyword Arguments\n\nnode_size=4: the size of the nodes in the plot.\nline_width=0.5: the width of the branches from trees.\nfont_size=10: the font size for the leaf labels.\nmargin=1.5: the margin between a leaf node and its label.\nline_alpha=0.05: the transparency level of the branches from trees.\ny_jitter=0.0: the standard deviation of the noise in the y-coordinate.\nweight_fn=n::FelNode -> ifelse(isroot(n), 1.0, 0.0): a function that assigns a weight to a node for the WLS problem.\nopt_scale=true: whether to include a scaling parameter for the WLS problem.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.populate_tree!-Tuple{FelNode, Partition, Any, Any}","page":"Home","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). 
Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\nA renaming function that can eg. strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.promote_internal-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.promote_internal","text":"promote_internal(tree::FelNode)\n\nCreates a new tree similar to the given tree, but with 'dummy' leaf nodes (w/ zero branchlength) representing each internal node (for drawing / evenly spacing labels on internal nodes).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Function, Vector, Int64}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(f::Function,opt_params::Vector, param_ind::Int; rate_conf_level = 0.99, nudge_amount = 0.01)\n\nTakes a NEGATIVE log likelihood function (compatible with Optim.jl), a vector of maximizing parameters, and a parameter index. Returns the quadratic confidence interval.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.quadratic_CI-Tuple{Vector, Vector}","page":"Home","title":"MolecularEvolution.quadratic_CI","text":"quadratic_CI(xvec,yvec; rate_conf_level = 0.99)\n\nTakes xvec, a vector of parameter values, and yvec, a vector of log likelihood evaluations (note: NOT the negative LLs you might use with Optim.jl). 
Returns the confidence intervals computed by a quadratic approximation to the LL.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_fasta-Tuple{String}","page":"Home","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.read_newick_tree-Tuple{String}","page":"Home","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree of type FelNode from a file.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.reversibleQ-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.reversibleQ","text":"reversibleQ(param_vec,eq_freqs)\n\nTakes a vector of parameters and equilibrium frequencies and returns a reversible rate matrix. The parameters are the upper triangle of the rate matrix, with the diagonal elements omitted, and the equilibrium frequencies are multiplied column-wise.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.root2tip_distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.root2tip_distances","text":"root2tips(root::AbstractTreeNode)\n\nReturns a vector of root-to-tip distances, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_down!-Tuple{FelNode, Any, Any}","page":"Home","title":"MolecularEvolution.sample_down!","text":"sample_down!(root::FelNode,models,partition_list)\n\nGenerates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. 
models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sample_from_message!-Tuple{Vector{<:Partition}}","page":"Home","title":"MolecularEvolution.sample_from_message!","text":"sample_from_message!(message::Vector{<:Partition})\n\nReplaces an uncertain message with a sample from the distribution represented by each partition.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Context}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Context; width = 10cm, height = 10cm, linecap_round = true, white_background = true)\n\nSaves a figure created using the Compose approach, but tweaks the SVG after export.\n\neg. savefig_tweakSVG(\"export.svg\",pl)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.savefig_tweakSVG-Tuple{Any, Plots.Plot}","page":"Home","title":"MolecularEvolution.savefig_tweakSVG","text":"savefig_tweakSVG(fname, plot::Plots.Plot; hack_bounding_box = true, new_viewbox = nothing, linecap_round = true)\n\nNote: Might only work if you're using the GR backend!! Saves a figure created using the Phylo Plots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.\n\neg. 
savefig_tweakSVG(\"export.svg\",pl, new_viewbox = [-100, -100, 3000, 4500])\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.shortest_path_between_nodes-Tuple{FelNode, FelNode}","page":"Home","title":"MolecularEvolution.shortest_path_between_nodes","text":"Shortest path between nodes, returned as two lists, each starting with one of the two nodes, and ending with the common ancestor.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sibling_inds-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.sibling_inds","text":"sibling_inds(node)\n\nReturns logical indices of the siblings in the parent's children vector.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.siblings-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.siblings","text":"siblings(node)\n\nReturns a vector of siblings of node.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{Int64, Any, Any}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)\n\nSimulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.\n\nNe_func(t) = (sin(t/10)+1)*100.0 + 10.0\nroot = sim_tree(600,Ne_func,1.0)\nsimple_tree_draw(ladderize(root))\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.sim_tree-Tuple{}","page":"Home","title":"MolecularEvolution.sim_tree","text":"sim_tree(;n = 10)\n\nSimulates a tree with constant population size.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_radial_tree_plot-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_radial_tree_plot","text":"simple_radial_tree_plot(root::FelNode; canvas_width = 10cm, line_color = \"black\", line_width = 0.1mm)\n\nDraws a radial tree. No frills. No labels. 
Canvas height is automatically determined to avoid distorting the tree.\n\nnewt = better_newick_import(\"((A:1,B:1,C:1,D:1,E:1,F:1,G:1):1,(H:1,I:1):1);\", FelNode{Float64});\nsimple_radial_tree_plot(newt,line_width = 0.5mm,root_angle = 7/10)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.simple_tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.simple_tree_draw","text":"img = simple_tree_draw(tree::FelNode; canvas_width = 15cm, canvas_height = 15cm, line_color = \"black\", line_width = 0.1mm)\n\nA line drawing of a tree with very few options.\n\nimg = simple_tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.total_LL-Tuple{Partition}","page":"Home","title":"MolecularEvolution.total_LL","text":"total_LL(p::Partition)\n\nIf called on the root, it returns the log likelihood associated with that partition. Can be overloaded for complex partitions without straightforward site log likelihoods.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2distances-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2distances","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree2shared_branch_lengths-Tuple{AbstractTreeNode}","page":"Home","title":"MolecularEvolution.tree2shared_branch_lengths","text":"tree2distances(root::AbstractTreeNode)\n\nReturns a distance matrix for all pairs of leaf nodes, and a node-to-index dictionary. Be aware that this dictionary will break when any of the node content (ie. 
anything on the tree) changes.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_draw-Tuple{FelNode}","page":"Home","title":"MolecularEvolution.tree_draw","text":"tree_draw(tree::FelNode;\n canvas_width = 15cm, canvas_height = 15cm,\n stretch_for_labels = 2.0, draw_labels = true,\n line_width = 0.1mm, font_size = 4pt,\n min_dot_size = 0.00, max_dot_size = 0.01,\n line_opacity = 1.0,\n dot_opacity = 1.0,\n name_opacity = 1.0,\n horizontal = true,\n dot_size_dict = Dict(), dot_size_default = 0.0,\n dot_color_dict = Dict(), dot_color_default = \"black\",\n line_color_dict = Dict(), line_color_default = \"black\",\n label_color_dict = Dict(), label_color_default = \"black\",\n nodelabel_dict = Dict(),compose_dict = Dict()\n )\n\nDraws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.\n\nExample using compose_dict\n\nstr_tree = 
\"(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0\"\nnewt = gettreefromnewick(str_tree, FelNode)\nladderize!(newt)\ncompose_dict = Dict()\nfor n in getleaflist(newt)\n #Replace the rand(4) with the frequencies you actually want.\n compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)\nend\ntree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)\n\n\nimg = tree_draw(tree)\nimg |> SVG(\"imgout.svg\",10cm, 10cm)\nOR\nusing Cairo\nimg |> PDF(\"imgout.pdf\",10cm, 10cm)\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.tree_polish!-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.tree_polish!","text":"tree_polish!(newt, models; tol = 
10^-4, verbose = 1, topology = true)\n\nTakes a tree and a model function, and optimizes branch lengths and, optionally, topology. Returns final LL. Set verbose=0 to suppress output. Note: This is not intended for an exhaustive tree search (which requires different heuristics), but rather to polish a tree that is already relatively close to the optimum.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.unc2probvec-Tuple{Any}","page":"Home","title":"MolecularEvolution.unc2probvec","text":"unc2probvec(v)\n\nTakes an array of N-1 unbounded values and returns an array of N values that sums to 1. Typically useful for optimizing over categorical probability distributions.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, BrentsMethodOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::BrentsMethodOpt, t::Real; ε::Real=sqrt(eps))\n\nMaximizes f(x) using Brent's method. See ?brents_method_minimize.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_maximize-Tuple{Any, Real, Real, Any, GoldenSectionOpt, Real}","page":"Home","title":"MolecularEvolution.univariate_maximize","text":"univariate_maximize(f, a::Real, b::Real, transform, optimizer::GoldenSectionOpt, tol::Real)\n\nMaximizes f(x) using a Golden Section Search. 
See ?golden_section_maximize.\n\nExamples\n\njulia> f(x) = -(x-2)^2\nf (generic function with 1 method)\n\njulia> m = univariate_maximize(f, 1, 5, identity, GoldenSectionOpt(), 1e-10)\n2.0000000000051843\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.univariate_sampler-Tuple{Any, BranchlengthSampler, Any}","page":"Home","title":"MolecularEvolution.univariate_sampler","text":"univariate_sampler(LL, modifier::BranchlengthPeturbation, curr_branchlength)\n\nAn MCMC algorithm that draws the next sample of a Markov chain that approximates the posterior distribution over the branch lengths.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.values_from_phylo_tree-Tuple{Any, Any}","page":"Home","title":"MolecularEvolution.values_from_phylo_tree","text":"values_from_phylo_tree(phylo_tree, key)\n\nReturns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.weightEM-Tuple{Matrix{Float64}, Any}","page":"Home","title":"MolecularEvolution.weightEM","text":"weightEM(con_lik_matrix::Array{Float64,2}, θ; conc = 0.0, iters = 500)\n\nTakes a conditional likelihood matrix (#categories-by-sites) and a starting frequency vector θ (length(θ) = #categories) and optimizes θ (using Expectation Maximization. Maybe.). If conc > 0 then this gives something like variational Bayes behavior for LDA. 
Maybe.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_fasta-Tuple{String, Vector{String}}","page":"Home","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"method"},{"location":"#MolecularEvolution.write_nexus-Tuple{String, FelNode}","page":"Home","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"method"},{"location":"IO/#Input/Output","page":"Input/Output","title":"Input/Output","text":"","category":"section"},{"location":"IO/","page":"Input/Output","title":"Input/Output","text":"write_nexus\nnewick\nread_newick_tree\npopulate_tree!\nread_fasta\nwrite_fasta","category":"page"},{"location":"IO/#MolecularEvolution.write_nexus","page":"Input/Output","title":"MolecularEvolution.write_nexus","text":"write_nexus(fname::String,tree::FelNode)\n\nWrites the tree as a nexus file, suitable for opening in eg. FigTree. Data in the node_data dictionary will be converted into annotations. 
Only tested for simple node_data formats and types.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.newick","page":"Input/Output","title":"MolecularEvolution.newick","text":"newick(root)\n\nReturns a newick string representation of the tree.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_newick_tree","page":"Input/Output","title":"MolecularEvolution.read_newick_tree","text":"read_newick_tree(treefile)\n\nReads in a tree of type FelNode from a file.\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.populate_tree!","page":"Input/Output","title":"MolecularEvolution.populate_tree!","text":"populate_tree!(tree::FelNode, starting_message, names, data; init_all_messages = true, tolerate_missing = 1, leaf_name_transform = x -> x)\n\nTakes a tree, and a starting_message (which will serve as the memory template for populating messages all over the tree). starting_message can be a message (ie. a vector of Partitions), but will also work with a single Partition (although the tree will still be populated with a length-1 vector of Partitions). Further, as long as obs2partition is implemented for your Partition type, the leaf nodes will be populated with the data from data, matching the names on each leaf. When a leaf on the tree has a name that doesn't match anything in names, then if\n\ntolerate_missing = 0, an error will be thrown\ntolerate_missing = 1, a warning will be thrown, and the message will be set to the uninformative message (requires identity!(::Partition) to be defined)\ntolerate_missing = 2, the message will be set to the uninformative message, without warnings (requires identity!(::Partition) to be defined)\n\nA renaming function that can eg. 
strip tags from the tree when matching leaf names with names can be passed to leaf_name_transform\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.read_fasta","page":"Input/Output","title":"MolecularEvolution.read_fasta","text":"read_fasta(filepath::String)\n\nReads in a fasta file and returns a tuple of (seqnames, seqs).\n\n\n\n\n\n","category":"function"},{"location":"IO/#MolecularEvolution.write_fasta","page":"Input/Output","title":"MolecularEvolution.write_fasta","text":"write_fasta(filepath::String, sequences::Vector{String}; seq_names = nothing)\n\nWrites a fasta file from a vector of sequences, with optional seq_names.\n\n\n\n\n\n","category":"function"}]
}
diff --git a/dev/simulation/index.html b/dev/simulation/index.html
index c831f70..50d0fba 100644
--- a/dev/simulation/index.html
+++ b/dev/simulation/index.html
@@ -1,5 +1,5 @@
-Simulation · MolecularEvolution.jl
While our sim_tree function seems to produce trees with the right shape, and is good enough for eg. generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).
If you just need a simple tree for testing things, then you can just use:
tree = sim_tree(n=100)
+Simulation · MolecularEvolution.jl
While our sim_tree function seems to produce trees with the right shape, and is good enough for eg. generating varied tree shapes to evaluate different phylogeny inference schemes under, it is not yet sufficiently checked and tested for use where the details of the coalescent need to be absolutely accurate. It could, for example, be off by a constant factor somewhere. So if you plan on using this in such a manner for a publication, please check the sim_tree code (and let us know).
If you just need a simple tree for testing things, then you can just use:
tree = sim_tree(n=100)
tree_draw(tree, draw_labels = false, canvas_height = 5cm)
This has the characteristic "coalescent under constant population size" look.
However, sim_tree is a bit more powerful than this: it aims to simulate branching under a coalescent process with flexible options for how the effective population size, as well as the sampling rate, might change over time. This is important, because the "constant population size" model is quite extreme, and most of the divergence happens in the early internal branches.
A coalescent process runs backwards in time, starting from the most recent tip, and sampling backwards toward the root, coalescing nodes as it goes, and sometimes adding additional sampled tips. With sim_tree, if nstart = add_limit, then all the tips will be sampled at the same time, and the tree will be ultrametric.
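To make the backwards-in-time picture concrete, here is a minimal sketch of the constant-population case with all tips sampled at the same time. This is not how sim_tree is implemented: coalescent_times is a hypothetical helper, and the rate k(k-1)/(2Ne) assumes the standard Kingman scaling, which may differ from sim_tree's by a constant factor.

```julia
using Random

# Hypothetical helper, not part of MolecularEvolution.jl.
# Returns the n-1 coalescence times, measured backwards from the tips,
# for n lineages sampled at the same time under a constant Ne.
function coalescent_times(rng::AbstractRNG, n::Int, Ne::Real)
    times = Float64[]
    t = 0.0
    for k in n:-1:2
        rate = k * (k - 1) / (2 * Ne)  # assumed Kingman rate with k lineages
        t += randexp(rng) / rate       # exponential waiting time until the next merge
        push!(times, t)
    end
    return times
end

times = coalescent_times(MersenneTwister(1), 10, 1000.0)
```

Under this assumed scaling the expected wait with k lineages is 2Ne/(k(k-1)), so the final merge of the last two lineages takes Ne on average, which is more than half of the expected total depth 2Ne(1 - 1/n). That is the "most of the divergence happens in the early internal branches" effect described above.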
sim_tree has two arguments driving its flexibility. We'll start with sampling_rate, which controls the rate at which samples are added to the tree. Even under constant effective population size, this can produce interesting behavior.
for sampling_rate in [5.0, 0.5, 0.05, 0.005]
tree = sim_tree(100,1000.0,sampling_rate)
display(tree_draw(tree, draw_labels = false, canvas_height = 5cm))
@@ -58,4 +58,4 @@
df.names = [n.name for n in getleaflist(tree)]
df.seqs = [partition2obs(n.message[1]) for n in getleaflist(tree)]
df.mu = [partition2obs(n.message[2]) for n in getleaflist(tree)]
-CSV.write("flu_sim_seq_and_bm.csv",df)
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)
Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.
Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.
sim_tree(add_limit::Int,Ne_func,sample_rate_func; nstart = 1, time = 0.0, mutation_rate = 1.0, T = Float64)
Simulates a tree of type FelNode{T}. Allows an effective population size function (Ne_func), as well as a sample rate function (sample_rate_func), which can also just be constants.
Generates samples under the model. The root.parent_message is taken as the starting distribution, and node.message contains the sampled messages. models can either be a single model (if the messages on the tree contain just one Partition) or an array of models, if the messages have >1 Partition, or a function that takes a node, and returns a Vector{<:BranchModel} if you need the models to vary from one branch to another. partition_list (eg. 1:3 or [1,3,5]) lets you choose which partitions to run over.
Extracts the most likely state from a Partition, transforming it into a convenient type. For example, a NucleotidePartition will be transformed into a nucleotide sequence of type String. Note: You should overload this for your own Partition types.
We offer two routes to visualization. The first is using our own plotting routines, built atop Compose.jl. The second converts our trees to Phylo.jl trees, and plots with their Plots.jl recipes. The Compose, Plots, and Phylo dependencies are optional.
using MolecularEvolution, Plots, Phylo
-
-#First simulate a tree, and then Brownian motion:
-tree = sim_tree(n=20)
-internal_message_init!(tree, GaussianPartition())
-bm_model = BrownianMotion(0.0,0.1)
-sample_down!(tree, bm_model)
-
-#We'll add the Gaussian means to the node_data dictionaries
-for n in getnodelist(tree)
- n.node_data = Dict(["mu"=>n.message[1].mean])
-end
-
-#Transducing the mol ev tree to a Phylo.jl tree
-phylo_tree = get_phylo_tree(tree)
-
-pl = plot(phylo_tree,
- showtips = true, tipfont = 6, marker_z = "mu", markeralpha = 0.5, line_z = "mu", linecolor = :darkrainbow,
- markersize = 4.0, markerstrokewidth = 0,margins = 1Plots.cm,
- linewidth = 1.5, markercolor = :darkrainbow, size = (500, 500))
We also offer savefig_tweakSVG("simple_plot_example.svg", pl) for some post-processing tricks that improve the exported trees, like rounding line caps, and values_from_phylo_tree(phylo_tree,"mu") which can extract stored quantities in the right order for passing into eg. markersize options when plotting.
For a more comprehensive list of things you can do with Phylo.jl plots, please see their documentation.
The Compose.jl in-house tree drawing offers extensive flexibility. Here is an example that plots a pie chart representing the marginal probability of each of the 4 possible nucleotides on all nodes on the tree:
Converts a FelNode tree to a Phylo tree. The data_function should return a list of tuples of the form (key, value) to be added to the Phylo tree data Dictionary. Any key/value pairs on the FelNode node_data Dict will also be added to the Phylo tree.
values_from_phylo_tree(phylo_tree, key)
-
-Returns a list of values from the given key in the nodes of the phylo_tree, in an order that is somehow compatible with the order the nodes get plotted in.
Note: Might only work if you're using the GR backend!! Saves a figure created using the PhyloPlots recipe, but tweaks the SVG after export. new_viewbox needs to be an array of 4 numbers, typically starting at [0 0 plot_width*4 plot_height*4] but this lets you add shifts, in case the plot is getting cut off.
Draws a tree with a number of self-explanatory options. Dictionaries that map a node to a color/size are used to control per-node plotting options. compose_dict must be a FelNode->function(x,y) dictionary that returns a compose() struct.
Example using compose_dict
str_tree = "(((((tax24:0.09731668728575642,(tax22:0.08792233964843627,tax18:0.9210388482867483):0.3200367900275155):0.6948314526087965,(tax13:1.9977212308725611,(tax15:0.4290074347886068,(tax17:0.32928401808187824,(tax12:0.3860215462534818,tax16:0.2197134841232339):0.1399122681886174):0.05744611946245004):1.4686085778061146):0.20724159879522402):0.4539334554156126,tax28:0.4885576926440158):0.002162260013924424,tax26:0.9451873777301325):3.8695419798779387,((tax29:0.10062813251515536,tax27:0.27653633028085006):0.04262434258357507,(tax25:0.009345653929737636,((tax23:0.015832941547076644,(tax20:0.5550597590956172,((tax8:0.6649025646927402,tax9:0.358506423199849):0.1439516404012261,tax11:0.01995439013213013):1.155181296134081):0.17930021667907567):0.10906638146207207,((((((tax6:0.013708993438720255,tax5:0.061144001556547097):0.1395453591567641,tax3:0.4713722705245479):0.07432598428904214,tax1:0.5993347898257291):1.0588025698844894,(tax10:0.13109032492533992,(tax4:0.8517302241963356,(tax2:0.8481963081549965,tax7:0.23754095940676642):0.2394313086297733):0.43596704123297675):0.08774657269409454):0.9345533723114966,(tax14:0.7089558245245173,tax19:0.444897137240675):0.08657675809803095):0.01632062723968511,tax21:0.029535281963725537):0.49502691718938285):0.25829576024240986):0.7339777396780424):4.148878039524972):0.0"
-newt = gettreefromnewick(str_tree, FelNode)
-ladderize!(newt)
-compose_dict = Dict()
-for n in getleaflist(newt)
- #Replace the rand(4) with the frequencies you actually want.
- compose_dict[n] = (x,y)->pie_chart(x,y,MolecularEvolution.sum2one(rand(4)),size = 0.03)
-end
-tree_draw(newt,draw_labels = false,line_width = 0.5mm, compose_dict = compose_dict)
-
-
-img = tree_draw(tree)
-img |> SVG("imgout.svg",10cm, 10cm)
-OR
-using Cairo
-img |> PDF("imgout.pdf",10cm, 10cm)