Here we introduce a discrete time model to account for the synchronous process of clonal expansion and barcode evolution with a Maximum Likelihood framework. Hence, we model over the cellular geneaology as discrete branching process. For proliferation-active cells like zygote or stem cells, We uses the Galton-Watson process to model over the process of clonal expansion starting from a single progenitor cell after fixed generations,which incorporate the fixed duration of the tracing experiment and provdies intuitive understanding of the edge length as generation time in the reconstructed tree.
To reproduce the DeLTree's reconstruction for Subchallenge 1 in DREAM CHALLENGE, we provides the code within the RMarkdown file in DeLTree_scripts.R file, and the dataset on dream_challenge_sub1 folder with easy installation of DeLTree package
library(devtools)
install_github("ask-more-questions/DeLTree")
library(DeLTree)
library(caret)
We provide two function to reconstruct a starting tree from the barcode infomation and a NNI Search function as main functionns for lineage reconstruction. Here we will present the function alongside along with the choice of parameters.The output of tree is versitile for newick and nexus and other acceptable format accepted in write.tree
in ape
.
nj_tree(character_info = character_info,site_num = 10,original_state = "1")
The character_info is a data frame comprising of two variables named cell and barcode.This data frame can be read from format like txt and read.table function inutils
.
The parametersite_number
is the number of sites within one barcode. The paramter "original_state" is default set as "1", which implies the unmutated state on barcode. Here, we wanted to stress that, the temporary function only takes 9 differnt mutant denoted as "0","2",..."9" in one site for the limit of one-hot encoding step we use incaret
package.
cherry_based_recon(character_info,mu,alpha,non_bifur_pro,alternative_threshold,nGen,site_num,state_num,precluster_replicate)
This above function describes a reconstruction method with Cherry as basic unit.
Here,mu
andalpha
are the site-specific mutation rate and editing priors on one generation time in barcode respectively.non_bifur_pro
is the proportion of cell not bifurcated after one generation time, reepresenting growth characteristic of lineages.Alternative_threshold
is the set default as zero and if 1,two sequences are 10 times more likely to have a bifurcation event in 2 generation.
nni_deltree(current_tree,mu,alpha,nGen,non_bifur_pro,state_num,edgelength_assignment)
In this function, we take a tree function of phylo structure with its tip label formatted as 'cell_barcode' as input for current tree. and perform one nni move and output the best tree as a phylo structure and the corresponding likelihood.
Here thenGen
implies a prefixed tree height based on experimental duration scaled on cell divisions. Paramteredgelength_assignment
provides two options as "bottom up iteration" or "direct assignment", where the former applies the longest pending edge length as initiation and performs bottom-up approach to local optimization, while the latter utilizes the depth of node and discrete edge length rule as constraints for edge length of cherry structure, which is used as default.
Here we present a simulation example with fixed time duration and bifurcation probability per generation time.
Supposing a no death scheme for
sim_tree = topology_simulation(bifur_pro = 0.76 ,nGen = 6)
mu = rep(0.15,10)
alpha = rep(list(0.5,0.5),10)
lineage = lineage_sim(tree = sim_tree,state_num = 3, site_num = 10,mu,alpha)
sim_tree$tip.label <- sapply(sim_tree$tip.label,function(x){
state_index <-lineage$cell %in% x
cell_state <- paste(x,lineage$state[state_index],sep = "_")
return(cell_state)})
The below Figure shows an example of a simulated tree with the setting above.
For barcode information saved in sub_test_1.txt in the dream_challenge_sub1 folder.
lineage_file = "dream_challenge_sub1/sub1_test_1.txt"
character_info = read.table(file = lineage_file,header= TRUE,colClass = "character")
NJ_tree = nj_tree(character_info = character_info,site_num = 10,original_state = "1")
max_iter = 10
mu = rep(0.15,10)
alpha = rep(list(0.5,0.5),10)
non_bifur_pro = 0.2
current_tree_del <- direct_assignment(phylo = NJ_tree,nGen = max(node.depth(NJ_tree,method = 2)),state_num = 3,mu = mu,alpha = alpha,non_bifur_pro = non_bifur_pro)
current_likelihood <- deltree_likelihood(discrete_EdgeLength_tree = current_tree_del,state_num = 3, mu = mu,alpha = alpha,non_bifur_pro = non_bifur_pro)[3]
for (j in (1:max_iter)){
nni_info <- nni_deltree(current_tree = current_tree_del,mu,alpha,nGen=max(node.depth(current_tree_del,method = 2)),non_bifur_pro,state_num =3,edgelength_assignment = "direct assignment")
if (max(nni_info$likelihood)>current_likelihood){
current_tree_del <- nni_info$best_tree
current_likelihood <- max(nni_info$likelihood)
} else break
}
NJ_DelTree = current_tree_del
Here parameter nGen
is set as the topological height of the current tree.