Skip to content

Latest commit

 

History

History
115 lines (91 loc) · 6.32 KB

README.md

File metadata and controls

115 lines (91 loc) · 6.32 KB

RNAInferenceTool

This is a Julia code repository for the paper:

@article {10.7554/eLife.82493,
article_type = {journal},
title = {Quantifying how post-transcriptional noise and gene copy number variation bias transcriptional parameter inference from mRNA distributions},
author = {Fu, Xiaoming and Patel, Heta P and Coppola, Stefano and Xu, Libin and Cao, Zhixing and Lenstra, Tineke L and Grima, Ramon},
editor = {Akhmanova, Anna},
volume = 11,
year = 2022,
month = {oct},
pub_date = {2022-10-17},
pages = {e82493},
citation = {eLife 2022;11:e82493},
doi = {10.7554/eLife.82493},
url = {https://doi.org/10.7554/eLife.82493},
journal = {eLife},
issn = {2050-084X},
publisher = {eLife Sciences Publications, Ltd},
}

This package can be installed through the Julia package manager:

using Pkg
Pkg.add(https://github.com/palmtree2013/RNAInferenceTool.jl)
using RNAInferenceTool

Note that the optimization function is a function wrapper of the adaptive differential evolution optimizer from BlackBoxOptim, that package is Copyright (c) 2013-2021: Robert Feldt.

A quick demo for fitting to the synthetic nascent signal data

Import data and true parameter set

Suppose we have a set of synthetic nascent RNA data from stochastic simulation algorithm generated from the delay telegraph model (check this example for details on how to generate synthetic data using DelaySSAToolkit). The delay telegraph model can be described as illustrate

Here we define the rate of switching from the active (ON) state to inactive (OFF) state as σ_off, the rate of switching from the OFF state to the ON state as σ_on. ρ_on represents the initiation rate when the gene state is ON. ρ_off represents the initiation rate when the gene state is OFF (leaky initiation rate), d is the detaching rate of polymerase from the gene, τ is the required time delay from the initiation to the completion of the transcription. In this case, ρ_off and d are both set to zero. Here L1 and L2 represent the PP7 862 bp (the linear increasing part of the fluorescence) and the gene of interest (plateau part of the fluorescence) GAL10 2200 bp respectively. The true parameter set is

# True parameters 
# σ_off, σ_on, ρ_on, ρ_off, d, τ
σ_off, σ_on, ρ_on, ρ_off, d, τ =  [1.0526,8.2034,57.989,0,0,0.5] 
# L1 =  signal fluorescence PP7 862 bp; L2 = GAL10 2200 bp 
L1 = 862; L2 = 2200;

# Here we import synthetic signal generated by delay telegraph model directly 
histo_synthetic = readdlm("test/histo_synthetic.txt")|>vec # in the test folder

Check the distribution

We can visualize the distribution of the synthetic nascent signal

scatter(convert_histo(histo_synthetic), labels="synthetic data", xlabel = "normalized signal intensity", ylabel = "frequency") # plot distribution

synthetic data

Inference

Set search range and inference configuration

We set the search range for each transcriptional parameter as follows

# For delay model the parameter order: σ_off, σ_on, ρ_on, ρ_off, d, τ
SRange = [(0.0,50.0),(0.0,50.0),(0.0,100.0),(0.0,0.0),(0.0,0.0),(τ,τ)]

# For telegraph model the parameter order: σ_off, σ_on, ρ_on, ρ_off, d(= 1/τ) 
SRange_tele = [(0.0,50.0),(0.0,50.0),(0.0,100.0),(0.0,0.0),(1/τ,1/τ)]

Next, we set the data structure for inference OptimStruct, which consists of the following elements:

  1. data: default type is histogram data (in this work and the following demo, we used this type of data); the other additional supported type is to use probability distribution.
  2. stage (cell phase): G1() or G2(); where G2() type data is inferred by convolution of two G1 stage distribution.
  3. dist: the distance function: Likelihood(), Likelihood_fusion(), Likelihood_rejection() and other distance functions in Distances.jl package are supported.
  4. model: telegraph model Telegraph(), delay telegraph model Delay(), and Poisson model Poisson() are supported.

Keyword arguments:

  1. infer_counts: Bool type, true if the inferred histogram data is the product count (e.g. the number of Pol II attached to the gene), false if the histogram data is the normalized signal intensity (e.g. signal affected by the PP7 Loop).
  2. L1 and L2: if infer_counts is set to false then L1 and L2 must be provided which represents the respective length of the linear part and plateau part of the trapezoid signal function respectively.
infer_struct = OptimStruct(histo_synthetic, G1(), Likelihood(), Delay(); infer_counts = false, L1 = 862, L2 =2200)
infer_struct_tele = OptimStruct(histo_synthetic, G1(), Likelihood(), Telegraph(); infer_counts = false, L1 = 862, L2 =2200)
estimated_params, distributions = optim_function(SRange, infer_struct, MaxFuncEvals = 10000)
estimated_params_tele, distributions_tele = optim_function(SRange_tele, infer_struct_tele, MaxFuncEvals = 10000)
scatter(distributions[:,2],labels="synthetic data", xlabel = "normalized signal intensity", ylabel = "frequency")
plot!([distributions[:,1] distributions_tele[:,1]],lines=(2, :dash),labels=["Delay Telegraph" "Telegraph"])

delaytele

Compare the parameters

DataFrame(True = params[1:3],Delay=estimated_params[1:3],Telegraph= estimated_params_tele[1:3])
True Delay Telegraph
σ_off 1.0526 0.969954 0.433565
σ_on 8.2034 7.78479 3.37541
ρ_on 57.989 57.6266 57.7762

Further use cases

Check the subfolders "mature" and "nascent" under "examples" folder for the numerical experiments described in the paper from mature and nascent mRNA data.