git

cmmid · Oct 20, 2024 · 16a2ef9 · 16a2ef9
1 parent 775e734
commit 16a2ef9
Show file tree

Hide file tree

Showing 2 changed files with 234 additions and 0 deletions.
diff --git a/vignettes/Example1.Rmd b/vignettes/Example1.Rmd
@@ -0,0 +1,218 @@
+---
+title: "Example 1: Simple"
+author: "David Hodgson"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Example 1: Simple}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r include=FALSE}
+
+knitr::opts_chunk$set(echo = TRUE)
+
+```
+
+```{r setup}
+#devtools::install("..") #install if needed
+library(ptmc)
+library(tidyverse) 
+library(coda)
+
+```
+
+# Parallel Tempering Monte Carlo
+
+## Model 
+
+The example model is taken from Ben Lambert's fantastic book, "A Student's Guide to Bayesian Statistics," [Questions 13.3](https://benlambertdotcom.files.wordpress.com/2018/08/bayesianbook_problemsanswers_final.pdf#page=124). Briefly, I assume the random count variable $X_t$, the number of mosquitos caught during day $t$, is Poisson, such that
+
+$$
+\begin{split}
+X_t &\sim \textit{Poisson}(\lambda)\\ 
+\lambda &= 1000 \times \exp(-\mu t)\psi
+\end{split}
+$$
+
+where $\mu$ is a constant mortality hazard rate and $\psi$ is the daily recapture probability with prior distributions of $\mu \sim \Gamma(2,20)$ and $\psi \sim \textit{Beta}(2, 40)$. The data $x_t$, the number of mosquitos caught on day $t$, are given in the RWM_mosquito.csv file but can also be found [here](https://benlambertdotcom.files.wordpress.com/2018/08/all_data.zip). 
+
+
+## Implementation  
+
+### Create the model
+The `model` list should be a list with three functions, `samplePriorDistributions`, `evaluateLogLikelihood`, and `evaluateLogPrior` and a vector string with the parameter names used in the calibration, `namesOfParameters`.
+
+#### vector string `par_names` 
+A vector string of the parameter names.
+
+#### func `samplePriorDistributions` 
+A function which generates the initial values (usually by sampling the prior distributions)
+
+* (no arguments)
+* @return vector of values 
+
+#### func `evaluateLogLikelihood` 
+A function which generates the log-likelihood for a set of parameter values
+
+* @param data, data needed to calculate the log-likelihood
+* @param params, parameter values
+* @return log-likelihood 
+
+#### func `evaluateLogPrior`
+A function which calculates the log-prior for a set of parameter values
+
+* @param params, parameter values
+* @return log-prior 
+
+
+```{r model definition}
+# model is a list of three functions and a vector string
+model <- list(
+
+  lowerParSupport_fitted = c(0, 0),
+  upperParSupport_fitted = c(10, 1),
+
+
+  namesOfParameters = c("mu","psi"),
+
+  samplePriorDistributions = function(datalist) {
+    c(rgamma(1, 2, 20), rbeta(1, 2, 40))
+  },
+
+  evaluateLogPrior = function(params, datalist) {
+    if (params[1] < 0 || params[2] < 0 || params[1] > 1 || params[2] > 1)
+      return(-Inf)
+
+    lpr <- 0
+    lpr <- lpr + log(pgamma( params[1], 2,20))
+    lpr <- lpr + log(pbeta( params[2], 2,40))
+    lpr
+  },
+
+  evaluateLogLikelihood = function(params, covariance, datalist) {
+    y <- datalist$y
+    mu <- params[1]
+    psi <- params[2]
+    ll <- 0;
+    for (t in 1:length(datalist$y))
+    {
+      sum_x = 0
+      lambda = 1000*exp(-mu*(t+1))*psi;
+      for (i in 1:y[t]){
+        sum_x <- sum_x + i
+      }
+      ll <- ll - lambda + y[t]*log(lambda) - sum_x
+    }
+    ll
+  }
+)
+
+```
+
+
+### Obtain the data
+Data used in the log-likelihood function can be in any format. 
+
+```{r data import}
+
+dataMos <- read.csv("RWM_mosquito.csv") 
+# restructure the data for the log-likelihood function
+data_t <-   list(
+  time = dataMos$time,
+  y = dataMos$recaptured
+)
+```
+
+### Settings
+The settings for the parallel tempering algorithm are summarised here.
+
+#### Settings Options
+
+* numberChainRuns, number of independent chains to run
+* numberTempChains, number of dependent chains per chain run (i.e. the number of rungs in the temperature ladder)
+* iterations, the number of steps to take in the Markov chain, (including the burn-in)
+* burninPosterior, the number of steps in the burn-in (these are discarded)
+* thin, thinning of the chain (i.e. =10 means only every 10th sample is saved)
+* consoleUpdates, frequency at which the console updates (i. =100 means every 100th step)
+* numberFittedPar, number of parameters
+* onAdaptiveCov, whether to include adaptive covariance
+* updatesAdaptiveCov, frequency at which the adaptive covariance matrix is updated
+* burninAdaptiveCov, number of steps to take before using the adaptive covariance matrix 
+* onAdaptiveTemp, whether to include adaptive temperature ladder
+* updatesAdaptiveTemp, frequency at which the adaptive temperature ladder is updated
+* Debug, run with debug output 
+
+```{r settings}
+
+# settings used for the ptmc model 
+settings <-  list(
+  numberChainRuns = 2,
+  numberTempChains = 10,
+  iterations = 10000,
+  burninPosterior = 5000,
+  thin = 10,
+  consoleUpdates = 100,
+  numberFittedPar = 2,
+  onAdaptiveCov = TRUE,
+  updatesAdaptiveCov = 200,
+  burninAdaptiveCov = 100,
+  onAdaptiveTemp = TRUE,
+  updatesAdaptiveTemp = 10,
+  onDebug = FALSE,
+  lowerParBounds = model$lowerParSupport_fitted,
+  upperParBounds = model$upperParSupport_fitted,
+  covarInitVal = 1e-8, # make very small if struggling to sample to beginning
+  covarInitValAdapt = 1e-4, # make very small if struggling to sample to beginning
+  covarMaxVal = 1, # decrease if struggling to sample in the middle
+  runParallel = TRUE,
+  numberCores = 2
+)
+```
+
+### Run the model.
+
+```{r run model,  message=FALSE, results = 'hide'}
+ 
+post <- ptmc_func(model=model, data=data_t, settings=settings)
+
+```
+
+## Plot the data 
+`ptmc_func` returns a list of length two. The first entry is `post$mcmc` a mcmc or mcmc.list object (from the coda package). I can plot these and calculate convergence diagnostics using coda functions:
+
+```{r plot outcomes}
+library(posterior)
+library(coda)
+library(bayesplot)
+summary(post$mcmc)
+
+post$mcmc %>% mcmc_trace
+
+# Plot the Gelman-Rubin diagnostic for the parameters
+gelman.plot(post$mcmc)
+gelman.diag(post$mcmc)
+
+```
+
+The second entry is `post$lpost` and is long table dataframe of the log-posterior values. These values can be easily plotted using ggplot2:
+```{r}
+# Plot of the logposterior for the three chains
+lpost_conv <- post$lpost %>% filter(sample_no>250)
+logpostplot <- ggplot(lpost_conv, aes(x = sample_no, y = lpost)) + 
+  geom_line(aes(color = chain_no), size = 0.2, alpha=0.8) +
+  theme_minimal()
+logpostplot
+
+```
+
+The third entry is `post$temp` and is long table dataframe of the adaptive temperature values. These values can be easily plotted using ggplot2:
+```{r}
+# Plot of the logposterior for the three chains
+#temp_conv <- post$temp %>% filter(sample_no>5000)
+tempplot <- ggplot(post$temp, aes(x = sample_no, y = temperature)) + 
+  geom_line(aes(color = chain_no), size = 0.2, alpha=0.8) +
+  theme_minimal()
+tempplot
+```
diff --git a/vignettes/RWM_mosquito.csv b/vignettes/RWM_mosquito.csv
@@ -0,0 +1,16 @@
+time,recaptured
+1,39
+2,21
+3,36
+4,34
+5,18
+6,22
+7,43
+8,11
+9,17
+10,9
+11,15
+12,11
+13,13
+14,13
+15,7