Allow analysis scripts to read data.json #239

WardBrian · 2024-11-04T16:44:46Z

Closes #238.

Adds data to the latestRun object
Adds a files argument to the pyodide and webr mechanisms to allow us to populate arbitrary files

This lets the SIR analysis.R example go from

# posterior predictive check using the pred_cases generated quantity

install.packages(c("outbreaks", "bayesplot"))
library(outbreaks)
library(posterior)
library(ggplot2)

# same as data generation
cases <- influenza_england_1978_school$in_bed
n_days <- length(cases)
ts <- 1:n_days

# Extract posterior predictive checks
pred_cases <- as.matrix(as_draws_df(as_draws_rvars(draws)$pred_cases))[, -(15:17)]

bayesplot::ppc_ribbon(y = cases, yrep = pred_cases,
                      x = ts, y_draw = "point") +
  theme_bw() +
  ylab("cases") + xlab("days")

to

# posterior predictive check using the pred_cases generated quantity

install.packages("bayesplot")
library(posterior)
library(ggplot2)

# load from data
d <- jsonlite::read_json('./data.json')
cases <- unlist(d$cases)
n_days <- d$n_days
ts <- unlist(d$ts)

# Extract posterior predictive checks
pred_cases <- as.matrix(as_draws_df(as_draws_rvars(draws)$pred_cases))[, -(15:17)]

bayesplot::ppc_ribbon(y = cases, yrep = pred_cases,
                      x = ts, y_draw = "point") +
  theme_bw() +
  ylab("cases") + xlab("days")

This is even more useful when the data generation involves randomness: re-running the same generation code in the analysis script would not reproduce the same data, whereas reading data.json does.
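For comparison, the same loading step in a Python analysis script might look like the sketch below. It assumes data.json sits in the script's working directory and carries the `cases`, `n_days`, and `ts` fields used in the R version above. The `load_data` helper name is illustrative and not part of the project.

```python
import json
from pathlib import Path


def load_data(path="data.json"):
    """Read the model's data file and return the fields used for the check."""
    # Assumption: the file holds a JSON object with cases, n_days, and ts.
    with Path(path).open() as f:
        d = json.load(f)
    cases = list(d["cases"])
    n_days = int(d["n_days"])
    ts = list(d["ts"])
    return cases, n_days, ts
```

Because the values come from the file rather than from re-running the data-generation code, the script works with the same data that was passed to the sampler.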

jsoules

So as I understand this, the crucial change is that the contents of the data.json window will be sent to the language-appropriate worker thread as part of invoking the analysis script, and the worker thread will write that content to a virtual filesystem.

This will make the content available to the internals of the analysis script through the same mechanism as executing the analysis script on a local machine where the data file is locally available.
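The flow described above can be sketched roughly as follows. This is not the project's worker code: a temporary directory stands in for the worker's virtual filesystem, and `populate_files` and `run_analysis` are hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def populate_files(root, files):
    """Write each {relative path: text contents} entry under root."""
    root = Path(root)
    for name, contents in files.items():
        target = root / name
        # Create intermediate directories so nested paths also work.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)


def run_analysis(root):
    """Stand-in for an analysis script: it only knows about the file path."""
    with (Path(root) / "data.json").open() as f:
        return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # The caller passes file contents along with the script to run.
        populate_files(root, {"data.json": json.dumps({"n_days": 14})})
        print(run_analysis(root)["n_days"])
```

The point of the design is that the script body needs no special API: it opens data.json by path, exactly as it would on a local machine.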

The code looks good and I've confirmed it works for the Python analysis script attached to the SIR model example. (It's not clear to me if the current version of the R analysis file is using the FS-based data.json or not.)

I wonder if creating an additional copy of an in-memory JSON data file could further tax the scarce memory resource in the case of models with very large data, but the increased utility is probably worth the risk in this case. Especially as the copy doesn't have to exist until after the sampler's been run.

I think we are good to move forward here.

WardBrian · 2024-11-04T17:17:36Z

> I wonder if creating an additional copy of an in-memory JSON data file could further tax the scarce memory resource in the case of models with very large data, but the increased utility is probably worth the risk in this case. Especially as the copy doesn't have to exist until after the sampler's been run.

If this does become a problem, I think we could work around it by using FS.createLazyFile, but that would require extra machinery to 'host' the file at a URL, so I didn't tackle it here. If you know a good way, we could definitely do that sooner rather than later.

jsoules · 2024-11-04T17:44:24Z

To be clear--yeah, I don't have an answer here, or even evidence that it's going to be a problem; I suspect any such situation is either massively over-provisioned with data or is going to run into problems while still in the sampler phase, so realistically I'm not worried about it.

We can solve it if it's ever an issue.

Allow analysis scripts to read data.json

107bf5e

WardBrian added the feature label Nov 4, 2024

Update example scripts

4733e23

jsoules approved these changes Nov 4, 2024

WardBrian merged commit 7b7e63d into main Nov 4, 2024
2 checks passed

WardBrian deleted the analysis-data.json branch November 4, 2024 17:26

WardBrian restored the analysis-data.json branch November 4, 2024 17:28

WardBrian deleted the analysis-data.json branch November 4, 2024 17:28

WardBrian mentioned this pull request Nov 13, 2024

Should we allow users to upload files for use in data and analysis scripts? #246

Open
