Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating epidist objects allowing flexibility for distribution parameters #112

Closed
CarmenTamayo opened this issue Dec 7, 2023 · 13 comments
Closed
Labels
help wanted Extra attention is needed

Comments

@CarmenTamayo
Copy link
Contributor

On PR #100 the usage of epiparameter is updated in the transmissibility report.
As a way to solve issue #109 , an epidist object is created so that epiparameter::discretise can be used also when the user is manually providing parameters and distribution of the SI.

To my knowledge, when creating an epidist, it's necessary to specify the names of the parameters of the distribution, ie if it's gamma, it's necessary to type prob_distribution_params = c(shape = x, scale = x).

However the template requires these parameters to be flexible, as the distribution can be provided by the user as well...
I've been trying out different ways to do this but can't figure it out, I'm looking for a way to, instead of specifying shape/scale or meanlog/sdlog, do something like prob_distribution_parameters = c(param1, param2), that can be applied to all different distributions.

@joshwlambert is there an easy way to do this with epiparameter? or would we have to write a function for this?

Thank you!

@CarmenTamayo CarmenTamayo added the help wanted Extra attention is needed label Dec 7, 2023
@joshwlambert
Copy link
Member

I will have to give this a bit of thought but my initial guess is this cannot be handled by {epiparameter}. The package handles variable input parameters for each distribution and can standardise these (e.g. a user can input shape and rate for a gamma and {epiparameter} can convert to shape and scale). So if the user (in this case the {episoap} user) doesn't provide parameter names it does not know if they are shape and rate or shape and scale (in the case of the gamma distribution).

You could document that the parameters input by the user need to be of a certain parameterisation, but this would be needed for each distribution accepted by {episoap} which is far from ideal and not something I would advise.

Happy to take a look into this next week and try and find a nice solution.

@CarmenTamayo
Copy link
Contributor Author

Thanks Josh, I agree that that wouldn't be a good solution, let me know when you have had a thought and I'll update the script

@Bisaloo
Copy link
Member

Bisaloo commented Dec 11, 2023

At the moment, if users provide their own distributions, they are expected to pass family, mean and sd, which epiparameter should be able to handle fine.

Do you have a specific example of code that you anticipate being problematic?

@CarmenTamayo
Copy link
Contributor Author

CarmenTamayo commented Dec 11, 2023

The issue is at the point where epidist objects are created using epiparameter. When using this function, it's necessary to specify not only the distribution (which is included in the parameter list with params$si_dist) but also the specific name of the distribution parameters, e.g., shape and scale or meanlog and sdlog, which means that this function cannot be generalised for any distribution provided by the user at the moment, e.g., in the example I provided in the issue I raised initially:
epidist(
disease = params$disease_name,
epi_dist = "serial_interval",
prob_distribution = params$si_dist,
prob_distribution_params = c(shape = si_params[1], scale = si_params[2]) <- here if the distribution was not a gamma when using params$si_dist but a lnorm, there is not a way to account for this and an error is generated

As suggested, something like prob_distribution_params = c(param1 = si_params[1], param2 = si_params[2]) is what I was looking for, but this is not possible with epiparameter, as Josh commented

@CarmenTamayo
Copy link
Contributor Author

CarmenTamayo commented Dec 11, 2023

This also happens for instance in the EpiNow2 chunk, when converting distribution parameters to summary statistics , with epiparameter this would be: convert_params_to_summary_stats(distribution = "gamma", shape = si[1] , scale = si[2] ), where distribution = params$si_dist could be used, but the shape and scale cannot be changed to a generic form, e.g., param1 = si[1] , param2 = si[2] bc the package requires these to be named with the specific name for each distribution

@Bisaloo
Copy link
Member

Bisaloo commented Dec 11, 2023

The issue is at the point where epidist objects are created using epiparameter. When using this function, it's necessary to specify not only the distribution (which is included in the parameter list with params$si_dist) but also the specific name of the distribution parameters, e.g., shape and scale or meanlog and sdlog, which means that this function cannot be generalised for any distribution provided by the user at the moment, e.g., in the example I provided in the issue I raised initially: epidist( disease = params$disease_name, epi_dist = "serial_interval", prob_distribution = params$si_dist, prob_distribution_params = c(shape = si_params[1], scale = si_params[2]) <- here if the distribution was not a gamma when using params$si_dist but a lnorm, there is not a way to account for this and an error is generated

As suggested, something like prob_distribution_params = c(param1 = si_params[1], param2 = si_params[2]) is what I was looking for, but this is not possible with epiparameter, as Josh commented

The solution for this should be to not convert the parameters beforehand but pass directly the mean and sd to epidist() and let epiparameter deal with conversions internally if necessary.

This means we know that we always have sd and mean and can pass them as names.

@CarmenTamayo
Copy link
Contributor Author

Thanks Hugo, that's great, I'm now using this instead:

si_epidist <- epidist( disease = params$disease_name, epi_dist = "serial_interval", prob_distribution = params$si_dist, summary_stats = create_epidist_summary_stats(mean = params$si_mean,sd=params$si_sd), auto_calc_params = TRUE)

In the case of converting parameters to summary statistics, like I mentioned here:

This also happens for instance in the EpiNow2 chunk, when converting distribution parameters to summary statistics , with epiparameter this would be: convert_params_to_summary_stats(distribution = "gamma", shape = si[1] , scale = si[2] ), where distribution = params$si_dist could be used, but the shape and scale cannot be changed to a generic form, e.g., param1 = si[1] , param2 = si[2] bc the package requires these to be named with the specific name for each distribution

What do you think would be the best solution?

@Bisaloo
Copy link
Member

Bisaloo commented Dec 11, 2023

For now, you can use do.call(), which allows to pass arbitrary named value to a function:

library(epiparameter)

# This returns a lnorm dist
edist <- epidist_db(
  disease = "COVID-19",
  epi_dist = "serial interval",
  author = "Nishiura",
  single_epidist = TRUE
)
#> Using Nishiura H, Linton N, Akhmetzhanov A (2020). "Serial interval of novel
#> coronavirus (COVID-19) infections." _International Journal of
#> Infectious Diseases_. doi:10.1016/j.ijid.2020.02.060
#> <https://doi.org/10.1016/j.ijid.2020.02.060>.. 
#> To retrieve the short citation use the 'get_citation' function
family(edist)
#> [1] "lnorm"

do.call(
  convert_params_to_summary_stats,
  c(
    distribution = family(edist),
    as.list(get_parameters(edist))
  )
)
#> $mean
#> [1] 4.7
#> 
#> $median
#> [1] 3.999869
#> 
#> $mode
#> [1] 2.896954
#> 
#> $var
#> [1] 8.41
#> 
#> $sd
#> [1] 2.9
#> 
#> $cv
#> [1] 0.6170213
#> 
#> $skewness
#> [1] 2.085973
#> 
#> $ex_kurtosis
#> [1] 8.617709

# This returns a gamma dist
edist <- epidist_db(
  disease = "ebola",
  epi_dist = "serial interval",
  single_epidist = TRUE
)
#> Using WHO Ebola Response Team, Agua-Agum J, Ariyarajah A, Aylward B, Blake I,
#> Brennan R, Cori A, Donnelly C, Dorigatti I, Dye C, Eckmanns T, Ferguson
#> N, Formenty P, Fraser C, Garcia E, Garske T, Hinsley W, Holmes D,
#> Hugonnet S, Iyengar S, Jombart T, Krishnan R, Meijers S, Mills H,
#> Mohamed Y, Nedjati-Gilani G, Newton E, Nouvellet P, Pelletier L,
#> Perkins D, Riley S, Sagrado M, Schnitzler J, Schumacher D, Shah A, Van
#> Kerkhove M, Varsaneux O, Kannangarage N (2015). "West African Ebola
#> Epidemic after One Year — Slowing but Not Yet under Control." _The New
#> England Journal of Medicine_. doi:10.1056/NEJMc1414992
#> <https://doi.org/10.1056/NEJMc1414992>.. 
#> To retrieve the short citation use the 'get_citation' function
family(edist)
#> [1] "gamma"

do.call(
  convert_params_to_summary_stats,
  c(
    distribution = family(edist),
    as.list(get_parameters(edist))
  )
)
#> $mean
#> [1] 14.2
#> 
#> $median
#> [1] 0.2873807
#> 
#> $mode
#> [1] 7.709859
#> 
#> $var
#> [1] 92.16
#> 
#> $sd
#> [1] 9.6
#> 
#> $cv
#> [1] 0.6760563
#> 
#> $skewness
#> [1] 1.352113
#> 
#> $ex_kurtosis
#> [1] 2.742313

Created on 2023-12-11 with reprex v2.0.2

@CarmenTamayo
Copy link
Contributor Author

Thanks Hugo, I'll add it to the template for now

@joshwlambert
Copy link
Member

Thanks @Bisaloo for the solutions, I agree with your suggestions to use epidist() to do the heavily lifting where possible and use do.call() for arbitrarily named vectors.

@CarmenTamayo can this issue be closed?

@CarmenTamayo
Copy link
Contributor Author

Thank you both, I'm going to leave the issue open until I make all the changes and make sure the do.call() function works

@sbfnk
Copy link

sbfnk commented Dec 15, 2023

The solution for this should be to not convert the parameters beforehand but pass directly the mean and sd to epidist() and let epiparameter deal with conversions internally if necessary.

It might not matter in this case but just to mention that this will only work as long as the parameters don't have uncertainty (in which case the conversion is no longer straightforward).

@CarmenTamayo
Copy link
Contributor Author

Waiting to close #113 to solve this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants