From 42c4a0c899a4c5fd6e710ea9fe2834ab28d2286f Mon Sep 17 00:00:00 2001 From: Quarto GHA Workflow Runner Date: Wed, 4 Dec 2024 17:04:03 +0000 Subject: [PATCH] Built site for gh-pages --- .nojekyll | 2 +- blog-posts.xml | 2 +- blog-r package.xml | 2 +- blog-r.xml | 2 +- blog.html | 103 +- blog.xml | 275 ++-- index.html | 20 +- listings.json | 1 + posts/statistical-correctness/index.html | 2 +- posts/vaccineff_v1.0.0/index.html | 804 +++++++++++ presentations.html | 34 +- resources.html | 42 +- search.json | 1630 +++++++++++----------- sitemap.xml | 276 ++-- 14 files changed, 2092 insertions(+), 1103 deletions(-) create mode 100644 posts/vaccineff_v1.0.0/index.html diff --git a/.nojekyll b/.nojekyll index ea7c6fce..33ea4f1c 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -946cadea \ No newline at end of file +56308372 \ No newline at end of file diff --git a/blog-posts.xml b/blog-posts.xml index 65669919..6cf67a62 100644 --- a/blog-posts.xml +++ b/blog-posts.xml @@ -10,6 +10,6 @@ A place for Epiverse-TRACE developers to share their reflections, learnings, and showcase their work. quarto-1.7.2 -Wed, 04 Dec 2024 12:43:03 GMT +Wed, 04 Dec 2024 17:03:49 GMT diff --git a/blog-r package.xml b/blog-r package.xml index c654708e..aa566e2c 100644 --- a/blog-r package.xml +++ b/blog-r package.xml @@ -4719,7 +4719,7 @@ font-style: inherit;">$p_infected)

Conclusion: automated validation vs peer-review

In this post, we’ve presented different methods to automatically verify the statistical correctness of your statistical software. We would like to highlight one more time that it’s important to run these tests as part of your regular continuous integration system, instead of running them just once at the start of development. This prevents the introduction of errors into the code and shows users what specific checks you are performing. By doing so, you are transparently committing to the highest quality.
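As a minimal sketch of what such a recurring check could look like (assuming a {testthat}-based suite run on continuous integration; my_estimator() is a hypothetical stand-in for the function under test):

library(testthat)

test_that("estimator recovers a known parameter from simulated data", {
  # my_estimator() is a placeholder for the function being validated
  set.seed(123)
  x <- rpois(1e4, lambda = 4)
  expect_equal(my_estimator(x), 4, tolerance = 0.05)
})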

Multiple voices in the community are pushing more towards peer-review as a proxy for quality and validity:

-

+

We would like to highlight that automated validation and peer review are not mutually exclusive and serve slightly different purposes.

On the one hand, automated validation can fail to catch more obscure bugs and edge cases. For example, a bug that would be difficult to detect via an automated approach is the use of a bad Random Number Generator when running in parallel.
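As an illustration (a minimal sketch using the base {parallel} package on a Unix-alike system), forked workers that reuse the parent RNG state silently return identical “independent” replicates, something a results-level check can easily miss:

library(parallel)

# With mc.set.seed = FALSE, every forked worker starts from the parent's RNG
# state, so the two "independent" replicates below are identical.
set.seed(42)
bad <- mclapply(1:2, function(i) rnorm(3), mc.cores = 2, mc.set.seed = FALSE)

# A parallel-aware generator avoids this.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
ok <- mclapply(1:2, function(i) rnorm(3), mc.cores = 2)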

On the other hand, peer-review is less scalable, and journals usually have editorial policies that your package might not fit. Additionally, peer-review usually happens at a single point in time, while automated validation can, and should, be part of the continuous integration system.

diff --git a/blog-r.xml b/blog-r.xml index 12713193..6ea929de 100644 --- a/blog-r.xml +++ b/blog-r.xml @@ -5742,7 +5742,7 @@ font-style: inherit;">$p_infected)

Conclusion: automated validation vs peer-review

In this post, we’ve presented different methods to automatically verify the statistical correctness of your statistical software. We would like to highlight one more time that it’s important to run these tests as part of your regular continuous integration system, instead of running them just once at the start of development. This prevents the introduction of errors into the code and shows users what specific checks you are performing. By doing so, you are transparently committing to the highest quality.

Multiple voices in the community are pushing more towards peer-review as a proxy for quality and validity:

-

+

We would like to highlight that automated validation and peer review are not mutually exclusive and serve slightly different purposes.

On the one hand, automated validation can fail to catch more obscure bugs and edge cases. For example, a bug that would be difficult to detect via an automated approach is the use of a bad Random Number Generator when running in parallel.

On the other hand, peer-review is less scalable, and journals usually have editorial policies that your package might not fit. Additionally, peer-review usually happens at a single point in time, while automated validation can, and should, be part of the continuous integration system.

diff --git a/blog.html b/blog.html index ac7ef284..5f58e7c1 100644 --- a/blog.html +++ b/blog.html @@ -269,7 +269,7 @@

Blog

+
Categories
All (35)
100 days workshop (1)
AI (1)
Copilot Workspace (1)
DOI (6)
Epiverse-TRACE (1)
GitHub (1)
R (15)
R package (11)
Rcpp (2)
S3 (4)
Team (1)
benchmarking (1)
cli (1)
code quality (2)
code sharing (1)
community (1)
continuous integration (3)
data frame (1)
ecosystem (1)
functional programming (1)
good practices (3)
interoperability (5)
iteration (1)
lifecycle (1)
lifecycles (1)
mpox (1)
new-release (17)
object orientation (3)
object-oriented programming (1)
open-source (2)
outbreak (1)
outbreak analytics (1)
outbreak-analytics (1)
package development (3)
participatory research (1)
progressive enhancement (1)
readability (1)
renv (1)
reproducibility (1)
software architecture (1)
software development (1)
system dependencies (1)
testing (1)
tidyverse (2)
tutorials (1)
versioning (1)
@@ -285,7 +285,40 @@
Categories
-
+
+
+

+
 
+

+
+ + +
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+

diff --git a/blog.xml b/blog.xml index 673847b5..20db4e39 100644 --- a/blog.xml +++ b/blog.xml @@ -10,7 +10,203 @@ A place for Epiverse-TRACE developers to share their reflections, learnings, and showcase their work. quarto-1.7.2 -Thu, 21 Nov 2024 00:00:00 GMT +Mon, 02 Dec 2024 00:00:00 GMT + + vaccineff v1.0.0 + The Epiverse-TRACE development team + https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/ + We are very excited to announce the release of a new
vaccineff version v1.0.0. The package offers tools for estimating vaccine effectiveness (VE) and related metrics, with features such as:

+
    +
  • Estimation in a cohort study design via the Cox model for Proportional Hazards
    +
  • +
  • Controlling for confounders through an iterative matching algorithm
    +
  • +
  • A detailed vignette with illustrative examples
  • +
+

Two main functions compose {vaccineff 1.0.0}:

+
    +
  1. make_vaccineff_data(): Constructs the vaccineff_data class, managing key features for preparing, visualizing, and organizing cohort data.

  2. +
  3. estimate_vaccineff(): Constructs the vaccineff class, enabling estimation of VE and assessment of results and model performance.

  4. +
+

Both classes support plot() and summary() methods, providing visualizations and detailed reports.
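For instance (an illustrative fragment, assuming the vaccineff_data and ve objects created in the Quick start below; available plotting options may differ):

# assuming `vaccineff_data` and `ve` are created as in the Quick start below
plot(vaccineff_data)  # cohort-level view, e.g. vaccine coverage
plot(ve)              # estimation-level view for the fitted model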

+
+

Breaking Changes

+

{vaccineff 1.0.0} refactors the package’s internal structure for better maintainability.

+
    +
  • estimate_vaccineff() replaces effectiveness().

    +
      +
    • It returns an object of class vaccineff.
    • +
    • The at parameter must always be provided for accurate results.
    • +
  • +
  • plot.vaccineff_data() replaces plot_coverage().

  • +
  • cohortdata has been simplified and reduced to improve examples and reduce computation time.

  • +
+
+
+

Quick start

+

To get started with {vaccineff} 1.0.0, run the following example, which estimates VE using the package’s example dataset, adjusting for confounders (age and sex) via the iterative matching strategy.

+

The summary() method for the vaccineff_data object provides details on cohort balance and composition before and after applying the matching strategy, as well as general cohort features.

+

When called on the vaccineff object (ve), the summary() method returns the estimation summary, including results from the Schoenfeld test for proportional hazards.

+
+
library(vaccineff)
+
+# Create `vaccineff_data`
+data(cohortdata)
+vaccineff_data <- make_vaccineff_data(
+  data_set = cohortdata,
+  outcome_date_col = "death_date",
+  censoring_date_col = "death_other_causes",
+  vacc_date_col = "vaccine_date_2",
+  vaccinated_status = "v",
+  unvaccinated_status = "u",
+  immunization_delay = 15,
+  end_cohort = as.Date("2021-12-31"),
+  match = TRUE,
+  exact = "sex",
+  nearest = c(age = 1)
+)
+summary(vaccineff_data)
+
+
Cohort start:  2021-03-26
+Cohort end:  2021-12-31
+The start date of the cohort was defined as the minimum immunization date. 
+65 registers were removed with outcomes before the start date.
+
+Nearest neighbors matching iteratively performed.
+Number of iterations:  4
+Balance all:
+              u         v         smd
+age   63.917069 62.997438 -0.08593156
+sex_F  0.520277  0.573474  0.10701746
+sex_M  0.479723  0.426526 -0.10701746
+
+Balance matched:
+               u          v         smd
+age   63.9014396 63.7280362 -0.01587567
+sex_F  0.5206718  0.5206718  0.00000000
+sex_M  0.4793282  0.4793282  0.00000000
+
+Summary vaccination:
+              u     v
+All       10973 19905
+Matched   10836 10836
+Unmatched   137  9069
+
+// tags: outcome_date_col:death_date, censoring_date_col:death_other_causes, vacc_date_col:vaccine_date_2, immunization_date_col:immunization_date, vacc_status_col:vaccine_status 
+
+
# Estimate VE
+ve <- estimate_vaccineff(vaccineff_data, at = 180)
+summary(ve)
+
+
Vaccine Effectiveness at 180 days computed as VE = 1 - HR:
+     VE lower.95 upper.95
+ 0.7254   0.5437   0.8347
+
+Schoenfeld test for Proportional Hazards assumption:
+p-value = 0.1507
+
+
+
+
+

Acknowledgements

+

@Bisaloo @chartgerink @davidsantiagoquevedo @ErikaCantor @fsabrilb @GeraldineGomez @jpavlich @ManuelVelarde2212 @pratikunterwegs @zmcucunuba

+ + +
+ +

Reuse

Citation

BibTeX citation:
@online{epiverse-trace_development_team2024,
+  author = {Epiverse-TRACE development team, The},
+  title = {Vaccineff V1.0.0},
+  date = {2024-12-02},
+  url = {https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/},
+  langid = {en}
+}
+
For attribution, please cite this work as:
+Epiverse-TRACE development team, The. 2024. “Vaccineff +V1.0.0.” December 2, 2024. https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/. +
]]> + new-release + https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/ + Mon, 02 Dec 2024 00:00:00 GMT + epiparameter v0.3.0 The Epiverse-TRACE development team @@ -2223,82 +2419,5 @@ Renv.” January 31, 2024. https://epiverse-trace.github.io/posts/renv-complications/ Wed, 31 Jan 2024 00:00:00 GMT - - superspreading v0.2.0 - The Epiverse-TRACE development team - https://epiverse-trace.github.io/posts/superspreading_v0.2.0/ - We are very excited to announce the release of a new superspreading version v0.2.0. Here is an automatically generated summary of the changes in this version.

-

Second minor release of superspreading. This release enhances functions added in v0.1.0 and adds two new exported functions, and two new vignettes.

-
-
library(superspreading)
-
-
-

New features

-
-
-
-

Breaking changes

-
    -
  • ind_control and pop_control arguments replace control and control_type arguments in probability_contain(); and the argument default for num_init_infect is removed (#70).
  • -
  • Changed epidist argument to offspring_dist to follow Epiverse style (affects several functions) (#64).
  • -
  • Argument in proportion_transmission() has been renamed from sim to simulate (#81).
  • -
-
-
-

Minor changes

-
    -
  • New package hex logo (#73).
  • -
  • Updated continuous integration and package infrastructure (#67).
  • -
  • Improved function documentation (#63).
  • -
  • Optimisation now uses optim() by default (#82).
  • -
  • Testing suite now uses snapshot testing for regression tests (#84).
  • -
-
-
-

Bug fixes

-
    -
  • None to {superspreading} functions.
  • -
  • Update {epiparameter} use in vignette and tests (#62).
  • -
-
-
-

Deprecated and defunct

-
    -
  • None
  • -
- - -
- -

Reuse

Citation

BibTeX citation:
@online{epiverse-trace_development_team2024,
-  author = {Epiverse-TRACE development team, The},
-  title = {Superspreading V0.2.0},
-  date = {2024-01-29},
-  url = {https://epiverse-trace.github.io/posts/superspreading_v0.2.0/},
-  langid = {en}
-}
-
For attribution, please cite this work as:
-Epiverse-TRACE development team, The. 2024. “Superspreading -V0.2.0.” January 29, 2024. https://epiverse-trace.github.io/posts/superspreading_v0.2.0/. -
]]> - new-release - https://epiverse-trace.github.io/posts/superspreading_v0.2.0/ - Mon, 29 Jan 2024 00:00:00 GMT - diff --git a/index.html b/index.html index ae396bce..0a3a6f52 100644 --- a/index.html +++ b/index.html @@ -210,12 +210,12 @@
-

+

-

-

-

-

+

+

+

+

+
+ + +
+
+ + + +
+
+
+

vaccineff v1.0.0

+
+
new-release
+
+
+
+ + +
+ +
+
Author
+
+

The Epiverse-TRACE development team

+
+
+ +
+
Published
+
+

December 2, 2024

+
+
+ + +
+ + +
+ + + + +
+ + + + + +

We are very excited to announce the release of a new vaccineff version v1.0.0. The package offers tools for estimating vaccine effectiveness (VE) and related metrics, with features such as:

+
    +
  • Estimation in a cohort study design via the Cox model for Proportional Hazards
    +
  • +
  • Controlling for confounders through an iterative matching algorithm
    +
  • +
  • A detailed vignette with illustrative examples
  • +
+

Two main functions compose {vaccineff 1.0.0}:

+
    +
  1. make_vaccineff_data(): Constructs the vaccineff_data class, managing key features for preparing, visualizing, and organizing cohort data.

  2. +
  3. estimate_vaccineff(): Constructs the vaccineff class, enabling estimation of VE and assessment of results and model performance.

  4. +
+

Both classes support plot() and summary() methods, providing visualizations and detailed reports.

+
+

Breaking Changes

+

{vaccineff 1.0.0} refactors the package’s internal structure for better maintainability.

+
    +
  • estimate_vaccineff() replaces effectiveness().

    +
      +
    • It returns an object of class vaccineff.
    • +
    • The at parameter must always be provided for accurate results.
    • +
  • +
  • plot.vaccineff_data() replaces plot_coverage().

  • +
  • cohortdata has been simplified and reduced to improve examples and reduce computation time.

  • +
+
+
+

Quick start

+

To get started with {vaccineff} 1.0.0, run the following example, which estimates VE using the package’s example dataset, adjusting for confounders (age and sex) via the iterative matching strategy.

+

The summary() method for the vaccineff_data object provides details on cohort balance and composition before and after applying the matching strategy, as well as general cohort features.

+

When called on the vaccineff object (ve), the summary() method returns the estimation summary, including results from the Schoenfeld test for proportional hazards.

+
+
library(vaccineff)
+
+# Create `vaccineff_data`
+data(cohortdata)
+vaccineff_data <- make_vaccineff_data(
+  data_set = cohortdata,
+  outcome_date_col = "death_date",
+  censoring_date_col = "death_other_causes",
+  vacc_date_col = "vaccine_date_2",
+  vaccinated_status = "v",
+  unvaccinated_status = "u",
+  immunization_delay = 15,
+  end_cohort = as.Date("2021-12-31"),
+  match = TRUE,
+  exact = "sex",
+  nearest = c(age = 1)
+)
+summary(vaccineff_data)
+
+
Cohort start:  2021-03-26
+Cohort end:  2021-12-31
+The start date of the cohort was defined as the minimum immunization date. 
+65 registers were removed with outcomes before the start date.
+
+Nearest neighbors matching iteratively performed.
+Number of iterations:  4
+Balance all:
+              u         v         smd
+age   63.917069 62.997438 -0.08593156
+sex_F  0.520277  0.573474  0.10701746
+sex_M  0.479723  0.426526 -0.10701746
+
+Balance matched:
+               u          v         smd
+age   63.9014396 63.7280362 -0.01587567
+sex_F  0.5206718  0.5206718  0.00000000
+sex_M  0.4793282  0.4793282  0.00000000
+
+Summary vaccination:
+              u     v
+All       10973 19905
+Matched   10836 10836
+Unmatched   137  9069
+
+// tags: outcome_date_col:death_date, censoring_date_col:death_other_causes, vacc_date_col:vaccine_date_2, immunization_date_col:immunization_date, vacc_status_col:vaccine_status 
+
+
# Estimate VE
+ve <- estimate_vaccineff(vaccineff_data, at = 180)
+summary(ve)
+
+
Vaccine Effectiveness at 180 days computed as VE = 1 - HR:
+     VE lower.95 upper.95
+ 0.7254   0.5437   0.8347
+
+Schoenfeld test for Proportional Hazards assumption:
+p-value = 0.1507
+
+
+
+
+

Acknowledgements

+

@Bisaloo @chartgerink @davidsantiagoquevedo @ErikaCantor @fsabrilb @GeraldineGomez @jpavlich @ManuelVelarde2212 @pratikunterwegs @zmcucunuba

+ + +
+ +

Reuse

Citation

BibTeX citation:
@online{epiverse-trace_development_team2024,
+  author = {Epiverse-TRACE development team, The},
+  title = {Vaccineff V1.0.0},
+  date = {2024-12-02},
+  url = {https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/},
+  langid = {en}
+}
+
For attribution, please cite this work as:
+Epiverse-TRACE development team, The. 2024. “Vaccineff +V1.0.0.” December 2, 2024. https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/. +
+ + + + +
+ + + + + \ No newline at end of file diff --git a/presentations.html b/presentations.html index 766e7353..685b8328 100644 --- a/presentations.html +++ b/presentations.html @@ -285,7 +285,7 @@
Categories
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+

diff --git a/resources.html b/resources.html index 3a205205..04bffd29 100644 --- a/resources.html +++ b/resources.html @@ -309,7 +309,7 @@
Categories
-
+

The True ROI of Doing It Right the First Time in Software Projects

@@ -333,7 +333,7 @@

+

Painless introduction to object-oriented programming using S3 classes

@@ -359,7 +359,7 @@

+

Tidyverse book on code review

@@ -383,7 +383,7 @@

+

How to Run GitHub Actions Locally Using the act CLI Tool

@@ -409,7 +409,7 @@

+

Quick example of how to use `tryCatch()`

@@ -433,7 +433,7 @@

+

Using data.table

@@ -455,7 +455,7 @@

+

Deep R Programming

@@ -479,7 +479,7 @@

+

Rcpp for everyone

@@ -503,7 +503,7 @@

+

Tutorial on the types of tests to write for an R function

@@ -527,7 +527,7 @@

+

shinystan - Interactive diagnostics and posterior analysis for Bayesian models

@@ -549,7 +549,7 @@

+

Explore about 2728 R color palettes

@@ -571,7 +571,7 @@

+

Difference between the `apply()` family of functions in R

@@ -595,7 +595,7 @@

+

Why You Should (or Shouldn’t) Build an API Client?

@@ -617,7 +617,7 @@

+

Why I don’t add style options to my ggplot2 functions

@@ -641,7 +641,7 @@

+

Request and handle API response from R

@@ -665,7 +665,7 @@

+

Epimodelac2023 - Outbreak Analysis and Modeling in Public Health

@@ -689,7 +689,7 @@

+

Rcpp gallery

@@ -715,7 +715,7 @@

+

Checks and automation early feedback via local commit hooks and the precommit package

@@ -739,7 +739,7 @@

+

Stan Playground - A web-based platform for writing and compiling stan programs

@@ -761,7 +761,7 @@

+

CVE-2024-27322 Should Never Have Been Assigned And R Data Files Are Still Super Risky Even In R 4.4.0

@@ -783,7 +783,7 @@

+

Epirecipes cookbook

diff --git a/search.json b/search.json index 1cfc8b56..dd101b53 100644 --- a/search.json +++ b/search.json @@ -273,522 +273,620 @@ "text": "Footnotes\n\n\nNote that we focus here on the S3 framework but R has other object orientation frameworks, as discussed in the relevant section of the ‘Advanced R’ book by Hadley Wickham↩︎\nEvery rule has its exceptions though such as the generics package, built by prominent members of the R developer community, which overloads base R functions such as as.factor() or as.difftime().↩︎" }, { - "objectID": "posts/cleanepi_v1.0.0/index.html", - "href": "posts/cleanepi_v1.0.0/index.html", - "title": "cleanepi v1.0.0", + "objectID": "posts/100days-workshop/index.html", + "href": "posts/100days-workshop/index.html", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", "section": "", - "text": "We are very excited to announce the release of a new cleanepi version v1.0.0. Here is an automatically generated summary of the changes in this version." + "text": "This vignette summarises the findings from the 100 days and 100 lines of code workshop, hosted in December 2022 by Epiverse-TRACE.\nTo discuss how the first 100 lines of code during an epidemic should look like, we invited 40 experts, including academics, field epidemiologists, and software engineers, to take part in a 3-day workshop, where they discussed the current challenges, and potential solutions, in data analytic pipelines used to analyse epidemic data. In addition to highlighting existing technical solutions and their use cases, presentations on best practices in fostering collaboration across institutions and disciplines set the scene for the subsequent workshop scenario exercises." }, { - "objectID": "posts/cleanepi_v1.0.0/index.html#first-major-release", - "href": "posts/cleanepi_v1.0.0/index.html#first-major-release", - "title": "cleanepi v1.0.0", - "section": "first major release", - "text": "first major release\nInitial CRAN submission of {cleanepi}, an R package designed for cleaning, curating, and standardizing epidemiological data.\nThis release includes the following key functionalities:\n\nRemoving irregularities: It removes duplicated and empty rows and columns, as well as columns with constant values.\nHandling missing values: It replaces missing values with the standard NA format, ensuring consistency and ease of analysis.\nEnsuring data integrity: It ensures the uniqueness of uniquely identified columns, thus maintaining data integrity and preventing duplicates.\nDate conversion: It offers functionality to convert character columns to Date format under specific conditions, enhancing data uniformity and facilitating temporal analysis. It also offers conversion of numeric values written in letters into numbers.\nStandardizing entries: It can standardize column entries into specified formats, promoting consistency across the dataset.\nTime span calculation: It calculates the time span between two elements of type Date, providing valuable demographic insights for epidemiological analysis.\nReporting cleaning outcome: It displays the report that reflects the changes made on the input data during the cleaning process, hence allowing the user to amend on the cleaning operations." 
+ "objectID": "posts/100days-workshop/index.html#scenario-1-novel-respiratory-disease-in-the-gambia", + "href": "posts/100days-workshop/index.html#scenario-1-novel-respiratory-disease-in-the-gambia", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "Scenario 1: Novel respiratory disease in The Gambia", + "text": "Scenario 1: Novel respiratory disease in The Gambia\n\n\n\nScenario 1 details\n\n\n\nAnalytic pipeline for scenario 1 (analysed by group 2)\n\nData cleaning\n\nlinelist to standardise date format\ncleanr from previous Hackathon\n\nDelay distributions\n\nfitdisrplus to fit parameteric distributions to scenario data\nepiparameter to extract delay distributions from respiratory pathogens\nEpiNow2 to fit reporting delays\nEpiEstim / coarseDataTools to estimate generation time/serial interval of disease\nepicontacts\nmixdiff to estimate delay distributions and correct erroneous dates at the same time (still under development)\n\nPopulation demographics\n\nWould like to have had access to an R package similar to ColOpenData\n\nRisk factors of infection\n\nUsed R4epis as a guide on how to create two-way tables and perform Chi-squared tests\n\nSeverity of disease\n\ndatadelay for CFR calculation\nImplementation of method developed by AC Ghani, 2005 to estimate CFR\n\nContact matching\n\ndiyar to match and link records\nfuzzyjoin to join contact and case data despite misspellings or missing cell contents\n\nEpi curve and maps\n\nUsed incidence and incidence2 for incidence calculation and visualisation\nraster to extract spatial information from library of shapefiles\n\nReproduction number\n\nAPEestim\nbayEStim\nearlyR\nepicontacts\nepidemia\nepiFilter\nEpiNow2\nEpiEstim\nR0\noutbreaker2\nUsed this comparison table to choose the most appropriate package.\n\nSuperspreading, by using these resources:\n\nfitdistrplus\nepicontacts\n\nEpidemic projections\n\nincidence R estimation using a loglinear model\nprojections using Rt estimates, SI distributions and overdispersion estimates\n\nTransmission chains and strain characterisation\n\nIQtree and nextclade to build a maximum likelihood tree and mannually inspect it\nAdvanced modelling through phylodynamic methods, using tools like BEAST\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData cleaning\nNot knowing what packages are available for this purpose\n\n\nDelay distributions\nDealing with right truncation\nAccounting for multiple infectors\n\n\nPopulation demographics\nLacking tools that provide information about population by age, gender, etc.\n\n\nRisk factors of infection\nDistinguishing between risk factors vs detecting differences in reporting frequencies among groups\n\n\nSeverity of disease\nKnowing the prevalence of disease (denominator)\nRight truncated data\nVarying severity of different strains\n\n\nContact matching\nMissing data\nMisspellings\n\n\nEpicurve and maps\nNA dates entries not included\nReporting levels varying over time\n\n\nOffspring distribution\nRight truncation\nTime varying reporting efforts\nAssumption of a single homogeneous epidemic\nImportation of cases\n\n\nForecasting\nUnderlying assumption of a given R distribution, e.g., single trend, homogeneous mixing, no saturation" }, { - "objectID": "posts/copilot-workspace/index.html", - "href": "posts/copilot-workspace/index.html", - "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", - "section": "", - "text": "GitHub recently previewed ‘Copilot Workspace’, which aims to use 
generative AI to assist software developers. Rather than just giving in-line suggestions, as GitHub copilot does, workspace allows users to map out and execute entire projects. We got early preview access to the tool, so decided to see how well it performed for our needs in Epiverse-TRACE.\nIn the Epiverse-TRACE initiative we aim to develop an ecosystem of epidemic analytics tools as Digital Public Goods in the form of R packages. We aim to follow best-practices given that our tools will be used to inform public health decisions. For efficiency, we use a range of development tools, including integrated development environments (IDE), such as RStudio and Visual Studio Code, other R packages to help check and document our code, for example {lintr} and {roxygen2}, and version control and code sharing tools, git and GitHub.\nGiven the rapid development and increasing availability to artificial intelligence (AI) models, specifically large language models, offer the potential to be another development tool to add to the arsenal of researchers and software engineers to enhance productivity and improve the reliability of the code we write and others depend upon (Hoek et al. 2024). Large language models like ChatGPT from OpenAI, Gemini from Google, Claude from Anthropic, as well as many others and new ones frequently appearing, have presented opportunities to quickly generate material – text, code and images – from text prompts.\nA small group of Epiverse-TRACE professors and research software engineers decided to explore the suitability of one such, GitHub Copilot integrated with GitHub Workspaces, for some typical use cases encountered in our day-to-day work. We do note that many other such tools are available, and we chose Copilot in Workspaces as our code is hosted on GitHub, making for easy evaluation of the specific contribution of the AI tool. However, some of our broad conclusions are based on using other tools for similar tasks.\nThe aim of this blog post is to summarise our experiences with Copilot and Workspaces, and share our thoughts more generally about how the current feature set of AI-powered development tools could help with our daily development tasks, and with research software development at large. 
We evaluated Copilot workspace across three tasks of varying difficulty and requirements:" + "objectID": "posts/100days-workshop/index.html#scenario-2-outbreak-of-an-unidentified-disease-in-rural-colombia", + "href": "posts/100days-workshop/index.html#scenario-2-outbreak-of-an-unidentified-disease-in-rural-colombia", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "Scenario 2: Outbreak of an unidentified disease in rural Colombia", + "text": "Scenario 2: Outbreak of an unidentified disease in rural Colombia\n\n\n\nScenario 2 details\n\n\n\nAnalytic pipeline for scenario 2 (analysed by group 3)\n\nData cleaning: manually, using R (no packages specified), to\n\nFix data entry issues in columns onset_date and gender\nCheck for missing data\nCheck sequence of dates: symptom onset → hospitalisation → death\n\nData anonymisation to share with partners\n\nfastlink for probabilistic matching between cases ↔︎ contacts, based on names, dates, and ages\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\n\nReproductive number calculation, by using two approaches:\n\nManually, by calculating the number of cases generated by each source case, data management through dplyr and data.table\nUsing serial interval of disease, through EpiEstim or EpiNow2\n\nSeverity of disease\n\nManual calculation of CFR and hospitalisation ratio\n\nProjection of hospital bed requirements\n\nEpiNow2 to calculate average hospitalisation duration and forecasting\n\nZoonotic transmission of disease\n\nManual inspection of cases’ occupation\nUse of IQtree and ggtree to plot phylogenetic data\n\nSuperspreading\n\nepicontacts\n\nCalculation of attack rate\n\nUnable to calculate, given the lack of seroprevalence data\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData anonymisation\nDealing with typos and missing data when generating random unique identifiers\n\n\nReproduction number\nRight truncation\nUnderestimation of cases due to reporting delays\n\n\nProjection of hospital bed requirements\nIncomplete data (missing discharge date)\nUndocumented functionality in R packages used\n\n\nZoonotic transmission\nPoor documentation\nUnavailability of packages in R\nDifferentiation between zoonotic transmission and risk factors- need for population data\n\n\nAttack rate\nNot enough information provided" }, { - "objectID": "posts/copilot-workspace/index.html#experiment-1-export-an-r-function-by-updating-the-documentation", - "href": "posts/copilot-workspace/index.html#experiment-1-export-an-r-function-by-updating-the-documentation", - "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", - "section": "Experiment 1: Export an R function by updating the documentation", - "text": "Experiment 1: Export an R function by updating the documentation\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: basic\n\n\nThe first experiment was on the {EpiNow2} R package. An existing Issue #681 was chosen. It was a simple task to evaluate Copilot workspace’s ability to interpret requests that require small changes to the codebase of a package (i.e. can be achieved in a single commit by changing less than 5 lines of code). 
The issue stated that an internal function, epinow2_cmdstan_model() should be exported, with the issue description explaining this would help users specify custom options for running the models.\n\n\n\nScreenshot of EpiNow2 package issue 681 which was used as an example application of GitHub Copilot Workspace.\n\n\n\n\nOutcome\nThe specification provided by Copilot workspace diagnosed the issue requirements, mentioning that the NAMESPACE file, which stores information on functions offered by a package and functions imported from other packages, would have to be updated for the R package to export this currently unexported function. The proposed solution also addressed the desired changes, stating that the function was to be exported and the keyword internal would be removed. The automatically generated plan proposed to edit the file documentation and NAMESPACE, but we edited this to more specifically replace the @keyword internal {roxygen} tag with @export.\n\n\nOur thoughts\nThe implementation offered by GitHub copilot workspace was correct and replaced @keywords internal with @export in the correct function documentation epinow2_stan_model().\nHowever, it also changed the formatting of one of the function arguments (@param model) by moving some text to the next line and added a comma to the last but one item listed in the sentence. (It seems AI has taken a side on the Oxford comma debate).\nOverall, this test case of GitHub copilot workspace was a partial success. It correctly recognised the links between inline R function documentation, Roxygen tags, and the R-specific documentation files in man/. However, it (naturally) does not have a mechanistic understanding of package development and how changing tags in documentation is used to auto-update doc files using development tools. It correctly modified the part of the file needed to export an internal function. This showcases an example of where this technology can be beneficial: those new to a language or paradigm that have not yet picked up all of the development details that are often second nature to experienced developers. In this example, the time taken to make this change in GitHub copilot workspace was likely ~10x longer than what our RSEs would take. However, if someone was used to developing in say, Python, and was new to the conventions of exporting functions in R using {roxygen} documentation, this test case shows how copilot workspace can reliably assist. On the other hand, because these changes make no reference to standard development tools, they would not help new developers learn how or where to use them.\nThe changes made will still require the user to manually run the documentation (using either devtools::document() or roxygen2::roxygenize()) to update the package NAMESPACE for the function to be exported. Our recommendation would be for AI tools to integrate with and call development tools used by developers (analogous to pre-commit hooks running tools before committing) to make sure this is taken care of all in a single generative step." 
+ "objectID": "posts/100days-workshop/index.html#scenario-3-reston-ebolavirus-in-the-philippines", + "href": "posts/100days-workshop/index.html#scenario-3-reston-ebolavirus-in-the-philippines", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "Scenario 3: Reston Ebolavirus in the Philippines", + "text": "Scenario 3: Reston Ebolavirus in the Philippines\n\n\n\nScenario 3 details\n\n\n\nAnalytic pipeline for scenario 3 (analysed by group 4)\n\nData cleaning\n\nImporting data with rio, readxl, readr, or openxlsx\nRename variables with janitor\nInitial data checks with pointblank, assertr, compareDF, or skimr\nVertical data checks with matchmaker, lubridate, or parsedate\nHorizontal data checks with hmatch, assertr, or queryR\nDetect duplicates with janitor and tidyverse\nChecking for consistency with dplyr, or powerjoin\nTranslation with matchmaker\n\nDelay distributions\n\nfitdistrplus to fit parameteric distributions to epidemic data\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\nggplot2 to visualise data\n\nOutbreak description\n\nsitrep to generate reports\n\nVisualisation of geographic data\n\nsf for static maps\nleaflet for interactive maps\n\nGeneration of tables\n\ngtsummary for static tables\njanitor for interactive tables\n\nSeverity of disease\n\nEpiNow2 and survival to calculate CFR\n\nAttack rate\n\ngadm function to get population data\nepitabulate to describe data\nsf and ggplot2 to plot data\n\nForecasting\n\nEpiEstim\nEpiNow2\nbpmodels\n\nSpillover events\n\nBy cross-referencing contact data with occupations\n\nEffectiveness of contact tracing\n\nBy calculating the proportion of case follow-ups and comparing the delay of disease exposure to the follow-up delay\n\nTransmission trees\n\nepicontacts\nggplot2\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nDetection of outliers\nNo known tools to use\n\n\nSeverity of disease\nCensoring\n\n\nSpillover events\nMissing data" }, { - "objectID": "posts/copilot-workspace/index.html#experiment-2-add-a-simple-epidemic-model-to-an-r-package-that-contains-more-complex-models", - "href": "posts/copilot-workspace/index.html#experiment-2-add-a-simple-epidemic-model-to-an-r-package-that-contains-more-complex-models", - "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", - "section": "Experiment 2: Add a simple epidemic model to an R package that contains more complex models", - "text": "Experiment 2: Add a simple epidemic model to an R package that contains more complex models\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: moderate to high\n\n\n{epidemics} is an Epiverse-TRACE R package that exports a range of compartmental epidemiological models of varying complexities and applications.\nIn the second test case, we created an issue in the {epidemics} R package requesting a basic SIR model, with the issue description requesting the model to have no age structure and to use existing functionality in the package to specify a disease intervention. Additionally, the issue requested that the model be solved with the {deSolve} R package, which is a differential equation solver in R. 
A basic SIR model would have been new to the package, but one of several other compartmental models, and simpler than the models already implemented.\n\n\n\nScreenshot of epidemics package issue 238 which was used as an example application of GitHub Copilot Workspace.\n\n\n\nOutcome\nGitHub copilot workspace outlined the existing functionality in the package and proposed changes including adding a new file called R/model_sir.R, which follows the model_*.R naming convention used in {epidemics}. As requested, copilot also proposed that the new SIR model would support the use of interventions that are already been implemented in the package.\n\n\nOur thoughts\nThe code generated was an R function that constructed a basic SIR model, and this was well documented using a {roxygen2} documentation block. However, several aspects of the code generated did not match the proposal generated by copilot workspace, and the code contained inadvisable coding practices in R (i.e. code smells). Firstly the SIR model itself. The model implemented follows the standard set of differential equations that are solved using {deSolve}, as requested in our prompt. However, the model does not have any options to input interventions, which the proposed solution by GitHub copilot workspace suggested it would. The second downside is the use of require(deSolve) in the function body of the generated code. This is bad practice in R package development as functions should namespace any functions that are being called from other packages (e.g. packagename::func_name()).\nThis task required copilot workspace to understand the contents and conventions of the {epidemics} R package and generate a simple compartmental model which would integrate with the infrastructure in the package for defining interventions in models. Although it correctly summarised the package contents, its attempt to answer the specific request from our prompt had many inaccuracies. The generated code is likely influenced by the many examples of SIR models written in R on the internet and on GitHub. These might be R scripts or gists and thus coding practises used there are not always appropriate for writing code to be distributed in an R package, resulting in decisions like having require(deSolve) in the function.\nAI-generated code like that seen in this example showcases where users need to be diligent and not assume that the text descriptions from copilot match the code. In the end, this was a test example and was unlikely to be merged into {epidemics} even if the code was correct. However, with the code generated, it presents almost no use; it would require as much time for an RSE to refactor this function as it would to write one from scratch following the design principles and demands of the {epidemics} package." 
+ "objectID": "posts/100days-workshop/index.html#scenario-4-emerging-avian-influenza-in-cambodia", + "href": "posts/100days-workshop/index.html#scenario-4-emerging-avian-influenza-in-cambodia", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "Scenario 4: Emerging avian influenza in Cambodia", + "text": "Scenario 4: Emerging avian influenza in Cambodia\n\n\n\nScenario 4 details\n\n\n\nAnalytic pipeline for scenario 4 (analysed by group 5)\n\nData cleaning\n\ntidyverse\nreadxl to import data\ndplyr to remove names\nlubridate to standardise date formats\nManually scanning through excel to check for errors\n\nReproduction number\n\nEpiEstim\n\nSeverity of disease\n\nManually using R to detect missing cases\nepiR to check for data censoring\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData cleaning\nNo available R packages specific for epidemic data\n\n\nReproduction number\nDifficulty finding parameter estimations in the literature\n\n\nSerial interval\nLack of a tool to check for parameter estimates\n\n\nSeverity\nMissing cases\nNeed for an R package for systematic censoring analysis" }, { - "objectID": "posts/copilot-workspace/index.html#experiment-3-conducting-a-code-review-on-an-r-package", - "href": "posts/copilot-workspace/index.html#experiment-3-conducting-a-code-review-on-an-r-package", - "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", - "section": "Experiment 3: Conducting a code review on an R package", - "text": "Experiment 3: Conducting a code review on an R package\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: moderate to high\n\n\nIn the last test case, we requested GitHub copilot workspace to conduct a code review of the entire code base of a package. At Epiverse-TRACE, it is standard practise to conduct a full package review. As any code base develops, including R packages, more commits are added and often the number of lines of code increases. Reviewing all of these changes before releasing a new version of the package is a timely undertaking for package reviewers. If AI tools could review the code base, similar to how static code analysers work, and suggest improvements to, for example, code style, memory efficiency, documentation, unit testing, UX/UI, it would greatly improve efficiency before releasing the code.\n\nOutcome\nHere, Copilot provided no analysis of the code base. It regurgitated the changes described in the package review pull request, likely from a mix of the pull request description and the changelog (NEWS.md file), and did not analyse any of the code. Therefore, it provided no additional use than if the reviewer had just read the NEWS." 
+ "objectID": "posts/100days-workshop/index.html#scenario-5-outbreak-of-respiratory-disease-in-canada", + "href": "posts/100days-workshop/index.html#scenario-5-outbreak-of-respiratory-disease-in-canada", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "Scenario 5: Outbreak of respiratory disease in Canada", + "text": "Scenario 5: Outbreak of respiratory disease in Canada\n\n\n\nScenario 5 details\n\n\n\nAnalytic pipeline for scenario 5 (analysed by group 1)\n\nDefine project structure\n\nDefining the script’s structure with cookiecutter, reportfactory, and orderly\nEnsuring reproducibility of the analysis with iRODS and Git\nWorking in a group with GitHub\n\nData cleaning\n\nImporting data with readr or rio\nChecking for errors with linelist, janitor, parsedate, matchmaker, or lubridate\njanitor to eliminate duplicates\nnaniar to check for missing data\nepitrix to anonymise data\n\nDelay distributions\n\nepitrix\nfitdistrplus to fit parameteric distributions to scenario data\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\n\nNowcasting\n\nincidence2 to visualise incidence from linelist data\nepiparameter to extract infectious disease parameter data\nEpiEstim or EpiNow2 for Rt calculation\n\nSeverity of disease\n\nCalculation of hospitalisation and mortality rates- no R package specified\n\nZoonotic transmission\n\nforecast\n\nGeneration of reports\n\nincidence for static reports\nQuarto and R markdown for dashboards\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nProject structure\nWorking simultaneously on the same script and managing parallel tasks\nAnticipating future incoming data in early pipeline design\n\n\nData cleaning\nLarge amount of code lines used on (reasonably) predictable cleaning (e.g. data sense checks)\nOmitting too many data entries when simply removing NA rows\nNon standardised data formats\nImplementing rapid quality check reports before analysis\n\n\nDelay distributions\nIdentifying the best method to calculate, or compare functionality of tools\nNeed to fit multiple parametric distributions and return best, and store as usable objects\n\n\nSeverity of disease\nCensoring and truncation\nUnderestimation of mild cases\nNeed database of age/gender pyramids for comparisons\n\n\nForecasts\nNeed option for fitting with range of plausible pathogen serial intervals and comparing results\nChanging reporting delays over time\nMatching inputs/outputs between packages\n\n\nZoonotic transmisison\nNeed for specific packages with clear documentation\nHow to compare simple trend-based forecasts" }, { - "objectID": "posts/copilot-workspace/index.html#conclusion", - "href": "posts/copilot-workspace/index.html#conclusion", - "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", - "section": "Conclusion", - "text": "Conclusion\nBased on our trials we observe that GitHub copilot Workspace may not provide a significant benefit to research software engineers in domain-specific R package development, where problems are complex and bespoke solutions are common. The evidence above suggests there’s still a long way to go before AI-powered tools can produce unsupervised code projects equivalent to that of a research software engineer. One observation is that generative AI is better trained for some purposes than others, biased by the amount of training data/information available. 
This makes it struggle in niche areas with a relatively smaller public code base to train models with (e.g. R functions for epidemic analysis are much rarer than generic JavaScript functions for web design). A second is that AI models don’t seem to understand how different parts of a codebase link together, so they provide solutions that are inconsistent with the requirements of the codebase (design principles, code style, etc). These models may, however, be more useful for more common tasks, such as building dashboard templates.\nThe test cases we experimented with suggest that this tool will not replace, and at times not even enhance development when the task requires a more integrated development than generating a standalone script. These tools are evolving rapidly and we are sure improvements will continue. In the short term tools like GitHub copilot workspace need to be used responsibly with an awareness of their limitations and not a blind trust that the code it writes is always correct. We would caution against relying on generative AI tools alone to develop software with potentially large-scale impacts - scrutiny by trained developers is still key. Research software engineers will play a key part in this as they will ensure that code and software written with and by AI tools meet required standards, are trustworthy, and are not potentially harmful. Where we see generative AI tools being more immediately useful in research software development is in tasks that require sifting through or summarising vast amounts of information; for example, in querying software documentation, or to help get started with a new programming framework or language.\nAll authors thank Hugo Gruson and Chris Hartgerink for their valuable feedback and contributions to this post." + "objectID": "posts/100days-workshop/index.html#what-next", + "href": "posts/100days-workshop/index.html#what-next", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "What next?", + "text": "What next?\nScenarios developed by the 100 days workshop participants illustrate that there are many commonalities across proposed analytics pipelines, which could support interoperability across different epidemiological questions. However, there are also several remaining gaps and challenges, which creates an opportunity to build on existing work to tackle common outbreak scenarios, using the issues here as a starting point. This will also require consideration of wider interactions with existing software ecosystems and users of outbreak analytics insights. We are therefore planning to follow up this vignette with a more detailed perspective article discussing potential for broader progress in developing a ‘first 100 lines of code’." }, { - "objectID": "posts/benchmarking_design_decisions/index.html", - "href": "posts/benchmarking_design_decisions/index.html", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "", - "text": "library(bench)\nlibrary(cli)\nlibrary(dplyr)\nlibrary(ggplot2)\nR package developers often have to take on dependencies for various reasons. Dependencies are external packages that provide extra functionalities or features to another package (Wickham and Bryan 2023). Quite often, the goal of taking on a dependency is to utilize its rich features and reduce duplicated coding effort. 
For example, the ggplot2 package is often adopted, instead of R’s inbuilt plotting functions, to make visualizations using its declarative system and implementation of the Grammar of Graphics, and slick automation across the board.\nSometimes, introducing a dependency might slow down existing processes. Hence, it is important to consider the speed of a dependency’s functions before adopting it. Code optimisation is often reserved for heavy processes like data processing steps. However, here, we will focus on lesser optimised processes, and in particular, printing of texts in R.\nFor this demonstration, we will look at a situation where a package developer is considering options for condition signalling beyond the functionalities provided in base R. Condition signalling refers to the process of communicating to a user that an issue has arisen during code execution (Wickham 2019).\nBase R ships with functionalities for condition signalling. There is base::stop() for signalling errors. Errors refer to cases where something fatal happens during code execution so that it halts, for example, when a function is run without specifying the required arguments.\nrnorm()\n\nError in rnorm(): argument \"n\" is missing, with no default\nBase R provides base::warning() for throwing warnings. Warnings are used to handle less fatal issues with code execution that do not need to halt the process. For example, when you compare two vectors of different lengths, R will throw a warning but will still return a result.\nc(1:2) > c(1:3)\n\nWarning in c(1:2) > c(1:3): longer object length is not a multiple of shorter\nobject length\n\n\n[1] FALSE FALSE FALSE\nLastly, there is base::message() for throwing messages. Messages are used to provide useful information about processes. For example, packages throw messages about NAMESPACE conflicts when loaded. This is to inform the user so they know what next steps to take.\nlibrary(data.table)\n\n\nAttaching package: 'data.table'\n\n\nThe following objects are masked from 'package:dplyr':\n\n between, first, last\nThese base R functions are, however, not flexible enough for advanced formatting: colour, progress bar, contextual pluralisation, and so forth. The package developer might want to adopt external dependencies for these flexibilities. The cli package is well designed for this. cli offers many of the advanced formatting features. Here is a quick example (adopted from the help file) using cli::cli_abort(), the equivalent of base::stop().\nn <- \"c(1, 2, 3)\"\ncli_abort(\n c(\n \"{.var n} must be a numeric vector\",\n \"x\" = \"You've supplied a {.cls {class(n)}} vector.\"\n )\n)\n\nError:\n! `n` must be a numeric vector\n✖ You've supplied a <character> vector.\nIn this example, we used inline text formatting to print n as a variable and interpolated code to print the class of n in the error message and all in just 6 lines of code.\nIf this is enough to excite your interests, check out cli’s website and help files, which provide in-depth guides on its myriad features.\nNow, considering all these amazing features that cli offers, what should a package developer consider to adopt it? In this article, we will demonstrate how to use (micro)benchmarking1 to guide the decision through the lense of speed. Benchmarking can be used to check whether adopting cli as a dependency would slow down existing printing processes in undesirable ways.\nHere, we will benchmark cli::cli_warn() against base R’s base::warning() to see if the former has any speed issues. 
The results of this exercise will provide us with insights on whether its worth taking on cli for its features as a trade-off for slower printing speed." + "objectID": "posts/100days-workshop/index.html#list-of-contributors", + "href": "posts/100days-workshop/index.html#list-of-contributors", + "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", + "section": "List of contributors", + "text": "List of contributors\n\nGroup 1: Rich Fitzjohn, Mauricio Santos Vega, Andrea Torneri, Abdoelnaser Degoot, Rolina van Gaalen, Zulma Cucunuba, Joseph Tsui, Claudine Lim, Adam Kucharski.\nGroup 2: Juan Daniel Umaña, Joel Hellewell, Anne Cori, Fanck Kalala, Amrish Baidjoe, Sara Hollis, Chaoran Chen, Pratik Gupte, Andree Valle.\nGroup 3: Mutono Nyamai, Finlay Campbell, Arminder Deol, Simone Carter, Anita Shah, Neale Batra, Issa Karambal, Danil Mihailov, Sebastian Funk.\nGroup 4: Anton Camacho, Louise Dyson, Jeremy Bingham, Simon Cauchemez, Alex Spina, Esther Van Kleef, Anna Carnegie, James Azam.\nGroup 5: Olivia Keiser, Geraldine Gomez, John Lees, Don Klinkenberg, Matthew Biggerstaff, David Santiago Quevedo, Joshua Lambert, Carmen Tamayo." }, { - "objectID": "posts/benchmarking_design_decisions/index.html#experiment", - "href": "posts/benchmarking_design_decisions/index.html#experiment", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "Experiment", - "text": "Experiment\nWe will consider different scenarios of number of warnings thrown to tease out the relationship between speed of the function and number of warnings thrown in a function with and without cli::cli_warn(). We will also check how the benchmark scales with the number of warnings in the function.\nBased on this description, we will need the following packages: cli for its warning function, and bench for measuring the run times, dplyr for manipulating the results, and ggplot2 for plotting.\nLet’s define a function that takes an argument n for the number of warnings to throw and pkg for the type of package to use.\n\nwarn_Ntimes <- function(n, pkg) {\n warning_msg <- \"x must be an integer\"\n switch(pkg,\n base = for (i in seq_len(n)) {\n warning(warning_msg)\n },\n cli = for (i in seq_len(n)) {\n cli_warn(warning_msg)\n }\n )\n}\n\nLet’s test our function to see if it works as expected.\n\nwarn_Ntimes(3, \"base\")\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nwarn_Ntimes(3, \"cli\")\n\nWarning: x must be an integer\n\n\nWarning: x must be an integer\nx must be an integer\n\n\nNow, we’ll consider scenarios where a function throws 1, 5, 10, 15, 20, and 100 warnings using base R and cli.\n\n# Number of warnings to throw\nwarnings <- c(1, 5, 10, 15, 20, 100)\n\nLet’s run benchmarks over the different scenarios and store results in a data.frame.\n\nbnmark_res <- press(\n warnings = c(1, 5, 10, 15, 20, 100),\n mark(\n cli = warn_Ntimes(warnings, \"cli\"),\n base = warn_Ntimes(warnings, \"base\")\n )\n )" + "objectID": "posts/extend-dataframes/index.html", + "href": "posts/extend-dataframes/index.html", + "title": "Extending Data Frames", + "section": "", + "text": "R is a commonly used language for data science and statistical computing. Foundational to this is having data structures that allow manipulation of data with minimal effort and cognitive load. One of the most commonly required data structures is tabular data. 
This can be represented in R in a few ways, for example a matrix or a data frame. The data frame (class data.frame) is a flexible tabular data structure, as it can hold different data types (e.g. numbers, character strings, etc.) across different columns. This is in contrast to matrices – which are arrays with dimensions – and thus can only hold a single data type.\n\n# data frame can hold heterogeneous data types across different columns\ndata.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(\"a\", \"b\", \"c\"))\n\n a b c\n1 1 4 a\n2 2 5 b\n3 3 6 c\n\n# each column must be of the same type\ndf <- data.frame(a = c(1, 2, 3), b = c(\"4\", 5, 6))\n# be careful of the silent type conversion\ndf$a\n\n[1] 1 2 3\n\ndf$b\n\n[1] \"4\" \"5\" \"6\"\n\nmat <- matrix(1:9, nrow = 3, ncol = 3)\nmat\n\n [,1] [,2] [,3]\n[1,] 1 4 7\n[2,] 2 5 8\n[3,] 3 6 9\n\nmat[1, 1] <- \"1\"\n# be careful of the silent type conversion\nmat\n\n [,1] [,2] [,3]\n[1,] \"1\" \"4\" \"7\" \n[2,] \"2\" \"5\" \"8\" \n[3,] \"3\" \"6\" \"9\" \n\n\nData frames can even be nested, cells can be data frames or lists.\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(c = c(\"y\", \"z\"))\ndf\n\n a b\n1 y, z x\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(data.frame(c = \"y\", d = \"z\"))\ndf\n\n a b\n1 y, z x\n\n\nIt is therefore clear why data frames are so prevalent. However, they are not without limitations. They have a relatively basic printing method which can fload the R console when the number of columns or rows is large. They have useful methods (e.g., summary() and str()), but these might not be appropriate for certain types of tabular data. In these cases it is useful to utilise R’s inheritance mechanisms (specifically S3 inheritance) to write extensions for R’s data.frame class. In this case the data frame is the superclass and the new subclass extends it and inherits its methods (see the Advanced R book for more details on S3 inheritance).\nOne of the most common extension of the data frame is the tibble from the {tibble} R package. Outlined in {tibble}’s vignette, tibbles offer improvements in printing, subsetting and recycling rules. Another commonly used data frame extension is the data.table class from the {data.table} R package. In addition to the improved printing, this class is designed to improve the performance (i.e. speed and efficiency of operations and storage) of working with tabular data in R and provide a terse syntax for manipulation.\nIn the process of developing R software (most likely an R package), a new tabular data class that builds atop data frames can become beneficial. This blog post has two main sections:\n\na brief overview of the steps required to setup a class that extends data frames\nguide to the technical aspects of class invariants (required data members of a class) and design and implementation decisions, and tidyverse compatibility\n\n\n\nIt is useful to write a class constructor function that can be called to create an object of your new class. The functions defined below are a redacted version (for readability) of functions available in the {ExtendDataFrames} R package, which contains example functions and files discussed in this post. 
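Before writing the constructor, it may help to see this inheritance pattern in the two extensions just mentioned: their class vectors keep data.frame as the final element, so data frame methods remain available as a fallback. A small sketch, assuming {tibble} and {data.table} are installed:

# the superclass sits last in the class vector for both extensions
class(tibble::tibble(a = 1))
#> [1] "tbl_df"     "tbl"        "data.frame"
class(data.table::data.table(a = 1))
#> [1] "data.table" "data.frame"
inherits(data.table::data.table(a = 1), "data.frame")
#> [1] TRUE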
When assigning the class name ensure that it is a vector containing \"data.frame\" as the last element to correctly inherit properties and methods from the data.frame class.\nbirthdays <- function(x) {\n # the vector of classes is required for it to inherit from `data.frame`\n structure(x, class = c(\"birthdays\", \"data.frame\"))\n}\nThat’s all that’s needed to create a subclass of a data frame. However, although we’ve created the class we haven’t given it any functionality and thus it will be identical to a data frame due to inheritance.\nWe can now write as many methods as we want. Here we will show two methods, one of which does not require writing a generic (print.birthdays) and the second that does (birthdays_per_month). The print() generic function is provided by R, which is why we do not need to add one ourselves. See Adv R and this Epiverse blog post to find out more about S3 generics.\nprint.birthdays <- function(x, ...) {\n cat(\n sprintf(\n \"A `birthdays` object with %s rows and %s cols\",\n dim(x)[1], dim(x)[2]\n )\n )\n invisible(x)\n}\n\nbirthdays_per_month <- function(x, ...) {\n UseMethod(\"birthdays_per_month\")\n}\n\nbirthdays_per_month.birthdays <- function(x, ...) {\n out <- table(lubridate::month(x$birthday))\n months <- c(\n \"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\",\n \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"\n )\n names(out) <- months[as.numeric(names(out))]\n return(out)\n}\n\n\n\n\n\n\nTip\n\n\n\nUseful resources for the “Writing custom data class” section: extending tibbles and their functionality\n\n\n\n\n\nWe will now move on to the second section of the post, in which we discuss the design choices when creating and using S3 classes in R. Class invariants are members of your class that define it. In other words, without these elements your class does not fulfil its basic definition. It is therefore sensible to make sure that your class contains these elements at all times (or at least after operations have been applied to your class). In cases when the class object contains all the invariants normal service can be continued. However, in the case that an invariant is missing or modified to a non-conformist type (e.g. a date converted to a numeric) a decision has to be made. Either the code can error, hopefully giving the user an informative message as to why their modification broke the object; alternatively, the subclass can be revoked and the superclass can be returned. In almost all cases the superclass (i.e. the base class being inherited from) is more general and won’t have the same class invariant restrictions.\nFor our example class, <birthdays>, the invariants are a column called name which must contain characters, and a column called birthday which must contain dates. The order of the rows and columns is not considered an invariant property, and having extra columns with other names and data types is also allowed. The number of rows is also not an invariant as we can have as many birthdays as we like in the data object.\nHere we present both cases as well as considerations and technical details of both options. We’ll demonstrate both of these cases with the subset function in R (subsetting uses a single square bracket for tabular data, [). First the fail-on-subsetting. 
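The snippets that follow subset an object called birthdays without showing how it was built, so here is a small, purely hypothetical example object (the names and dates are invented) that the later examples can be read against. It assumes {lubridate} is installed and that the constructor and methods above have been defined.

# hypothetical example data; note this re-binds the name `birthdays` from the
# constructor function to a data object, mirroring its use in later examples
birthdays <- birthdays(
  data.frame(
    name = c("Ada", "Grace", "Hedy"),
    birthday = as.Date(c("1815-12-10", "1906-12-09", "1914-11-09"))
  )
)
print(birthdays)
#> A `birthdays` object with 3 rows and 2 cols
birthdays_per_month(birthdays)
#> Nov Dec
#>   1   2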
Before we write the subsetting function it is useful to have a function that checks that an object of our class is valid, a so-called validator function.\n\nvalidate_birthdays <- function(x) {\n stopifnot(\n \"input must contain 'name' and 'birthday' columns\" =\n all(c(\"name\", \"birthday\") %in% colnames(x)),\n \"names must be a character\" =\n is.character(x$name),\n \"birthday must be a date\" =\n lubridate::is.Date(x$birthday)\n )\n invisible(x)\n}\n\nThis will return an error if the class is not valid (defined in terms of the class’ invariants).\nNow we can show how to error if one of the invariants are removed during subsetting. See ?NextMethod() for information on method dispatch.\n\n`[.birthdays` <- function(x) {\n validate_birthdays(NextMethod())\n}\n\nbirthdays[, -1]\n# Error in validate_birthdays(NextMethod()) :\n# input must contain 'name' and 'birthday' columns\n\nThe second design option is the reconstruct-on-subsetting. This checks whether the class is valid, and if not downgrade the class to the superclass, in our case a data frame. This is done by not only validating the object during subsetting but to check whether it is a valid class object, and then either ensuring all of the attributes of the subclass – in our case <birthdays> – are maintained, or attributes are stripped and only the attributes of the base superclass – in our case data.frame – are kept.\n\n\n\n\n\n\nNote\n\n\n\nImportant note: this section of the post relies heavily on https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat.\n\n\nThe four functions that are required to be added to ensure our class is correctly handled when invaliding it are:\n\nbirthdays_reconstruct()\nbirthdays_can_reconstruct()\ndf_reconstruct()\ndplyr_reconstruct.birthdays()\n\nWe’ll tackle the first three first, and then move onto to the last one as this requires some extra steps.\nbirthdays_reconstruct() is a function that contains an if-else statement to determine whether the returned object is a <birthdays> or data.frame object.\n\nbirthdays_reconstruct <- function(x, to) {\n if (birthdays_can_reconstruct(x)) {\n df_reconstruct(x, to)\n } else {\n x <- as.data.frame(x)\n message(\"Removing crucial column in `<birthdays>` returning `<data.frame>`\")\n x\n }\n}\n\nThe if-else evaluation is controlled by birthdays_can_reconstruct(). This function determines whether after subsetting the object is a valid <birthdays> class. It checks whether the validator fails, in which case it returns FALSE, otherwise the function will return TRUE.\n\nbirthdays_can_reconstruct <- function(x) {\n # check whether input is valid\n valid <- tryCatch(\n { validate_birthdays(x) },\n error = function(cnd) FALSE\n )\n\n # return boolean\n !isFALSE(valid)\n}\n\nThe next function required is df_reconstruct(). This is called when the object is judged to be a valid <birthdays> object and simply copies the attributes over from the <birthdays> class to the object being subset.\n\ndf_reconstruct <- function(x, to) {\n attrs <- attributes(to)\n attrs$names <- names(x)\n attrs$row.names <- .row_names_info(x, type = 0L)\n attributes(x) <- attrs\n x\n}\n\nThe three functions defined for reconstruction can be added to a package with the subsetting function in order to subset <birthdays> objects and returning either <birthdays> objects if still valid, or data frames when invalidated. 
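To make the contrast with the fail-on-subsetting version concrete, here is roughly how the reconstructing design could behave once the `[.birthdays` method shown just below is defined, using the hypothetical object from earlier; the printed output is indicative only.

# dropping an invariant column falls back to a plain data frame, with a message
birthdays[, -1]
#> Removing crucial column in `<birthdays>` returning `<data.frame>`
#> (the result then prints as an ordinary data frame)

# a subset that keeps both invariant columns remains a <birthdays> object
birthdays[1:2, ]
#> A `birthdays` object with 2 rows and 2 cols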
This design has the benefit that when conducting data exploration a user is not faced with an error, but can continue with a data frame, while being informed by the message printed to console in birthdays_reconstruct().\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n\n\n\n\nlibrary(dplyr)\n\nIn order to be able to operate on our <birthdays> class using functions from the package {dplyr}, as would be common for data frames, we need to make our function compatible. This is where the function dplyr_reconstruct.birthdays() comes in. dplyr_reconstruct() is a generic function exported by {dplyr}. It is called in {dplyr} verbs to make sure that the objects are restored to the input class when not invalidated.\n\ndplyr_reconstruct.birthdays <- function(data, template) {\n birthdays_reconstruct(data, template)\n}\n\nInformation about the generic can be found through the {dplyr} help documentation.\n\n?dplyr::dplyr_extending\n?dplyr::dplyr_reconstruct\n\nAs explained in the help documentation, {dplyr} also uses two base R functions to perform data manipulation. names<- (i.e the names setter function) and [ the one-dimensional subsetting function. We therefore define these methods for our custom class in order for dplyr_reconstruct() to work as intended.\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n`names<-.birthdays` <- function(x, value) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\nThis wraps up the need for adding function to perform data manipulation using the reconstruction design outlined above.\nHowever, there is some final housekeeping to do. In cases when {dplyr} is not a package dependency (either imported or suggested), then the S3 generic dplyr_reconstruct() is required to be loaded. In R versions before 3.6.0 – this also works for R versions later than 3.6.0 – the generic function needs to be registered. This is done by writing an .onLoad() function, typically in a file called zzz.R. This is included in the {ExtendDataFrames} package for illustrative purposes.\n\n\n\nzzz.R\n\n.onLoad <- function(libname, pkgname) {\n s3_register(\"dplyr::dplyr_reconstruct\", \"birthdays\")\n invisible()\n}\n\nThe s3_register() function used in .onLoad() also needs to be added to the package and this function is kindly supplied by both {vctrs} and {rlang} unlicensed and thus can be copied into another package. See the R packages book for information about .onLoad() and attaching and loading in general.\nSince R version 3.6.0 this S3 generic registration happens automatically with S3Method() in the package namespace using the {roxygen2} documentation #' @exportS3Method dplyr::dplyr_reconstruct.\nThere is one last option which prevents the hard dependency on a relatively recent R version. Since {roxygen2} version 6.1.0, there is the @rawNamespace tag which allows insertion of text into the NAMESPACE file. Using this tag the following code will check the local R version and register the S3 method if equal to or above 3.6.0.\n\n#' @rawNamespace if (getRversion() >= \"3.6.0\") {\n#' S3method(pkg::fun, class)\n#' }\n\nEach of the three options for registering S3 methods has different benefits and downsides, so the choice depends on the specific use-case. 
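For completeness, here is a sketch of the {roxygen2} route mentioned above, placing the @exportS3Method tag over the method so that the registration is generated in the NAMESPACE file on R >= 3.6.0.

# sketch of roxygen2-based registration; this generates an
# S3method(dplyr::dplyr_reconstruct, birthdays) entry in NAMESPACE
#' @exportS3Method dplyr::dplyr_reconstruct
dplyr_reconstruct.birthdays <- function(data, template) {
  birthdays_reconstruct(data, template)
}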
Over time it may be best to use the most up-to-date methods as packages are usually only maintained for a handful of recent R releases1.\nThe topics discussed in this post have been implemented in the {epiparameter} R package within Epiverse-TRACE.\nCompatibility with {vctrs} is also possible using the same mechanism (functions) described in this post, and if interested see https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat for details.\nFor other use-cases and discussions of the designs and implementations discussed in this post see:\n\n{dials} R package\n{rsample} R package\n{googledrive} R package\nPull request on {tibble} R package\n\nThis blog post is a compendium of information from sources that are linked and cited throughout. Please refer to those sites for more information and as the primary source for citation in further work." }, { - "objectID": "posts/benchmarking_design_decisions/index.html#results", - "href": "posts/benchmarking_design_decisions/index.html#results", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "Results", - "text": "Results\nIt’s time to explore the results of the data generated. Let’s first make a boxplot of the run times for the different scenarios.\nDrum roll please…\n\nautoplot(\n bnmark_res,\n type = \"boxplot\"\n ) +\n labs(\n x = \"Package\",\n caption = \"Functions throwing various numbers of warnings as indicated in the facet label\"\n )\n\nLoading required namespace: tidyr\n\n\n\n\n\n\n\n\n\nNow, let’s see how the benchmarks scale with the number of warnings thrown.\n\nbnmark_res$pkg <- attr(bnmark_res$expression, \"description\")\n\nggplot(bnmark_res) +\n geom_line(\n aes(\n x = warnings,\n y = median,\n color = pkg\n ),\n linewidth = 2\n ) +\n labs(\n x = \"Number of warnigns\",\n y = \"Median execution time\",\n color = \"Package\"\n )\n\n\n\n\n\n\n\n\nAs we can see, cli is consistently slower than base R for the same number of warnings thrown. The median times also follow a similar non-linear trajectory. Benchmarking experiments with other cli functions have revealed similar non-linear relationships between its speed and the number of times it invokes those functions 2. However, those exercises were not compared with equivalent functions in base R. The relative comparison here is useful for decision-making.\nThe developers of cli have also conducted benchmarks of the ansi_*() family of functions in cli in comparison to base R and the fansi package. They find that cli is consistently slower than base R, which corroborates the results of our experiment here. These benchmarks are available in the cli documentation 3.\nSo, should we be worried about the speed of cli? Well, it depends on the context. The “R Packages” book by Hadley Wickham and Jenny Bryan suggests approaching such a decision from a holistic, balanced, and quantitative approach(Wickham and Bryan 2023). We’ll leave the reader to make their own decision based on their use case.\nMost developers might argue that this is an optimisation overkill4. However, it is important to consider speed differences in context. In the case of simple printing, the speed difference is negligible yet disruptive and somewhat painful. However, in the grand scheme of things, this might be nothing compared with much slower processes that need more attention. In those cases, the developer might want to consider other optimisation strategies such as profiling 5. 
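As a pointer for that situation, base R already ships a sampling profiler, so a first profiling pass needs no extra dependencies. This is only a sketch of the idea referenced above; interactive tools such as {profvis} present the same information more conveniently.

# profile repeated warning calls with base R's sampling profiler (utils::Rprof)
prof_file <- tempfile()
Rprof(prof_file)
suppressWarnings(
  for (i in seq_len(1000)) cli::cli_warn("x must be an integer")
)
Rprof(NULL)
head(summaryRprof(prof_file)$by.self)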
The essence of this experiment is to demonstrate the utility of benchmarking in making quick design decisions." + "objectID": "posts/extend-dataframes/index.html#extending-data-frames-in-r", + "href": "posts/extend-dataframes/index.html#extending-data-frames-in-r", + "title": "Extending Data Frames", + "section": "", + "text": "R is a commonly used language for data science and statistical computing. Foundational to this is having data structures that allow manipulation of data with minimal effort and cognitive load. One of the most commonly required data structures is tabular data. This can be represented in R in a few ways, for example a matrix or a data frame. The data frame (class data.frame) is a flexible tabular data structure, as it can hold different data types (e.g. numbers, character strings, etc.) across different columns. This is in contrast to matrices – which are arrays with dimensions – and thus can only hold a single data type.\n\n# data frame can hold heterogeneous data types across different columns\ndata.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(\"a\", \"b\", \"c\"))\n\n a b c\n1 1 4 a\n2 2 5 b\n3 3 6 c\n\n# each column must be of the same type\ndf <- data.frame(a = c(1, 2, 3), b = c(\"4\", 5, 6))\n# be careful of the silent type conversion\ndf$a\n\n[1] 1 2 3\n\ndf$b\n\n[1] \"4\" \"5\" \"6\"\n\nmat <- matrix(1:9, nrow = 3, ncol = 3)\nmat\n\n [,1] [,2] [,3]\n[1,] 1 4 7\n[2,] 2 5 8\n[3,] 3 6 9\n\nmat[1, 1] <- \"1\"\n# be careful of the silent type conversion\nmat\n\n [,1] [,2] [,3]\n[1,] \"1\" \"4\" \"7\" \n[2,] \"2\" \"5\" \"8\" \n[3,] \"3\" \"6\" \"9\" \n\n\nData frames can even be nested, cells can be data frames or lists.\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(c = c(\"y\", \"z\"))\ndf\n\n a b\n1 y, z x\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(data.frame(c = \"y\", d = \"z\"))\ndf\n\n a b\n1 y, z x\n\n\nIt is therefore clear why data frames are so prevalent. However, they are not without limitations. They have a relatively basic printing method which can fload the R console when the number of columns or rows is large. They have useful methods (e.g., summary() and str()), but these might not be appropriate for certain types of tabular data. In these cases it is useful to utilise R’s inheritance mechanisms (specifically S3 inheritance) to write extensions for R’s data.frame class. In this case the data frame is the superclass and the new subclass extends it and inherits its methods (see the Advanced R book for more details on S3 inheritance).\nOne of the most common extension of the data frame is the tibble from the {tibble} R package. Outlined in {tibble}’s vignette, tibbles offer improvements in printing, subsetting and recycling rules. Another commonly used data frame extension is the data.table class from the {data.table} R package. In addition to the improved printing, this class is designed to improve the performance (i.e. speed and efficiency of operations and storage) of working with tabular data in R and provide a terse syntax for manipulation.\nIn the process of developing R software (most likely an R package), a new tabular data class that builds atop data frames can become beneficial. 
This blog post has two main sections:\n\na brief overview of the steps required to setup a class that extends data frames\nguide to the technical aspects of class invariants (required data members of a class) and design and implementation decisions, and tidyverse compatibility\n\n\n\nIt is useful to write a class constructor function that can be called to create an object of your new class. The functions defined below are a redacted version (for readability) of functions available in the {ExtendDataFrames} R package, which contains example functions and files discussed in this post. When assigning the class name ensure that it is a vector containing \"data.frame\" as the last element to correctly inherit properties and methods from the data.frame class.\nbirthdays <- function(x) {\n # the vector of classes is required for it to inherit from `data.frame`\n structure(x, class = c(\"birthdays\", \"data.frame\"))\n}\nThat’s all that’s needed to create a subclass of a data frame. However, although we’ve created the class we haven’t given it any functionality and thus it will be identical to a data frame due to inheritance.\nWe can now write as many methods as we want. Here we will show two methods, one of which does not require writing a generic (print.birthdays) and the second that does (birthdays_per_month). The print() generic function is provided by R, which is why we do not need to add one ourselves. See Adv R and this Epiverse blog post to find out more about S3 generics.\nprint.birthdays <- function(x, ...) {\n cat(\n sprintf(\n \"A `birthdays` object with %s rows and %s cols\",\n dim(x)[1], dim(x)[2]\n )\n )\n invisible(x)\n}\n\nbirthdays_per_month <- function(x, ...) {\n UseMethod(\"birthdays_per_month\")\n}\n\nbirthdays_per_month.birthdays <- function(x, ...) {\n out <- table(lubridate::month(x$birthday))\n months <- c(\n \"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\",\n \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"\n )\n names(out) <- months[as.numeric(names(out))]\n return(out)\n}\n\n\n\n\n\n\nTip\n\n\n\nUseful resources for the “Writing custom data class” section: extending tibbles and their functionality\n\n\n\n\n\nWe will now move on to the second section of the post, in which we discuss the design choices when creating and using S3 classes in R. Class invariants are members of your class that define it. In other words, without these elements your class does not fulfil its basic definition. It is therefore sensible to make sure that your class contains these elements at all times (or at least after operations have been applied to your class). In cases when the class object contains all the invariants normal service can be continued. However, in the case that an invariant is missing or modified to a non-conformist type (e.g. a date converted to a numeric) a decision has to be made. Either the code can error, hopefully giving the user an informative message as to why their modification broke the object; alternatively, the subclass can be revoked and the superclass can be returned. In almost all cases the superclass (i.e. the base class being inherited from) is more general and won’t have the same class invariant restrictions.\nFor our example class, <birthdays>, the invariants are a column called name which must contain characters, and a column called birthday which must contain dates. The order of the rows and columns is not considered an invariant property, and having extra columns with other names and data types is also allowed. 
The number of rows is also not an invariant as we can have as many birthdays as we like in the data object.\nHere we present both cases as well as considerations and technical details of both options. We’ll demonstrate both of these cases with the subset function in R (subsetting uses a single square bracket for tabular data, [). First the fail-on-subsetting. Before we write the subsetting function it is useful to have a function that checks that an object of our class is valid, a so-called validator function.\n\nvalidate_birthdays <- function(x) {\n stopifnot(\n \"input must contain 'name' and 'birthday' columns\" =\n all(c(\"name\", \"birthday\") %in% colnames(x)),\n \"names must be a character\" =\n is.character(x$name),\n \"birthday must be a date\" =\n lubridate::is.Date(x$birthday)\n )\n invisible(x)\n}\n\nThis will return an error if the class is not valid (defined in terms of the class’ invariants).\nNow we can show how to error if one of the invariants are removed during subsetting. See ?NextMethod() for information on method dispatch.\n\n`[.birthdays` <- function(x) {\n validate_birthdays(NextMethod())\n}\n\nbirthdays[, -1]\n# Error in validate_birthdays(NextMethod()) :\n# input must contain 'name' and 'birthday' columns\n\nThe second design option is the reconstruct-on-subsetting. This checks whether the class is valid, and if not downgrade the class to the superclass, in our case a data frame. This is done by not only validating the object during subsetting but to check whether it is a valid class object, and then either ensuring all of the attributes of the subclass – in our case <birthdays> – are maintained, or attributes are stripped and only the attributes of the base superclass – in our case data.frame – are kept.\n\n\n\n\n\n\nNote\n\n\n\nImportant note: this section of the post relies heavily on https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat.\n\n\nThe four functions that are required to be added to ensure our class is correctly handled when invaliding it are:\n\nbirthdays_reconstruct()\nbirthdays_can_reconstruct()\ndf_reconstruct()\ndplyr_reconstruct.birthdays()\n\nWe’ll tackle the first three first, and then move onto to the last one as this requires some extra steps.\nbirthdays_reconstruct() is a function that contains an if-else statement to determine whether the returned object is a <birthdays> or data.frame object.\n\nbirthdays_reconstruct <- function(x, to) {\n if (birthdays_can_reconstruct(x)) {\n df_reconstruct(x, to)\n } else {\n x <- as.data.frame(x)\n message(\"Removing crucial column in `<birthdays>` returning `<data.frame>`\")\n x\n }\n}\n\nThe if-else evaluation is controlled by birthdays_can_reconstruct(). This function determines whether after subsetting the object is a valid <birthdays> class. It checks whether the validator fails, in which case it returns FALSE, otherwise the function will return TRUE.\n\nbirthdays_can_reconstruct <- function(x) {\n # check whether input is valid\n valid <- tryCatch(\n { validate_birthdays(x) },\n error = function(cnd) FALSE\n )\n\n # return boolean\n !isFALSE(valid)\n}\n\nThe next function required is df_reconstruct(). 
This is called when the object is judged to be a valid <birthdays> object and simply copies the attributes over from the <birthdays> class to the object being subset.\n\ndf_reconstruct <- function(x, to) {\n attrs <- attributes(to)\n attrs$names <- names(x)\n attrs$row.names <- .row_names_info(x, type = 0L)\n attributes(x) <- attrs\n x\n}\n\nThe three functions defined for reconstruction can be added to a package with the subsetting function in order to subset <birthdays> objects and returning either <birthdays> objects if still valid, or data frames when invalidated. This design has the benefit that when conducting data exploration a user is not faced with an error, but can continue with a data frame, while being informed by the message printed to console in birthdays_reconstruct().\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n\n\n\n\nlibrary(dplyr)\n\nIn order to be able to operate on our <birthdays> class using functions from the package {dplyr}, as would be common for data frames, we need to make our function compatible. This is where the function dplyr_reconstruct.birthdays() comes in. dplyr_reconstruct() is a generic function exported by {dplyr}. It is called in {dplyr} verbs to make sure that the objects are restored to the input class when not invalidated.\n\ndplyr_reconstruct.birthdays <- function(data, template) {\n birthdays_reconstruct(data, template)\n}\n\nInformation about the generic can be found through the {dplyr} help documentation.\n\n?dplyr::dplyr_extending\n?dplyr::dplyr_reconstruct\n\nAs explained in the help documentation, {dplyr} also uses two base R functions to perform data manipulation. names<- (i.e the names setter function) and [ the one-dimensional subsetting function. We therefore define these methods for our custom class in order for dplyr_reconstruct() to work as intended.\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n`names<-.birthdays` <- function(x, value) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\nThis wraps up the need for adding function to perform data manipulation using the reconstruction design outlined above.\nHowever, there is some final housekeeping to do. In cases when {dplyr} is not a package dependency (either imported or suggested), then the S3 generic dplyr_reconstruct() is required to be loaded. In R versions before 3.6.0 – this also works for R versions later than 3.6.0 – the generic function needs to be registered. This is done by writing an .onLoad() function, typically in a file called zzz.R. This is included in the {ExtendDataFrames} package for illustrative purposes.\n\n\n\nzzz.R\n\n.onLoad <- function(libname, pkgname) {\n s3_register(\"dplyr::dplyr_reconstruct\", \"birthdays\")\n invisible()\n}\n\nThe s3_register() function used in .onLoad() also needs to be added to the package and this function is kindly supplied by both {vctrs} and {rlang} unlicensed and thus can be copied into another package. See the R packages book for information about .onLoad() and attaching and loading in general.\nSince R version 3.6.0 this S3 generic registration happens automatically with S3Method() in the package namespace using the {roxygen2} documentation #' @exportS3Method dplyr::dplyr_reconstruct.\nThere is one last option which prevents the hard dependency on a relatively recent R version. Since {roxygen2} version 6.1.0, there is the @rawNamespace tag which allows insertion of text into the NAMESPACE file. 
Using this tag the following code will check the local R version and register the S3 method if equal to or above 3.6.0.\n\n#' @rawNamespace if (getRversion() >= \"3.6.0\") {\n#' S3method(pkg::fun, class)\n#' }\n\nEach of the three options for registering S3 methods has different benefits and downsides, so the choice depends on the specific use-case. Over time it may be best to use the most up-to-date methods as packages are usually only maintained for a handful of recent R releases1.\nThe topics discussed in this post have been implemented in the {epiparameter} R package within Epiverse-TRACE.\nCompatibility with {vctrs} is also possible using the same mechanism (functions) described in this post, and if interested see https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat for details.\nFor other use-cases and discussions of the designs and implementations discussed in this post see:\n\n{dials} R package\n{rsample} R package\n{googledrive} R package\nPull request on {tibble} R package\n\nThis blog post is a compendium of information from sources that are linked and cited throughout. Please refer to those sites for more information and as the primary source for citation in further work." }, { - "objectID": "posts/benchmarking_design_decisions/index.html#conclusion", - "href": "posts/benchmarking_design_decisions/index.html#conclusion", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "Conclusion", - "text": "Conclusion\nIn designing R package infrastructure with dependencies, it might sometimes be necessary to check if they don’t slow down existing processes. Here, we have demonstrated how benchmarking is one way to achieve this for a process involving condition signalling. We show how a simple decision to use cli::cli_warn() to handle warnings could come at the cost of a tiny loss in speed, which is worth considering in its context.\nThe demonstration here can be extended to other dependency adoption decisions for input checking, loops, object manipulations, and so forth. We recommend benchmarking as a way to help developers make quick design decisions. However, we also recommend that developers consider the context of the optimisation in interpreting the results.\nI would like to thank Pratik Gupte, Joshua Lambert, and Hugo Gruson for their invaluable reviews and suggestions that helped improve this post." + "objectID": "posts/extend-dataframes/index.html#footnotes", + "href": "posts/extend-dataframes/index.html#footnotes", + "title": "Extending Data Frames", + "section": "Footnotes", + "text": "Footnotes\n\n\nThis is the working practise of tidyverse packages: https://www.tidyverse.org/blog/2019/04/r-version-support/↩︎" }, { - "objectID": "posts/benchmarking_design_decisions/index.html#other-r-packages-for-benchmarking", - "href": "posts/benchmarking_design_decisions/index.html#other-r-packages-for-benchmarking", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "Other R packages for benchmarking", - "text": "Other R packages for benchmarking\n\nmicrobenchmark: an R package for comparing the execution time of R expressions.\nrbenchmark: an R package for benchmarking R code.\ntictok: an R package to time R functions\ntouchstone: an R package for benchmarking of pull requests with statistical confidence." 
+ "objectID": "posts/lint-rcpp/index.html", + "href": "posts/lint-rcpp/index.html", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "", + "text": "The R package development ecosystem includes packages such as {lintr} and {styler} that can help to check code style, and to format R code.\nHowever, these packages cannot lint or style the C++ code of {Rcpp} packages. This could leave the C++ code of an Rcpp package less clean than the R code, increasing the technical debt already associated with using two languages.\nIn Epiverse-TRACE, we encounter this issue with {finalsize}, and we anticipate the same issue with further epidemic modelling packages that we seek to develop or adapt, such as {fluEvidenceSynthesis}.\nOur use-case is not unique, of course, and other projects could have their own solutions. One such, from which we have borrowed some ideas, is the Apache Arrow project, whose R package also uses a C++ backend (via {cpp11} rather than {Rcpp})." }, { - "objectID": "posts/benchmarking_design_decisions/index.html#footnotes", - "href": "posts/benchmarking_design_decisions/index.html#footnotes", - "title": "Using benchmarking to guide the adoption of dependencies in R packages", - "section": "Footnotes", - "text": "Footnotes\n\n\nBenchmark (Wikipedia): https://en.wikipedia.org/wiki/Benchmark_(computing)↩︎\nFormating errors can be slow/variable: https://github.com/r-lib/cli/issues/617↩︎\ncli benchmarks: https://cli.r-lib.org/articles/ansi-benchmark.html↩︎\nDonald Knuth’s quoted as having said, “The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” (See https://en.wikiquote.org/wiki/Donald_Knuth)↩︎\nProfiling (Wikipedia): https://csgillespie.github.io/efficientR/performance.html↩︎" + "objectID": "posts/lint-rcpp/index.html#use-case", + "href": "posts/lint-rcpp/index.html#use-case", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "", + "text": "The R package development ecosystem includes packages such as {lintr} and {styler} that can help to check code style, and to format R code.\nHowever, these packages cannot lint or style the C++ code of {Rcpp} packages. This could leave the C++ code of an Rcpp package less clean than the R code, increasing the technical debt already associated with using two languages.\nIn Epiverse-TRACE, we encounter this issue with {finalsize}, and we anticipate the same issue with further epidemic modelling packages that we seek to develop or adapt, such as {fluEvidenceSynthesis}.\nOur use-case is not unique, of course, and other projects could have their own solutions. One such, from which we have borrowed some ideas, is the Apache Arrow project, whose R package also uses a C++ backend (via {cpp11} rather than {Rcpp})." }, { - "objectID": "posts/simulist_v0.2.0/index.html", - "href": "posts/simulist_v0.2.0/index.html", - "title": "simulist v0.2.0", - "section": "", - "text": "We are very excited to announce the release of a new simulist version v0.2.0. Here is an automatically generated summary of the changes in this version.\nSecond release of simulist, updates the core simulation model and, as a result, the arguments for sim_*() functions for simulating line list data and/or contact table data exported from simulist are updated. The internal package architecture is also refactored." 
+ "objectID": "posts/lint-rcpp/index.html#choice-of-c-linters", + "href": "posts/lint-rcpp/index.html#choice-of-c-linters", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Choice of C++ linters", + "text": "Choice of C++ linters\nC++ linters such as clang-tidy stumble when dealing with C++ code in src/, as the clang toolchain attempts to compile it. This does not work for Rcpp packages, as the Rcpp.h header cannot be found — this linking is handled by {Rcpp}.\nFortunately, other C++ linters and code checking tools are available and can be used safely with Rcpp packages.\nWe have chosen to use cpplint and cppcheck for {finalsize}.\n\nCpplint\ncpplint is a tool that checks whether C/C++ files follow Google’s C++ style guide. cpplint is easy to install across platforms, and does not error when it cannot find Rcpp.h.\nImportantly, cpplint can be instructed to not lint the autogenerated RcppExports.cpp file, which follows a different style.\nTo lint all other .cpp files, we simply run cpplint from the terminal.\ncpplint --exclude=\"src/RcppExports.cpp\" src/*.cpp\n\n\nCppcheck\ncppcheck is a static code analysis tool, that aims to “have very few false positives”. This is especially useful for the non-standard organisation of Rcpp projects compared to C++ projects.\ncppcheck can also be run locally and instructed to ignore the autogenerated RcppExports.cpp file, while throwing up issues with style.\ncppcheck -i src/RcppExports.cpp --enable=style --error-exitcode=1 src\nHere, the --enable=style option lets cppcheck flag issues with style, acting as a second linter. This enables the performance and portability flags as well. (We have not found any difference when using --enable=warning instead.)\nEnabling all checks (--enable=all) would flag two specific issues for {Rcpp} packages: (1) the Rcpp*.h headers not being found (of the class missingIncludeSystem), and (2) the solver functions not being used by any other C++ function (unusedFunction).\nThese extra options should be avoided in {Rcpp} packages, as the linking is handled for us, and the functions are indeed used later — just not by other C++ functions.\nThe --error-exitcode=1 argument returns the integer 1 when an error is found, which is by convention the output for an error." }, { - "objectID": "posts/simulist_v0.2.0/index.html#new-features", - "href": "posts/simulist_v0.2.0/index.html#new-features", - "title": "simulist v0.2.0", - "section": "New features", - "text": "New features\n\nExternal\n\ncreate_config() now returns a new element in the list: $network. By default create_config() returns network = \"adjusted\", which assumes the simulation is a random network and samples contacts with an excess degree distribution (see Details in ?create_config()). The network effect can be changed to \"unadjusted\" to switch off the network effect. $network is checked internally (in .sim_network_bp()) and will error if not valid (#60).\nPackage architecture diagram is added to design-principles.Rmd (#66).\nlintr GitHub action workflow (lint-changed-files.yaml) is added to the suite of continuous integration workflows (#68).\nTransmission chain visualisation is added to vis-linelist.Rmd (#70).\n\n\n\nInternal\n\n.sim_network_bp() is added as an internal function and replaces bpmodels::chain_sim() as the core simulation model producing contacted and infected individuals. 
{bpmodels} is removed as a dependency as a result (#60).\n.sample_names() is added as an internal function to utilise randomNames::randomNames() to produce more unique names than randomNames(..., sample.with.replacement = FALSE).\nRefactor of internal simulation architecture replaced .sim_bp_linelist(), .sim_clinical_linelist() and .sim_contacts_tbl() with .sim_internal() (#66).\nThe sim_utils.R file was renamed to utils.R (#66) and the empty create_linelist.R file was removed (#72).\n.add_date_contact() argument outbreak_start_date is now NULL by default instead of missing (#82).\nRegression tests of sim_*() functions now use snapshot testing for more detailed data checking (#65).\nInternal testing data (testdata) files have been updated, as has the testdata/README.md with instructions (#64)." + "objectID": "posts/lint-rcpp/index.html#adding-c-linting-to-ci-workflows", + "href": "posts/lint-rcpp/index.html#adding-c-linting-to-ci-workflows", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Adding C++ linting to CI workflows", + "text": "Adding C++ linting to CI workflows\nBoth cpplint and cppcheck can be easily added to continuous integration workflows. In Epiverse-TRACE, we use Github Actions. The C++ lint workflow we have implemented looks like this:\non:\n push:\n paths: \"src/**\"\n pull_request:\n branches:\n - \"*\"\n\nname: Cpp-lint-check\n\njobs:\n cpplint:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n - uses: actions/setup-python@v2\n - run: pip install cpplint\n - run: cpplint --quiet --exclude=\"src/RcppExports.cpp\" src/*.cpp\n\n cppcheck:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n - run: sudo apt-get install cppcheck\n - run: cppcheck -i src/RcppExports.cpp --quiet --enable=warning --error-exitcode=1 .\nThe workflow is triggered when there are changes to files in src/, and on all pull requests." }, { - "objectID": "posts/simulist_v0.2.0/index.html#breaking-changes", - "href": "posts/simulist_v0.2.0/index.html#breaking-changes", - "title": "simulist v0.2.0", - "section": "Breaking changes", - "text": "Breaking changes\n\nR and serial_interval arguments have been removed from sim_linelist(), sim_contacts() and sim_outbreak() functions and instead contact_distribution, contact_interval and prob_infect are used to parameterise the simulation. Documentation, both functions and vignettes, have been updated with these changes (#60).\nThe contact_distribution argument in sim_*() functions requires a density function if supplied as an anonymous function. Information is added to simulist.Rmd to explain this.\nThe line list output from sim_linelist() now uses column header sex instead of gender. The contacts table output from sim_contacts() and sim_outbreak() now uses column headers age and sex instead of cnt_age and cnt_gender (#60, #79).\ncontact_distribution is redefined and redocumented as the distribution of contacts per individual, rather than the number of contacts that do not get infected as it was in v0.1.0.\nrow.names for <data.frame>s output by sim_linelist(), sim_contacts() and sim_outbreak() are now sequential from 1:nrows (#63)." + "objectID": "posts/lint-rcpp/index.html#formatting-c-code", + "href": "posts/lint-rcpp/index.html#formatting-c-code", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Formatting C++ code", + "text": "Formatting C++ code\nC++ code can be automatically formatted to avoid linter errors. An especially useful tool is clang-format. 
Our code is styled to follow the Google C++ style guide using:\n# replace .cpp with .h to format headers\nclang-format -i -style=google src/*.cpp\nHowever, this also formats the autogenerated RcppExports.cpp file. It can be extra work to repeatedly undo this change and keep the original formatting, but clang-format does not provide an easy inline way to ignore this file.\nInstead, clang-format can be passed all files except RcppExports.cpp to style using some simple shell commands. In smaller projects, it might be worth\nfind src -name \"*.cpp\" ! -name \"RcppExports.cpp\" -exec clang-format -style=google -i {} \\;" }, { - "objectID": "posts/simulist_v0.2.0/index.html#bug-fixes", - "href": "posts/simulist_v0.2.0/index.html#bug-fixes", - "title": "simulist v0.2.0", - "section": "Bug fixes", - "text": "Bug fixes\n\nsim_contacts() now correctly runs with an age-structured population. In the previous version (v0.1.0), sim_contacts() did not call .check_age_df() and as a result the function errored, this is fixed as of PR #81." + "objectID": "posts/lint-rcpp/index.html#turning-off-linting-and-formatting", + "href": "posts/lint-rcpp/index.html#turning-off-linting-and-formatting", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Turning off linting and formatting", + "text": "Turning off linting and formatting\nThere are cases in which we might want to turn linting and formatting off. This might be when the linter does not agree with valid C++ code required in the project, or when the linters and stylers do not agree with each other. These tools are developed separately by large software projects with their own internal requirements, and solutions to issues encountered in their work: clang-format by LLVM (although specifying -style=google), and cpplint from Google’s work.\n\nLinter-enforced paradigms\nSometimes, the linter or styler developer enforces both a style and the use of certain programming paradigms. An example from cpplint is when it warns against passing function arguments by reference, and prefers for these to be passed as pointers, or as constant references (const int &value).\nint some_function(int &value) { \n /* operations modifying value */\n return value;\n}\nPassing the argument as a const reference would not serve the needs of this function, and passing by value is a valid strategy when we don’t want to get into the details of using pointers. (Note that this is typically an issue when large objects such as custom classes or structs are passed to a function multiple times.)\nSimilarly, cpplint will throw a warning about accessing variables using std::move, which is something we encounter in the Newton solver in {finalsize}. While not technically wrong for such a simple use case, the linter is correct to cautiously throw a warning nonetheless.\n\n\nLinter-styler disagreement\nOne example of linter-styler disagreement is the use of BOOST_FOREACH from the Boost libraries as an alternative to for loops. clang-format will insist on adding two spaces before the opening bracket: BOOST_FOREACH (). cpplint will insist on removing one space.\ncpplint and clang-format also disagree on the order of header inclusions, especially when both local and system headers are included.\n\n\nDisabling checks on code chunks\nEither of these cases could require disabling linting or formatting on some part of the code. It is possible to turn off linting using cpplint at particular lines using the comment // NOLINT. 
Multiple lines can be protected from linting as well.\n// NOLINTBEGIN\n<some C++ code here>\n// NOLINTEND\nAlternatively, clang-format can be instructed to ignore chunks of code using comment messages too.\n// clang-format off\n<some C++ code here>\n// clang-format on" }, { - "objectID": "posts/simulist_v0.2.0/index.html#deprecated-and-defunct", - "href": "posts/simulist_v0.2.0/index.html#deprecated-and-defunct", - "title": "simulist v0.2.0", - "section": "Deprecated and defunct", - "text": "Deprecated and defunct\n\nNone" + "objectID": "posts/lint-rcpp/index.html#linter-options-for-future-packages", + "href": "posts/lint-rcpp/index.html#linter-options-for-future-packages", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Linter options for future packages", + "text": "Linter options for future packages\n{finalsize} is a relatively simple {Rcpp} package, with no C/C++ headers, and no C++ tests. However, future Epiverse-TRACE packages could be more similar to {fluEvidenceSynthesis}, and will have header files, and could also have C++ unit tests via the catch framework.\ncpplint will demand that all local headers be prefixed with their directory (src/), but this would cause the code to break as {Rcpp} looks for a subdirectory called src/src/. This can be turned off by passing the filter option --filter=\"-build/include_subdir\" to cpplint. Alternatively, we could place headers in a subdirectory such as inst/include.\nBoth cpplint and cppcheck can be instructed to ignore C++ test files using the catch testing framework provided by {testthat}. This prevents errors due to the specialised syntax provided by {testthat} in testthat.h, such as context.\n# for cpplint, add an extra exclude statement\ncpplint <...> --exclude=\"src/test*.cpp\" src/*.cpp\n\n# for cppcheck, suppress checks on test files\ncppcheck <...> --suppress=*:src/test_*.cpp src" }, { - "objectID": "posts/simulist_v0.2.0/index.html#acknowledgements", - "href": "posts/simulist_v0.2.0/index.html#acknowledgements", - "title": "simulist v0.2.0", - "section": "Acknowledgements", - "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @chartgerink, @jamesmbaazam, @pratikunterwegs, and @sbfnk." + "objectID": "posts/lint-rcpp/index.html#conclusion", + "href": "posts/lint-rcpp/index.html#conclusion", + "title": "Improving the C++ Code Quality of an Rcpp Package", + "section": "Conclusion", + "text": "Conclusion\nIt is actually somewhat surprising that there does not seem to be a canonical linter for C++ code in {Rcpp} packages. The methods laid out here are an initial implementation developed for use with the {finalsize} package, and the considerations here are a starting point. We shall be continuously evaluating how we ensure the quality of our C++ code as we encounter more use cases while developing future Epiverse-TRACE packages." }, { - "objectID": "posts/linelist_v1.0.0/index.html", - "href": "posts/linelist_v1.0.0/index.html", - "title": "linelist v1.0.0", + "objectID": "posts/superspreading_v0.2.0/index.html", + "href": "posts/superspreading_v0.2.0/index.html", + "title": "superspreading v0.2.0", "section": "", - "text": "We are very excited to announce the release of a new linelist version v1.0.0. Here is a automatically generated summary of the changes in this version." 
+ "text": "We are very excited to announce the release of a new superspreading version v0.2.0. Here is an automatically generated summary of the changes in this version.\nSecond minor release of superspreading. This release enhances functions added in v0.1.0 and adds two new exported functions, and two new vignettes.\nlibrary(superspreading)" }, { - "objectID": "posts/linelist_v1.0.0/index.html#new-features", - "href": "posts/linelist_v1.0.0/index.html#new-features", - "title": "linelist v1.0.0", + "objectID": "posts/superspreading_v0.2.0/index.html#new-features", + "href": "posts/superspreading_v0.2.0/index.html#new-features", + "title": "superspreading v0.2.0", "section": "New features", - "text": "New features\n\nIncreased compatibility with dplyr is now documented and ensured through tests of all dplyr verbs on linelist objects as part of our testing & continuous integration system, as well as a new vignette: https://epiverse-trace.github.io/linelist/articles/compat-dplyr.html (@Bisaloo, #53)\nA new selection helper is provided for tidyverse users, based on the existing selectors provided by the tidyselect package: has_tag() (@Bisaloo, #61). By feeding it a character vector of tags to operate on, you can work with dplyr verbs on specific tagged columns without having to explicitly use the column names:\nx %>%\n dplyr::select(has_tag(c(\"id\", \"date_of_onset\")))" + "text": "New features\n\nA new function (calc_network_R()) to estimate the reproduction number for heterogeneous networks and a vignette outlining use cases for the function from existing epidemiological literature is added (#71).\nprobability_epidemic() and probability_extinct() now have individual-level and population-level control in a joint framework (#70).\nproportion_*() functions can now return proportion columns of the output <data.frame> as numeric when the new argument format_prop is set to FALSE (#72).\nA new design principles vignette to document package development is added (#68).\nAdded a helper function (ic_tbl()) to improve model comparison tables (#65).\nprobability_*() functions now accept dynamic dots ({rlang} is added as a dependency) (#82)." }, { - "objectID": "posts/linelist_v1.0.0/index.html#breaking-changes", - "href": "posts/linelist_v1.0.0/index.html#breaking-changes", - "title": "linelist v1.0.0", + "objectID": "posts/superspreading_v0.2.0/index.html#breaking-changes", + "href": "posts/superspreading_v0.2.0/index.html#breaking-changes", + "title": "superspreading v0.2.0", "section": "Breaking changes", - "text": "Breaking changes\n\nIt is no longer possible to use lost_tags_action() within a pipeline. It must now be set as a separate step. This makes the internal code more robust and clarifies what is part of the pipeline versus a global option (@Bisaloo, #79).\nThe select_tags() function is now deprecated to ensure we provide just one clear way to address a given issue and that our “happy path” is clearly signposted (@Bisaloo, #61). If you were using this function, we now recommend using the more explicit two-steps process:\n# Deprecated\nx %>%\n select_tags(\"age\")\n\n# Instead use\nx %>%\n tags_df() %>%\n select(age)\n\n# Or\nx %>%\n select(has_tag(\"age\")) %>%\n tags_df()\nThe custom select.linelist() method has been deprecated as providing a custom [.linelist() is sufficient to ensure compatibility with dplyr::select() default methods, including triggering lost_tags_action() on tag removal (@Bisaloo, #61). 
A full deletion of this method is not possible at the moment because we want to provide a smooth transition for users that relied on the custom tags argument of the select.linelist() method. It is now recommend instead to use the new has_tag() selection helper:\nx %>%\n dplyr::select(has_tag(c(\"id\", \"date_of_onset\")))\n\n# Instead of\nx %>%\n select(tags = c(\"id\", \"date_of_onset\"))\nThe custom rename.linelist() method has been removed as providing a custom names<-().linelist method is sufficient to ensure compatibility with dplyr::rename(), including appropriate modification of the tags. (@Bisaloo, #60)" + "text": "Breaking changes\n\nind_control and pop_control arguments replace control and control_type arguments in probability_contain(); and the argument default for num_init_infect is removed (#70).\nChanged epidist argument to offspring_dist to follow Epiverse style (affects several functions) (#64).\nArgument in proportion_transmission() has been renamed from sim to simulate (#81)." }, { - "objectID": "posts/linelist_v1.0.0/index.html#documentation", - "href": "posts/linelist_v1.0.0/index.html#documentation", - "title": "linelist v1.0.0", - "section": "Documentation", - "text": "Documentation\n\nadded a hex logo thanks to David Mascarina’s contribution (@dgmascarina)\nadded short lay description to README thanks to Emma Marty’s contribution" + "objectID": "posts/superspreading_v0.2.0/index.html#minor-changes", + "href": "posts/superspreading_v0.2.0/index.html#minor-changes", + "title": "superspreading v0.2.0", + "section": "Minor changes", + "text": "Minor changes\n\nNew package hex logo (#73).\nUpdated continuous integration and package infrastructure (#67).\nImproved function documentation (#63).\nOptimisation now uses optim() by default (#82).\nTesting suite now uses snapshot testing for regression tests (#84)." }, { - "objectID": "posts/linelist_v1.0.0/index.html#bug-fixes", - "href": "posts/linelist_v1.0.0/index.html#bug-fixes", - "title": "linelist v1.0.0", + "objectID": "posts/superspreading_v0.2.0/index.html#bug-fixes", + "href": "posts/superspreading_v0.2.0/index.html#bug-fixes", + "title": "superspreading v0.2.0", "section": "Bug fixes", - "text": "Bug fixes\n\nlinelist is now explicitly marked as incompatible with data.table. In practice, make_linelist(x) now errors if x inherits from data.table (#55, @Bisaloo, based on discussions with @TimTaylor).\n[.linelist() now works to subset by column when including just one argument (#54, @Bisaloo). E.g., x[1]. As an indirect effect, this also improves compatibility with dplyr verbs that rely on this method (#51).\nsubsetting a linelist with extra tags (e.g., created via make_linelist(allow_extra = TRUE)) no longer causes an error (#65, @Bisaloo; reported by @TimTaylor in #63)" + "text": "Bug fixes\n\nNone to {superspreading} functions.\nUpdate {epiparameter} use in vignette and tests (#62)." 
}, { - "objectID": "posts/linelist_v1.0.0/index.html#internal-changes", - "href": "posts/linelist_v1.0.0/index.html#internal-changes", - "title": "linelist v1.0.0", - "section": "Internal changes", - "text": "Internal changes\n\ntestthat tests now run in parallel (#76, @Bisaloo)\ntestthat tests now warn on partial matching (#76, @Bisaloo)" + "objectID": "posts/superspreading_v0.2.0/index.html#deprecated-and-defunct", + "href": "posts/superspreading_v0.2.0/index.html#deprecated-and-defunct", + "title": "superspreading v0.2.0", + "section": "Deprecated and defunct", + "text": "Deprecated and defunct\n\nNone" }, { - "objectID": "posts/comm-software-devel/index.html", - "href": "posts/comm-software-devel/index.html", - "title": "Communicating development stages of open-source software", + "objectID": "posts/data-licensing.html", + "href": "posts/data-licensing.html", + "title": "Dual licensing R packages with code and data", "section": "", - "text": "Software is not immediately stable when being developed. It undergoes design changes, changes to the user interface (application programming interface, API), and features get added or removed over time. Software in a open-source setting, in which the code is publicly hosted in a repository (e.g., Github, GitLab, Bitbucket), allows anyone to track developments. It also allows the developer community to easily contribute to the software.\nThere are certain metrics which can be used to convey the stage of development to users or other developers. For example the number of commits, a repository with few commits may indicate that a project is still in an incipient phase and will undergo several breaking changes. However, different software projects become stable at different rates and the number of commits may mean very different things for a repository containing an operating system compared to an R package with minimal functionality. It is therefore important that developers communicate with users and other developers at what stage the project is in, and how stable the code base is.\nSoftware development, and specifically R package development, has several methods to communicate stability. This blog post will discuss two such methods and give examples for each. The first of these is versioning code, which establishes points in development where the code is ready for use; and the second is lifecycle badges, these can be placed at a different levels within software (e.g., package, function, function argument) to convey how a user should interact and use." - }, - { - "objectID": "posts/comm-software-devel/index.html#versioning", - "href": "posts/comm-software-devel/index.html#versioning", - "title": "Communicating development stages of open-source software", - "section": "Versioning", - "text": "Versioning\nVersioning code is not a new concept and has been used for decades1. It has led to version control systems such as git. However, in this post we are interested in versioning to communicate development.\n\nSemantic versioning\nOne such philosophy is semantic versioning (SemVer). This aims to describe the stage of software development by attaching semantics (i.e. meaning) to the format and numbering of versions. The version system works through three numbers, each separated by a dot. The numbers, from left to right, convey major version, minor version and patch version. 
As an example, 0.5.2 is newer than 0.3.9.\nEmploying semantic versioning in one's code development allows others to determine whether a package has undergone substantial development and testing, and indicates whether it would be a suitable package to use in a script or as a dependency for another package. Semantic versioning also describes the changes made to a package. As explained on their website, incrementing the major version implies a breaking change, a minor increment is a backwards-compatible change, and patches are mostly applied to bug fixes. This aids users in understanding whether they should continue using a package, whether their package needs updating due to a breaking change or whether they need to install the newest version because a bug was recently fixed.\nExamples of changes that correspond to major, minor or patch updates can be seen in the version release notes (NEWS.md file) of {dplyr} and {ggplot2}.\nIn R there are several packages that work with versioning, and specifically semantic versioning. The {semver} package provides functions for parsing, rendering and comparing versions. There is also the {semverutils} R package which provides similar functionality using R6. The {usethis} package provides handy utility functions for changing the versions of R packages (usethis::use_version() and usethis::use_dev_version()). R also comes with a package_version() function for creating and validating versions.\nOverall, semantic versioning provides what its authors describe as a “formal specification” to facilitate management of package development and the dependencies of that package. It is the most widely-used versioning system and therefore will be understood by a wide variety of users and developers.\nSome of the criticism raised against semantic versioning is the difficulty of defining how changes correspond to a version increment. Semantic versioning states only breaking changes warrant major releases, but a vast re-write of a code base may also justify a major version change. Different breaking changes have different magnitudes, therefore a change to a single exported function or a change to every exported function will be communicated in a single, equal, version increment.\n\n\nAlternatives to semantic versioning\nThere are several other versioning frameworks aside from semantic versioning. One common option is calendar versioning (CalVer). The format of CalVer is usually year-month (YY-MM), or year-month-day (YY-MM-DD), depending on the regularity of releases, and allows appending tags (micros or modifiers, e.g. YY-MM.1).\nOther versioning schemes can appear similar to semantic versioning, but do not follow the guidelines around version semantics. In these cases, a bump in the major version may not relate to a breaking change. Additionally, other numbers can be attached to the traditional x.y.z format, such as build numbers. Build number versioning adds an extra number to specify the build (x.y.z.build_number). There are many other variants but covering all versioning systems is outside the scope of this post.\n\n\nVersioning an R package\nThere are some restrictions on valid version numbers for R packages. The official “Writing R Extensions” guide states:\n\nThis is a sequence of at least two (and usually three) non-negative integers separated by single ‘.’ or ‘-’ characters.\n\n\n\nWhy version?\nThe benefits of versioning apply beyond communicating with users and developers. 
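A minimal sketch of the comparison and bumping workflow described above, using only base R and {usethis} (this is an editorial illustration, not part of the original post; usethis::use_version() must be run inside a package project, so it is shown commented out):

```r
# Compare two semantic versions with base R
v_old <- package_version("0.3.9")
v_new <- package_version("0.5.2")
v_new > v_old  # TRUE: 0.5.2 is the more recent release

# Inside an R package project, {usethis} can bump the relevant component,
# e.g. a backwards-compatible feature release:
# usethis::use_version(which = "minor")
```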
Implementing versioning eases reproducibility by allowing systems to record which version of a language or package was used. In R this can be achieved in several ways, with some popular examples being the {renv} package and Docker." }, { - "objectID": "posts/comm-software-devel/index.html#lifecycle-badges", - "href": "posts/comm-software-devel/index.html#lifecycle-badges", - "title": "Communicating development stages of open-source software", - "section": "Lifecycle badges", - "text": "Lifecycle badges\nBadges can be pasted onto visible parts of the code, for example a readme document in the root of the repository, to show the development phase and stability. The three badging systems we will discuss in this post are:\n\nRepoStatus\nTidyverse lifecycles\nReconverse lifecycles\n\n\nRepoStatus\nRepoStatus is a language-agnostic set of badges which describe the stages of code development and the possible transitions between those stages.\nAs shown in the figure below, there are multiple stages to communicate both unstable and stable software. There are also multiple paths between each stage, recognising the varied routes software development can take.\n\n\n\nRepoStatus badge system. Reused under CC BY-SA 4.0 from repostatus.org\n\n\n\n\nTidyverse\nThe tidyverse approach is broadly similar to RepoStatus. The {lifecycle} R package contains the description of their process. There are four stages:\n\nExperimental\nStable\nSuperseded (previously called retired)\nDeprecated\n\nMost code will go through the experimental phase, as its API and the number and order of its arguments are still likely to change. Once code is not going to drastically change (i.e. no breaking changes), at least from a user’s point of view, it can be labelled stable. In the tidyverse lifecycle schematic, all experimental code transitions to stable code.\nThe two stages that follow stable are: superseded and deprecated. The former describes a situation in which a new package, a new function or a new argument, depending on the context, has been developed which the developer feels should be used instead of the now superseded code. Superseded code is still maintained in the sense that changes to the language or package that may break the function will be fixed, as will bugs, but the function will not receive ongoing development. The latter, deprecation, is used in cases when the developer thinks that a package or function should no longer be used. This is primarily employed when code is depended on by other software and therefore deleting the code would cause breaks in reverse dependencies. Thus the deprecation warning allows developers of those dependencies time to make the relevant changes.\n\n\n\n{lifecycle} badge system. Reused under MIT license from the lifecycle R package\n\n\nOne of the main differences between the tidyverse lifecycles, compared to the others discussed in this post, is their applicability at different levels in the code. The lifecycle badges can be applied at the package-level (e.g., stringr), the function-level (e.g. 
dplyr::group_trim()) or the argument level (e.g., dplyr::across()).\nUsing {lifecycle} in a package can be set up using usethis::use_lifecycle(). The {lifecycle} package not only provides badges, but also informative deprecation notices which communicate to users that a function is no longer supported as of a given release of a package. This offers the user a chance to find an alternative function for future use.\nThe use of deprecation warnings from {lifecycle} leads onto another aspect of tidyverse development: protracted deprecation. There are no fixed rules on how long after a deprecation warning is issued the code should be removed. In the tidyverse, this process is given ample time in order to allow the many developers that utilise tidyverse software to make the necessary changes. Full descriptions of the {lifecycle} package can be found on the website, including the deprecated use of the questioning and maturing stages.\n\n\nReconverse\nReconverse provides four stages of software development:\n\nconcept\nexperimental\nmaturing\nstable\n\nA difference between {lifecycle} and reconverse is the explicit connection between semantic versioning and development stage in reconverse. The transitions between experimental, maturing and stable are linked to versions below 0.1.0, below 1.0.0, and 1.0.0 or greater, respectively.\n\n\nDynamic badges\nAll badge frameworks discussed only offer static badges that require developers to manually update them as the project moves between phases. This is subject to the maintainers remembering, which can lead to miscommunication about a package’s stage, which may have moved on from being experimental, or not have been worked on in years but still carry an active badge.\nDynamic badges, like those offered by https://shields.io/, give a good indication of how recently the project was changed by showing the time since the last commit, or the number of commits since the last release. These too are not perfect, but they may better track changes and take the burden of badge updates off the project maintainer." }, { + "objectID": "posts/data-licensing.html#footnotes", + "href": "posts/data-licensing.html#footnotes", + "title": "Dual licensing R packages with code and data", + "section": "Footnotes", + "text": "Footnotes\n\n\nFor a full list of licenses accepted by CRAN see: https://svn.r-project.org/R/trunk/share/licenses/license.db and they also accept stating the license as “Unlimited” for unrestricted distribution.↩︎\nA discussion of data and R packages can be found here: https://github.com/ropensci/unconf17/issues/61. 
This thread is used as an example of some thoughts on packaging data in R, but we acknowledge it is from 2017, so the opinions of the individuals in this thread may have changed.↩︎\nTo see the legal definition of the database right in Europe, and what constitutes it, see the European Union Directive 96/9/EC↩︎\nSee this blog post by Julia Silge on including external data sets into an R package and rectifying incompatibilities with license↩︎" }, { - "objectID": "posts/comm-software-devel/index.html#communicating-development-in-the-epiverse-trace", - "href": "posts/comm-software-devel/index.html#communicating-development-in-the-epiverse-trace", - "title": "Communicating development stages of open-source software", - "section": "Communicating development in the Epiverse-TRACE", - "text": "Communicating development in the Epiverse-TRACE\nWithin the Epiverse-TRACE initiative we use semantic versioning and badges to convey to the community interacting with our code which stage of development each project is in. We do not have fixed rules on which badges to use and a variety of badges can be found across the repositories in the organisation. For example, reconverse badges are used for {linelist}, a RepoStatus badge is used in {finalsize}, and tidyverse badges are used in {epiparameter}.\nWe take this approach as no lifecycle badging system is perfect, each with benefits and downsides. The badges from {lifecycle} are the most common, and thus recognisable, in R package development, however they may not port well to other languages or be familiar to developers coming to R from other frameworks. RepoStatus has the benefit of not being designed for a single language, and its number of badges gives greater acuity to the stage of development for a project. This may be especially useful if a package is newly developed and {lifecycle} would describe it as experimental, but RepoStatus provides granularity as to whether it is a concept package, work in progress (WIP) or started but abandoned.\nThere is some ambiguity in the semantics of the active stage in RepoStatus, which in the definition is “stable, usable state”, but may be misinterpreted as meaning unstable but actively developed.\nLastly, reconverse provides a system akin to {lifecycle} and may be useful for those working in the epidemiology developer space. However, one downside of the reconverse system is that there are no clear semantics for a package being deprecated or archived. As with almost all code, at some point development ceases and this stage should be communicated, even if just to say that the package is not being updated in line with developments in the underlying language, in this case R.\nThere are no plans within Epiverse-TRACE to develop a new badging system as the existing systems cover almost all use cases. In the event that the current development stage cannot be adequately communicated with a single badge from one of the frameworks discussed, a combination of badges can be used. For example, early on in a project adding both the experimental badge from {lifecycle} or reconverse and the WIP badge from RepoStatus may more accurately describe the project’s development pace. Alternatively, the stable badge, from either {lifecycle} or reconverse, can be coupled with either active or inactive from RepoStatus to let other developers know if software will be updated with new language features or dependency deprecations.\nOverall, the use of any of the three lifecycle frameworks described here is better than none."
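The {lifecycle} deprecation notices discussed in the post above look roughly like this in practice. This is a minimal editorial sketch in which old_fun() and new_fun() are hypothetical names, not functions from any Epiverse-TRACE package:

```r
# Minimal sketch of a {lifecycle} deprecation notice (hypothetical functions)
new_fun <- function(x) x * 2

old_fun <- function(x) {
  lifecycle::deprecate_warn(
    when = "1.0.0",     # release in which the deprecation was announced
    what = "old_fun()",
    with = "new_fun()"  # recommended replacement
  )
  new_fun(x)
}

old_fun(3)  # returns 6, with a warning pointing users to new_fun()
```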
+ "objectID": "posts/ai-learner-review/index.html", + "href": "posts/ai-learner-review/index.html", + "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "section": "", + "text": "Part of the Epiverse-TRACE initiative involves development of training materials that span early, middle and late stage outbreak analysis and modelling tasks. To ensure that our tutorials are accessible to target audiences, we have developed a series of learner personas to inform the design of learning materials. These personas include the following:\n\nLucia, a Field Epidemiologist that use R for data cleaning, plotting and report for Outbreak response.\nJuan, a Statistician and R user in a National Health Agency with constant deployment to outbreak response.\nPatricia, a PhD student learning to use R and analyse Outbreak data for her collaborative project on GitHub.\nVania, a professor who needs ready-to-use training for her research and to pass on to students.\nDanielle, a Trainer that wants to remix content to create specific training materials for public health practitioners.\n\nAs the volume of training materials increases, we have explored automating the generation of initial reviews using large language models (LLMs) that take the form of ‘in character’ agents with instructions to provide constructive comments. This reflects a wider focus within the field of outbreak analytics on how LLMs agents can be used to increase the efficiency and scalability of common tasks (e.g. van Hoek et al, Lancet Microbe, 2024 ).\nTo generate the AI tutorial reviews, we use the OpenAI GPT-4 API, via the openai R package, as described in this repository. We also use the gh package to load the .Rmd materials from a given repository (e.g. epiverse-trace/tutorials-middle). Full illustrative code is available here, with the GPT-4 API prompts outlined below.\n# Define first prompt\nuser_prompt_1 <- \"You are the following person, and give all your answers in character:\"\n\n# Load Lucia persona\npersona_bio <- read_file(\"https://raw.githubusercontent.com/epiverse-trace/personas/master/lucia-outbreaks.qmd\")\n\n# Define second prompt\nuser_prompt_2 <- \"Now suppose you want to complete the following tutorial about outbreak analysis in R. The content is in R markdown but would be knitted to HTML in reality, with additional figures where relevant. Provide a critique of the tutorial from your perspective as a learner. What is unclear? What is useful? What is difficult? What could be refined? Support comments with brief quotes. In your feedback be succinct, positive, constructive and specific. State what content to keep and what to improve. Provide clear suggestions for next steps to remove, change or add content. Note that visualisations will be in the tutorial, but are not shown in the Rmd, so do not comment on these. Summarise your review and suggestions for specific improvements in short bullet point paragraphs. If some of the points are similar, amalgamate these into a single bullet point. 
\\n\\n\"\n\n# Call OpenAI API (credentials must already be defined - see full illustrative code link above)\noutput <- create_chat_completion(\n model = \"gpt-4\",\n messages = list(list(\"role\"=\"user\",\"content\" = paste0(user_prompt_1,persona_bio,\n user_prompt_2,tutorial_contents))\n )\n\n# Write review to .md using readr\nwrite_lines(output,\"output_review.md\"))" }, { - "objectID": "posts/comm-software-devel/index.html#footnotes", - "href": "posts/comm-software-devel/index.html#footnotes", - "title": "Communicating development stages of open-source software", - "section": "Footnotes", - "text": "Footnotes\n\n\nhttps://en.wikipedia.org/wiki/Version_control↩︎" + "objectID": "posts/ai-learner-review/index.html#turning-learner-personas-into-llm-agents", + "href": "posts/ai-learner-review/index.html#turning-learner-personas-into-llm-agents", + "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "section": "", + "text": "Part of the Epiverse-TRACE initiative involves development of training materials that span early, middle and late stage outbreak analysis and modelling tasks. To ensure that our tutorials are accessible to target audiences, we have developed a series of learner personas to inform the design of learning materials. These personas include the following:\n\nLucia, a Field Epidemiologist that use R for data cleaning, plotting and report for Outbreak response.\nJuan, a Statistician and R user in a National Health Agency with constant deployment to outbreak response.\nPatricia, a PhD student learning to use R and analyse Outbreak data for her collaborative project on GitHub.\nVania, a professor who needs ready-to-use training for her research and to pass on to students.\nDanielle, a Trainer that wants to remix content to create specific training materials for public health practitioners.\n\nAs the volume of training materials increases, we have explored automating the generation of initial reviews using large language models (LLMs) that take the form of ‘in character’ agents with instructions to provide constructive comments. This reflects a wider focus within the field of outbreak analytics on how LLMs agents can be used to increase the efficiency and scalability of common tasks (e.g. van Hoek et al, Lancet Microbe, 2024 ).\nTo generate the AI tutorial reviews, we use the OpenAI GPT-4 API, via the openai R package, as described in this repository. We also use the gh package to load the .Rmd materials from a given repository (e.g. epiverse-trace/tutorials-middle). Full illustrative code is available here, with the GPT-4 API prompts outlined below.\n# Define first prompt\nuser_prompt_1 <- \"You are the following person, and give all your answers in character:\"\n\n# Load Lucia persona\npersona_bio <- read_file(\"https://raw.githubusercontent.com/epiverse-trace/personas/master/lucia-outbreaks.qmd\")\n\n# Define second prompt\nuser_prompt_2 <- \"Now suppose you want to complete the following tutorial about outbreak analysis in R. The content is in R markdown but would be knitted to HTML in reality, with additional figures where relevant. Provide a critique of the tutorial from your perspective as a learner. What is unclear? What is useful? What is difficult? What could be refined? Support comments with brief quotes. In your feedback be succinct, positive, constructive and specific. State what content to keep and what to improve. Provide clear suggestions for next steps to remove, change or add content. 
Note that visualisations will be in the tutorial, but are not shown in the Rmd, so do not comment on these. Summarise your review and suggestions for specific improvements in short bullet point paragraphs. If some of the points are similar, amalgamate these into a single bullet point. \\n\\n\"\n\n# Call OpenAI API (credentials must already be defined - see full illustrative code link above)\noutput <- create_chat_completion(\n model = \"gpt-4\",\n messages = list(list(\"role\"=\"user\",\"content\" = paste0(user_prompt_1,persona_bio,\n user_prompt_2,tutorial_contents))\n )\n\n# Write review to .md using readr\nwrite_lines(output,\"output_review.md\"))" }, { - "objectID": "posts/simulist_v0.1.0/index.html", - "href": "posts/simulist_v0.1.0/index.html", - "title": "simulist v0.1.0", - "section": "", - "text": "We are very excited to announce the release of a new simulist version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial release of simulist, an R package containing tools to simulate epidemiological data such as line lists and contact tables." + "objectID": "posts/ai-learner-review/index.html#reviewing-transmissibilty-training-materials-as-a-senior-academic", + "href": "posts/ai-learner-review/index.html#reviewing-transmissibilty-training-materials-as-a-senior-academic", + "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "section": "Reviewing transmissibilty training materials as a senior academic", + "text": "Reviewing transmissibilty training materials as a senior academic\nTo illustrate the sort of reviews that can be generated with the above approach, the below is the output produced when we imported the Vania senior academic persona alongside an early pre-Alpha version of the training materials for our transmissibility estimation tutorial:\n\nOverall, the tutorial is well-structured and provides a comprehensive guide on how to use the EpiNow2 package to estimate transmission metrics from case data. However, there are several areas that could be improved to enhance clarity and understanding for learners like me.\n\nPrerequisites: The prerequisites section could be expanded to include specific resources for learners to review the necessary concepts. For instance, links to resources on Bayesian analysis and the effective reproduction number would be helpful.\nJargon and Terminology: The tutorial contains several technical terms and jargon that could be confusing for learners. For example, terms like “Bayesian inference”, “posterior probability”, “likelihood”, and “prior probability” are used without clear explanations. Providing definitions or links to resources for these terms would be beneficial.\nCode Explanation: While the tutorial does a good job of explaining what the code does, it could provide more context on why certain functions or packages are used. For instance, it would be helpful to explain why the EpiNow2 package is used and how it differs from other packages.\nVisualizations: Although visualizations are not shown in the Rmd, it would be beneficial to explain what learners should expect to see in the visualizations and how to interpret them.\nExercises: The tutorial lacks exercises for learners to practice the concepts and skills they have learned. 
Including exercises with solutions would greatly enhance the learning experience.\n\nSuggestions for improvements:\n\nExpand the prerequisites section to include specific resources for learners to review necessary concepts.\nProvide definitions or links to resources for technical terms and jargon.\nProvide more context on why certain functions or packages are used.\nExplain what learners should expect to see in the visualizations and how to interpret them.\nInclude exercises with solutions for learners to practice the concepts and skills they have learned." }, { - "objectID": "posts/simulist_v0.1.0/index.html#new-features", - "href": "posts/simulist_v0.1.0/index.html#new-features", - "title": "simulist v0.1.0", - "section": "New features", - "text": "New features\n\nA set of simulation functions to simulate epidemiological data. These simulation functions contain arguments that allow for the simulation to be flexibly parameterised to be generated under specified epidemiological or demographic conditions.\n\nsim_linelist(): simulate line list data\nsim_contacts(): simulate contacts data\nsim_outbreak(): simulate both line list and contacts data\n\nFour general vignettes\n\nGet Started vignette (simulist.Rmd)\nAge-stratified hospitalisation and death risks vignette (age-strat-risks.Rmd)\nAge structured population vignette (age-struct-pop.Rmd)\nVisualising simulated data (vis-linelist.Rmd)\n\nOne developer focused vignette\n\nDesign Principles for {simulist} (design-principles.Rmd)\n\nUnit tests (100% code coverage) and documentation files.\nContinuous integration workflows for R package checks, rendering the README.md, calculating test coverage, deploying the pkgdown website, updating the citation file, and checking new package or system dependencies." + "objectID": "posts/ai-learner-review/index.html#reviewing-forecast-materials-as-a-field-epidemiologist", + "href": "posts/ai-learner-review/index.html#reviewing-forecast-materials-as-a-field-epidemiologist", + "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "section": "Reviewing forecast materials as a field epidemiologist", + "text": "Reviewing forecast materials as a field epidemiologist\nTo further illustrate review outputs, the below is what was produced when we imported the Lucia field epidemiologist persona alongside an early pre-Alpha version of the training materials for our forecasting tutorial:\n\nPositive Feedback:\n\nThe tutorial is well-structured and follows a logical progression, which makes it easier to follow.\nThe use of real-world examples and datasets (COVID-19 and Ebola) makes the tutorial more relatable and practical.\nThe inclusion of questions, objectives, prerequisites, and key points at the beginning and end of the tutorial respectively, helps to guide the learning process.\nThe tutorial provides a good balance between theory and practice, with clear explanations and practical coding examples.\nThe use of callouts to highlight important information or provide additional context is very helpful.\n\nConstructive Feedback:\n\nThe tutorial assumes a certain level of familiarity with R and statistical concepts, which might be challenging for beginners. 
It would be helpful to provide more explanations or links to resources for beginners.\nThe code chunks are quite complex and could be broken down into smaller parts with more explanations.\nThe tutorial could benefit from more visualizations to illustrate the concepts and results.\nThe challenge section could provide more guidance or hints to help learners complete the task.\n\nSuggestions for Improvement:\n\nProvide more explanations or links to resources for beginners to learn R and statistical concepts.\nBreak down complex code chunks into smaller parts with more explanations.\nInclude more visualizations to illustrate the concepts and results.\nProvide more guidance or hints in the challenge section to help learners complete the task.\nInclude a glossary of terms or a list of R packages used in the tutorial for quick reference." }, { - "objectID": "posts/epiparameter_v0.3.0/index.html", - "href": "posts/epiparameter_v0.3.0/index.html", - "title": "epiparameter v0.3.0", + "objectID": "posts/ai-learner-review/index.html#overcoming-feedback-bottlenecks", + "href": "posts/ai-learner-review/index.html#overcoming-feedback-bottlenecks", + "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "section": "Overcoming feedback bottlenecks", + "text": "Overcoming feedback bottlenecks\nA challenge with LLMs trained for general use is finding domain-specific tasks where they can add sufficient value beyond existing human input. Tasks like providing early sense checking and tailored feedback, particularly from differing perspectives, therefore has potential to overcome common bottlenecks in developing training materials (e.g. providing initial comments and flagging obvious issues while waiting for more detailed human feedback).\nAs Epiverse-TRACE training materials continue to develop, we plan to explore further uses beyond simple first-pass reviews. For example, LLMs are well suited to synthesising qualitative feedback, increasing the range of insights that can be collected and summarised from learners as we move into beta testing. We also hope to identify opportunities where LLMs can help supplement the learner experience, as demonstrated by emerging tools like RTutor for descriptive plotting functionality in R, which combines generation of code in response to user queries with translation into shiny outputs." + }, + { + "objectID": "posts/simulist_v0.3.0/index.html", + "href": "posts/simulist_v0.3.0/index.html", + "title": "simulist v0.3.0", "section": "", - "text": "We are very excited to announce the release of a new epiparameter version v0.3.0. Here is an automatically generated summary of the changes in this version.\nThe third minor release of the {epiparameter} R package contains a range of updates and improvements to the package. The principal aim of this release is to simplify, clarify and enhance the classes and class methods when working with epidemiological parameters in R.\nThe are a large number of breaking changes in this release, primarily functions and function arguments being renamed or restructured, see the Breaking changes section for an overview." + "text": "We are very excited to announce the release of a new simulist version v0.3.0. 
Here is an automatically generated summary of the changes in this version.\nThe third release of the simulist R package contains a range of new features, enhancements, documentation and unit tests.\nThe headline changes to the package are:" }, { - "objectID": "posts/epiparameter_v0.3.0/index.html#new-features", - "href": "posts/epiparameter_v0.3.0/index.html#new-features", - "title": "epiparameter v0.3.0", + "objectID": "posts/simulist_v0.3.0/index.html#new-features", + "href": "posts/simulist_v0.3.0/index.html#new-features", + "title": "simulist v0.3.0", "section": "New features", - "text": "New features\n\nThe library of epidemiological parameters has been updated to include 3 new Chikungunya parameter entries. The Mpox parameters previously missing from the Guzzetta et al. entry have been added (#346 & #374).\nA c() method has been added for <epiparameter> and <multi_epiparameter> objects (#368).\nAn aggregate() method has been added for <multi_epiparameter> to enable consensus distributions to be built utilising the mixture distribution class from {distributional} (#388).\nInfrastructure has been added to the package to allow translations of messages/warnings/errors printed to the console. (@Bisaloo, #367).\nconvert_summary_stats_to_params() can now convert from median and dispersion for a lognormal distribution (#378).\nThe data_dictionary.json has been enhanced to improve validation of the library of epidemiological parameters (parameters.json) (#379).\nAn interactive network showing all of the <epiparameter> S3 methods has been added to the design_principles.Rmd vignette (#383).\nThe data_from_epireview.Rmd article has been improved and updated with new changes from {epireview} (@CarmenTamayo & @cm401 & @kellymccain28, #305 & #373).\nParameter units have been added to every entry in the {epiparameter} library (parameters.json) and to the $metadata element in <epiparameter> objects. The create_metadata() function now has a units argument to construct metadata lists (#391).\nImproved database.Rmd vignette by adding a short citation to the reference column (@jamesmbaazam, #348).\nThe family() method for <epiparameter> has been improved to allow access to distribution names for transformed (e.g. mixture and truncated distributions) and untransformed (e.g. gamma and lognormal) distributions with a new argument base_dist and a new internal function .distributional_family() (#398).\nas_epiparameter() can now work with the SARS parameters from {epireview} (#407)." + "text": "New features\n\nExternal\n\nonset_to_hosp and onset_to_death arguments can now take NA as input and will return a column of NAs in the line list columns date_admission and date_outcome (#98).\nAn onset_to_recovery argument has been added to the simulation functions, sim_linelist() and sim_outbreak(), and so the recovery date can be explicitly given in the line list data (#99).\nThe line list simulation can now use a time-varying case fatality risk. 
The create_config() function now returns a $time_varying_death_risk list element, which is NULL by default but can take an R function to enable the fatality risk of cases to change over the epidemic (#101).\nA new vignette, time-varying-cfr.Rmd, has been added to the package to describe how to use the time-varying case fatality risk functionality and describe a few different time-varying functions that can be used (#101).\nA new vignette, wrangling-linelist.Rmd, has been added to go over some of the common post-processing steps that might be required after simulating line list or contact tracing data. This vignette is short and currently only contains a single post-processing example, more examples will be added over time (#104).\nThe README now has a section on related projects to provide an overview of packages that simulate line list data, or are related to simulist. This section contains a disclosure widget containing a feature table providing a high-level description of the features and development status of each related package (#110).\nA Key features section and Complimentary R packages section has been added to the README (#134).\nUpdated package architecture diagram in the design-principles.Rmd vignette (#113).\n\n\n\nInternal\n\nThe .add_deaths() function has been replaced by the .add_outcome() function which can simulate death and recovery times (#99).\n.cross_check_sim_input() function has been added to the package to ensure user input is coherent across arguments (#98).\n.anonymise() function has been added to convert individual’s names into alphanumeric codes to anonymise individuals in line list and contact tracing data (#106)." }, { - "objectID": "posts/epiparameter_v0.3.0/index.html#breaking-changes", - "href": "posts/epiparameter_v0.3.0/index.html#breaking-changes", - "title": "epiparameter v0.3.0", + "objectID": "posts/simulist_v0.3.0/index.html#breaking-changes", + "href": "posts/simulist_v0.3.0/index.html#breaking-changes", + "title": "simulist v0.3.0", "section": "Breaking changes", - "text": "Breaking changes\n\nThe <epidist> class has been renamed <epiparameter> to avoid confusion with the similar R package {epidist} (#360).\nMany functions that used epidist in their names have been renamed to use epiparameter due to the renaming of the class (#360).\nThe function signatures of epiparameter() and new_epiparameter() functions (previously epidist() and new_epidist()) have been updated to collapse the prob_dist, prob_dist_params, discretise and truncation arguments into prob_distribution, which accepts the output of create_prob_distribution() (#381).\nThe epi_dist argument has been renamed epi_name. This is to clarify that {epiparameter} can work with epidemiological parameters that take a variety of forms (e.g. point estimates, ranges, probability distributions, etc.) (#390).\nThe <vb_epidist> class and it’s methods have been removed from the package. 
It was not being used and was increasing the complexity and maintenance load of the package (#359).\ncreate_prob_dist() has been renamed to create_prob_distribution() (#381).\nvalidate_epiparameter() (previously validate_epidist()) has been renamed assert_epiparameter(), and test_epiparameter() has been added, with the aim to harmonise design with {contactmatrix} and the messages and errors have been improved (#366 & #402).\nThe minimum version of R required by the package is now 4.1.0 due to the use of the base R pipe (|>) in dependencies, and the R-CMD-check workflow on GitHub actions now explicitly runs on the minimum version of R stated in the DESCRIPTION (#384 & #405)." + "text": "Breaking changes\n\nThe simulation functions are now parameterised with an infectious period (infect_period argument) instead of a contact interval (contact_interval argument). This moves away from parameterising the simulation with the time delay between a person becoming infected and having contact with a susceptible individual, and instead uses an infectious period distribution within which contacts are uniformly distributed in time (#96).\nThe simulation functions can now set a maximum as well as a minimum outbreak size. The min_outbreak_size argument in the exported sim_*() functions has been renamed outbreak_size and takes a numeric vector of two elements, the minimum and maximum outbreak size. The maximum outbreak size is a soft limit due to the stochastic nature of the branching process model, so epidemiological data returned can contain more cases and/or contacts that the maximum in outbreak_size but in these case a warning is returned explaining to the user how many cases/contacts are being returned (#93).\nThe add_ct argument in sim_linelist() and sim_outbreak() has been removed. The functionality is now equivalent to add_ct = TRUE in the previous simulist version. The add_ct argument was removed to move the package to always returning <data.frame>s with the same number of columns, for consistency and predictability (#104).\nThe add_names argument in the simulation functions has been renamed to anonymise. The new argument controls whether names are given to each case (anonymise = FALSE, default behaviour) or whether fixed length hexadecimal codes are given to each case (anonymise = TRUE), this ensures the returned <data.frame> has the same number of columns (#106)." }, { - "objectID": "posts/epiparameter_v0.3.0/index.html#bug-fixes", - "href": "posts/epiparameter_v0.3.0/index.html#bug-fixes", - "title": "epiparameter v0.3.0", + "objectID": "posts/simulist_v0.3.0/index.html#bug-fixes", + "href": "posts/simulist_v0.3.0/index.html#bug-fixes", + "title": "simulist v0.3.0", "section": "Bug fixes", - "text": "Bug fixes\n\nEpidemiological parameter entries in the library stored with lognormal distributions, parameterised as median and dispersion are now converted to meanlog and sdlog correctly when creating an <epiparameter> (when auto_calc_params = TRUE) (#381)." + "text": "Bug fixes\n\n.sim_network_bp() now indexes the time vector correctly. Previously a vector indexing bug meant the epidemic would not progress through time (#95)." 
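The time-varying case fatality risk added in simulist v0.3.0 (described in the new features above) is supplied as an R function. The sketch below shows one possible shape for such a function; the argument names, and the exact signature expected by simulist when passed via create_config(), are assumptions for illustration rather than a checked API:

```r
# Hypothetical time-varying death risk: the baseline risk decays as the
# outbreak progresses (argument names are illustrative only).
decaying_risk <- function(risk, time) {
  risk * exp(-0.05 * time)
}

decaying_risk(risk = 0.1, time = c(0, 30, 60))
#> approximately 0.100 0.022 0.005
```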
}, { - "objectID": "posts/epiparameter_v0.3.0/index.html#deprecated-and-defunct", - "href": "posts/epiparameter_v0.3.0/index.html#deprecated-and-defunct", - "title": "epiparameter v0.3.0", + "objectID": "posts/simulist_v0.3.0/index.html#deprecated-and-defunct", + "href": "posts/simulist_v0.3.0/index.html#deprecated-and-defunct", + "title": "simulist v0.3.0", "section": "Deprecated and defunct", - "text": "Deprecated and defunct\n\nepidist_db() has been deprecated. It is replaced by epiparameter_db() (#360 & #399)." + "text": "Deprecated and defunct\n\nNone" }, { - "objectID": "posts/epiparameter_v0.3.0/index.html#acknowledgements", - "href": "posts/epiparameter_v0.3.0/index.html#acknowledgements", - "title": "epiparameter v0.3.0", + "objectID": "posts/simulist_v0.3.0/index.html#acknowledgements", + "href": "posts/simulist_v0.3.0/index.html#acknowledgements", + "title": "simulist v0.3.0", "section": "Acknowledgements", - "text": "Acknowledgements\n@Bisaloo @CarmenTamayo @adamkucharski @avallecam @chartgerink @jamesmbaazam @joshwlambert" + "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @CarmenTamayo, @chartgerink, @jamesmbaazam, @ntncmch, @PaulC91, @pratikunterwegs, and @sbfnk." }, { - "objectID": "posts/bpmodels_v0.3.1/index.html", - "href": "posts/bpmodels_v0.3.1/index.html", - "title": "bpmodels v0.3.1", + "objectID": "posts/cfr_v0.1.0/index.html", + "href": "posts/cfr_v0.1.0/index.html", + "title": "cfr v0.1.0", "section": "", - "text": "We are very excited to announce the release of a new bpmodels version v0.3.1. Here is an automatically generated summary of the changes in this version." + "text": "We are very excited to announce the release of a new cfr version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial CRAN submission of cfr, an R package to estimate the severity of a disease and ascertainment of cases while correcting for delays in outcomes of reported cases being known.\nThis release includes:\n\nFunctions for the overall severity of an outbreak, the overall severity of an outbreak estimated with an expanding time series of data, and the time-varying severity of an outbreak,\nA function to estimate the number of outcomes to be expected from a given number of cases assuming a user-specified distribution of delays between cases and outcomes being known,\nA function to estimate the overall (static) ascertainment of cases in an outbreak by comparing the relevant severity measures against a user-specified baseline severity (note that functionality for time-varying ascertainment is expected to be included in future versions, and an older implementation of this functionality was removed just prior to release),\nA data preparation generic with an S3 method for the <incidence2> class from the incidence2 package,\nExample daily case and death data from the 1976 Ebola Virus Disease outbreak as reported in Camacho et al. (2014). 
https://doi.org/10.1016/j.epidem.2014.09.003,\nExample daily case and death data from the Covid-19 pandemic over the range 2020-01-01 to 2022-12-31 from the 19 countries with over 100,00 deaths over this period, as taken from the covidregionaldata package which is no longer on CRAN,\nVignettes describing how to get started with severity estimation, and more detailed workflows on different kinds of severity estimation,\nA vignette on working with data from the incidence2 package, and a vignette on working with delay distributions,\n100% code coverage,\nWorkflows to render the vignettes and README as a website.\n\n\n\n\nReuseCC BY 4.0CitationBibTeX citation:@online{epiverse-trace_development_team2023,\n author = {Epiverse-TRACE development team, The},\n title = {Cfr V0.1.0},\n date = {2023-11-22},\n url = {https://epiverse-trace.github.io/posts/cfr_v0.1.0/},\n langid = {en}\n}\nFor attribution, please cite this work as:\nEpiverse-TRACE development team, The. 2023. “Cfr V0.1.0.”\nNovember 22, 2023. https://epiverse-trace.github.io/posts/cfr_v0.1.0/." }, { - "objectID": "posts/bpmodels_v0.3.1/index.html#input-validation", - "href": "posts/bpmodels_v0.3.1/index.html#input-validation", - "title": "bpmodels v0.3.1", - "section": "Input validation", - "text": "Input validation\n\nThe following internal functions now have input validation: rborel(), dborel(), complementary_logprob(), and rnbinom_mean_disp()." + "objectID": "posts/parent-class/index.html", + "href": "posts/parent-class/index.html", + "title": "Choosing the Right Parent for R Object Classes", + "section": "", + "text": "I have recently published a series of blog posts on the reasons why one may want to start using object-oriented programming (and more specifically R S3 classes) to improve interoperability with other tools from the ecosystem.\nBut there are still questions I have not addressed directly, even if they may have been implicitly included sometimes: what makes a good object class? What good practices in class & function design can improve interoperability?\nAs you can expect from these questions, this post will present a subjective view on S3 class and method design. I will argue that it is often a good strategy to inherit from existing standards classes, and to leverage this inheritance relationship as much as possible." }, { - "objectID": "posts/bpmodels_v0.3.1/index.html#unit-tests", - "href": "posts/bpmodels_v0.3.1/index.html#unit-tests", - "title": "bpmodels v0.3.1", - "section": "Unit tests", - "text": "Unit tests\n\nCode coverage has been improved with more tests on the following functions: rborel(), dborel(), chain_sim(), rnbinom_mean_disp(), complementary_logprob(), rgen_length(), and rbinom_size()." + "objectID": "posts/parent-class/index.html#inherit-from-standard-classes", + "href": "posts/parent-class/index.html#inherit-from-standard-classes", + "title": "Choosing the Right Parent for R Object Classes", + "section": "Inherit from standard classes", + "text": "Inherit from standard classes\nA unique feature of R is the availability and centrality of data.frames in the base language, whereas you need extra libraries for a similar functionality in most other languages (e.g., pandas in Python).\ndata.frame is one of the first “complex” (in the sense of non-atomic) object most R learners will be exposed to and will develop a familiarity with. 
A good way to leverage this familiarity is to make your subclass a thin wrapper around data.frames.\nThis means that not only will users be able to get started with your package faster because of this familiarity, but you will also immediately benefit from the huge ecosystem of functions and packages working on data.frames, such as the tidyverse. If you want some examples, this is what collaborators and I did in the linelist, pavo, scoringutils, epichains, and vaccineff R packages.\nIn some cases, the output is too complex to fit into a data.frame. Even in this situation, I would recommend inheriting from existing, well-established, classes for the same two reasons: familiarity and ecosystem. For example, for the serofoi R package, we have made the decision to inherit from stanfit objects, rather than a custom structure." }, { - "objectID": "posts/share-cpp/index.html", - "href": "posts/share-cpp/index.html", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "", - "text": "Using the {Rcpp} package is the dominant method for linking the usability of R with the speed of C++, and can be used to write R packages that are fast and easy to use for both end-users and developers.\nFrom the point of view of developers, it’s very easy to export R code such as functions and classes from an R(cpp) package, but the guidance in the Rcpp documentation does not detail how to export the C++ code so that it can be shared with your other Rcpp packages.\nAllowing C++ code to be shared can be very beneficial for the same reasons that sharing R code is — packaging code is a reliable way to reuse it.\nSome widely used examples of this practice are the {RcppEigen}, {RcppArmadillo}, {RcppGSL}, and Boost Headers {BH} packages. Indeed, in the Epiverse-TRACE team, {RcppEigen} underpins the {finalsize} and upcoming {epidemics} packages." + "objectID": "posts/parent-class/index.html#rely-on-parent-methods-as-much-as-possible", + "href": "posts/parent-class/index.html#rely-on-parent-methods-as-much-as-possible", + "title": "Choosing the Right Parent for R Object Classes", + "section": "Rely on parent methods as much as possible", + "text": "Rely on parent methods as much as possible\nA follow up recommendation from inheriting from standard classes is to leverage their methods wherever possible.\nOne of the first changes I made when becoming maintainer of the linelist package was to remove the rename.linelist() and select.linelist() methods. Indeed, they were, or could easily be, behaving identically as the parent rename.data.frame() and select.data.frame() methods. Rather than burdening the codebase and maintenance with an extra unnecessary method, it is much simpler and more robust to rely on the well-tested parent method. In fact, the dplyr documentation explicitly recommends only writing methods for a couple of standard functions (including [.subclass() and names<-.subclass()), which will enable the use of parent methods directly, rather than writing custom methods for each dplyr function.\nSimilarly, many developers have the reflex to write a custom print.subclass() method as part of the method implementation. While it may be justified in some cases, it is sometimes unnecessary. My recommendation would be to evaluate carefully what benefits the custom method brings over the default parent method." 
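The design advice in this post (inherit from data.frame and lean on the parent class, implementing only a small set of methods such as `[`) can be sketched as below; record_df is a made-up class name used purely for illustration:

```r
# Editorial sketch: a thin S3 wrapper around data.frame. Only `[` is
# customised, to keep the subclass on subsetting; printing, renaming and
# most dplyr verbs can then fall back to the data.frame methods.
new_record_df <- function(x) {
  stopifnot(is.data.frame(x))
  structure(x, class = c("record_df", "data.frame"))
}

`[.record_df` <- function(x, ...) {
  out <- NextMethod()
  if (is.data.frame(out)) {
    class(out) <- c("record_df", "data.frame")
  }
  out
}

rd <- new_record_df(data.frame(id = 1:3, age = c(20, 35, 41)))
class(rd[1:2, ])  # "record_df" "data.frame"
```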
}, { - "objectID": "posts/share-cpp/index.html#why-share-c-code-from-an-rcpp-package", - "href": "posts/share-cpp/index.html#why-share-c-code-from-an-rcpp-package", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "", - "text": "Using the {Rcpp} package is the dominant method for linking the usability of R with the speed of C++, and can be used to write R packages that are fast and easy to use for both end-users and developers.\nFrom the point of view of developers, it’s very easy to export R code such as functions and classes from an R(cpp) package, but the guidance in the Rcpp documentation does not detail how to export the C++ code so that it can be shared with your other Rcpp packages.\nAllowing C++ code to be shared can be very beneficial for the same reasons that sharing R code is — packaging code is a reliable way to reuse it.\nSome widely used examples of this practice are the {RcppEigen}, {RcppArmadillo}, {RcppGSL}, and Boost Headers {BH} packages. Indeed, in the Epiverse-TRACE team, {RcppEigen} underpins the {finalsize} and upcoming {epidemics} packages." + "objectID": "posts/parent-class/index.html#enable-conversion-to-standard-classes", + "href": "posts/parent-class/index.html#enable-conversion-to-standard-classes", + "title": "Choosing the Right Parent for R Object Classes", + "section": "Enable conversion to standard classes", + "text": "Enable conversion to standard classes\nIf after careful consideration, extra metadata makes it too difficult to fit your new class into an existing class, you may sometimes have to define your own class from “scratch” (i.e., often list() in R).\nBut even in this case, you can still apply some of the ideas proposed earlier. As much as possible, you should provide helpers or methods to enable the streamlined conversion of your method to a standard class.\nA good example here is the epiparameter package, which provides a complex S3 class built on lists, including extensive metadata about probability distribution of epidemiological parameters. As such, this custom class cannot be used out of the box in most functions from other packages. But an as.function() method is conveniently provided to enable the conversion of this probability distribution parameters into a density distribution, which can then be used in functions which expect a function object." }, { - "objectID": "posts/share-cpp/index.html#two-ways-to-share-c-code", - "href": "posts/share-cpp/index.html#two-ways-to-share-c-code", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Two ways to share C++ code?", - "text": "Two ways to share C++ code?\nDevelopers searching for a way to make the C++ code of their Rcpp-based packages shareable will likely find two seemingly distinct ways of doing so.\n\nDevelopers reading the Rcpp Attributes documentation will find that package C++ code can be shared by generating a C++ interface for functions that also have an R interface, using Rcpp attributes.\nDevelopers instead scanning widely used Rcpp packages such as {RcppEigen} will notice that C++ code can also be shared by defining the majority of the C++ functions in a package header, to which other Rcpp packages can be linked.\n\nThese are simply different pathways to the writing and export of an R(cpp) package header, which allows Rcpp to link the package’s C++ code to other packages.\nThis blog post explores different ways of doing this, and explains how the Rcpp packages from Epiverse-TRACE implement C++ code sharing." 
+ "objectID": "posts/parent-class/index.html#conclusion", + "href": "posts/parent-class/index.html#conclusion", + "title": "Choosing the Right Parent for R Object Classes", + "section": "Conclusion", + "text": "Conclusion\nIn summary, I recommend relying on well-established parent classes such as data.frames or at least providing direct conversion functions to these standard classes, and using parent methods wherever possible rather than writing custom dedicated methods. This should help produce a package:\n\nmore easily accessible for users because it uses objects that feel familiar\nmore maintainable because a lot of method writing is offloaded to the parent class\nmore likely to be interoperable because standard classes are a good way to pass data between functions or packages\n\nThanks to Chris Hartgerink, James Azam and Josh Lambert, for their very valuable feedback on this post." }, { - "objectID": "posts/share-cpp/index.html#the-package-header", - "href": "posts/share-cpp/index.html#the-package-header", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "The package header", - "text": "The package header\nThe package header of the package {mypackage} is a file of the name mypackage.h under inst/include. Defining this header is the key step in making (R)C++ code shareable.\n# conceptual organisation of an Rcpp package with a package header\n.\n├── DESCRIPTION\n├── NAMESPACE\n├── R\n│   └── RcppExports.R\n├── inst\n│   └── include\n│   └── mypackage.h # <= the package header\n└── src\n ├── RcppExports.cpp\n └── rcpp_hello.cpp # <= code from which RcppExports.cpp generates\n\nAutogenerating the package header\nThe package header is autogenerated when the attributes of an Rcpp function are edited to also generate a C++ interface.\nConsider the Rcpp function below which is exposed to R and exported from the package. The line // [[Rcpp::interfaces(cpp)]] instructs Rcpp to autogenerate two header files under inst/include:\n\nA package header, called mypackage.h, and\nA helper header called mypackage_RcppExports.h with ‘automagic’ C++ bindings for the function hello_world_rcpp().\n\n\n\nsrc/rcpp_hello.cpp\n\n#include <Rcpp.h>\n\n// [[Rcpp::interfaces(cpp)]]\n\n//' @title Test Rcpp function\n//' \n//' @export\n//[[Rcpp::export]]\nvoid hello_world_rcpp() {\n Rcpp::Rcout << \"hello world!\\n\";\n}\n\n\n\nManually creating the package header\nThe package header can also be created manually, as mypackage.h under inst/include. In this case, the helper file mypackage_RcppExports.h is not generated.\nExamples of this are the widely used {RcppEigen} and {RcppArmadillo} packages, while this demonstration package by James Balamuta is a minimal example that is a good place to get started to understand how this approach works.\nThe manually defined package header can initially be empty, and is populated by the developer — more on header contents below.\n\n\n\n\n\n\nWarning\n\n\n\nIt is possible to edit an autogenerated package header to include manually created header files in addition to mypackage_RcppExports.h. To do this, remove the generator tag (see below) to prevent this file from being overwritten by Rcpp::compileAttributes(). Then include any extra header files as usual.\nWe would however recommend not autogenerating headers from Rcpp functions, but rather writing a header-heavy package — this is the approach used by {RcppEigen} etc. 
(see more below on how we organise our packages).\n\n\n\n\nContents of the package header\nWe found it difficult to get information on the content of the package header.\nAutogenerated package headers contain an autogeneration message and a generator token, similar to that present in RcppExports files. Package headers should contain a header include guard.\n\n\n\n\n\n\nTip\n\n\n\nThe style of the header name in the include guard for autogenerated headers is RCPP_mypackage_H_GEN_. Package headers from the Rcpp core team, such as {RcppEigen} and {RcppArmadillo}, are manually defined and follow the convention mypackage__mypackage__h. In examples, such as this bare-bones demonstration package by James Balamuta, you might also encounter a single underscore (_) and a capital H (mypackage_mypackage_H).\nIf you are linting your Rcpp package’s C++ code with Cpplint, all three are incompatible with Cpplint’s preference, which is DIR_SUBDIR_FILE_H. Exclude the package header from linting to avoid this warning if you wish to follow an Rcpp community style instead.\n\n\nThe package header must also link to the code you want to export, and there are at least three ways of doing this.\n\nInclude the autogenerated file mypackage_RcppExports.h; this is already done as part of the package header generation.\nDirectly write C++ code in the package header. This is technically possible, but unlikely to be a good option as your package’s C++ codebase grows.\nManually include any other C++ header files in the package header. This last option might lead to a package header such as that shown below.\n\n\n\ninst/include/mypackage.h\n\n// Manually created package header with manual code inclusion\n#ifndef mypackage_mypackage_H\n#define mypackage_mypackage_H\n\n// include files using paths relative to inst/include\n#include \"header_01.h\"\n#include \"header_02.h\"\n\n#endif // mypackage_mypackage_H\n\nHere, the header files might contain code that you wish to make available to other packages, such as a C++ function, struct, or class, and indeed in the current package as well — more on how to do this below." + "objectID": "posts/chjh-intro-blog/index.html", + "href": "posts/chjh-intro-blog/index.html", + "title": "New team member: Chris Hartgerink", + "section": "", + "text": "In January, I joined Epiverse as part of my new role as Senior Software Engineer at data.org. In the weeks since, I have been able to already meet a lot of members of the Epiverse community. Talking to you and hearing about what drives you, further excites me to be part of this community! In order to build on that and keep the energy flowing, I wanted to introduce myself here more broadly.\nIn this blog post, I share some of my background in R, my interest in open source, and how I look to contribute to this community. I want to already highlight that I love finishing things, from peanut butter jars to pull requests — so if you have anything that you’d like to pull over the finish line, I am more than happy to support in doing that!" 
}, { - "objectID": "posts/share-cpp/index.html#using-rcpp-in-header-code", - "href": "posts/share-cpp/index.html#using-rcpp-in-header-code", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Using Rcpp in header code", - "text": "Using Rcpp in header code\nUsing {Rcpp}’s C++ functionality, such as the Rcpp classes DataFrame or List, or classes and functions of Rcpp-based packages such as {RcppEigen}, is as simple as including those headers in the appropriate location, just as one would in a source file — see the example below.\n\n\ninst/include/header_01.h\n\n// In a manually created header file, say, header_01.h\n// which is included in mypackage.h\n\n// to use Rcpp\n#include <Rcpp.h>\n\n// note the use of inline, more on this later\ninline void hello_world_rcpp() {\n Rcpp::Rcout << \"hello world!\\n\";\n}\n\nThe appropriate headers are automatically included in autogenerated package headers’ helper files, and the developer need not do anything more.\n\n\n\n\n\n\nTip\n\n\n\nDon’t forget to link to {Rcpp} or similar packages to the package under development by adding the package names under Imports, Depends, or LinkingTo as appropriate.\nThis can often be handled by functions in the {usethis} package such as usethis::use_rcpp_eigen(). You might also need to add // [[Rcpp::depends(<package>)]] in your package’s C++ source files, with a suitable package dependency specified.\n\n\nThe same principles apply to using C++ code from this package ({mypackage}) in future packages." + "objectID": "posts/chjh-intro-blog/index.html#background-in-r", + "href": "posts/chjh-intro-blog/index.html#background-in-r", + "title": "New team member: Chris Hartgerink", + "section": "Background in R", + "text": "Background in R\nMy first real introduction to programming was with R back in 2013, during a course on statistical programming. This was more about scripting than engineering, but it was where I dipped my toes. During my PhD at a methodology and statistics department, I ended up becoming the go to person for getting mundane things done faster. I ended up writing R and shell scripts left and right, with most of them ending up being used only once. I wasted a lot of time and had tons of fun doing this! I ended up writing scripts for my research as well, leading to some legal troubles with big publishers (see this old blog post). Oops!\nFrom scripting came contributing to R packages like statcheck, from which came developing my own packages like retractcheck. Before I knew it I was helping others learn R and build their own packages in small workshops. This was back when the devtools experience was pretty decent but in no way as complete as it is today.\nAround 2018, I ended up leaving academia and stopped needing to develop in R. I was not upset because R development is rather quirky and arbitrary at times. Since then, I have designed, built, and maintained production web applications in NodeJS, with my biggest achievement being ResearchEquals. I’m excited to bring the best of different languages together in this new position. Coming from NodeJS engineering, I already see some avenues for building a better developer experience in R. I look forward to not just contribute packages but also more general tooling during my time working on Epiverse." 
}, { - "objectID": "posts/share-cpp/index.html#using-header-code-in-the-package", - "href": "posts/share-cpp/index.html#using-header-code-in-the-package", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Using header code in the package", - "text": "Using header code in the package\nThere are some considerations when seeking to use header code from {mypackage} within {mypackage} itself.\nAny functions defined in the package headers must be inline functions (see the example above). This prevents compilation errors related to multiple definitions.\nC++ source files should include the package header, using #include mypackage.h. Functions, structs, or classes defined in header files will be available from the namespace mypackage, as shown in the example below.\nThe code in header files will usually need to be wrapped in (R)C++ code that is exposed to R to make functions from the headers available in R — see the snippet below.\n\n\nmypackage/src/hello_world.cpp\n\n// #include <Rcpp.h> // include Rcpp if necessary\n#include <mypackage.h> // include package header\n\n// Function exposed to R\n//' @title Rcpp function wrapping a header function\n//'\n//' @export\n// [[Rcpp::export]]\nvoid print_hello_world() {\n mypackage::hello_world_rcpp(); // note the namespacing\n}\n\n\n\n\n\n\n\nTip\n\n\n\nRemember to add PKG_CPPFLAGS += -I../inst/include/ to both Makevars and Makevars.win under src/. Furthermore, as noted in the Rcpp attributes documentation, the package will not automatically cause a rebuild when headers are modified — this needs to be done manually." + "objectID": "posts/chjh-intro-blog/index.html#working-openly", + "href": "posts/chjh-intro-blog/index.html#working-openly", + "title": "New team member: Chris Hartgerink", + "section": "Working openly", + "text": "Working openly\nI come to open source through open science, which is all about reducing the “insiders” and “outsiders” of research. Can people access publications equitably? Can people contribute to the work that’s happening equitably? Are funds distributed equitably? There are still big barriers to accessing research, its institutions, and as a result there remain inequities in the knowledge that is produced.\nAs a natural extension of open science, I started getting involved with open source communities around 2015-2016. I started with smaller open source projects springboarding to bigger projects over time. I documented all my own research projects on GitHub and started building websites (e.g., one about copyright in research publishing). All this practice developed my philosophy of working openly, which culminated in my time as a Mozilla Open Science Fellow in 2017.\nWith over a decade of practice, I approach working openly as community work. Open source is not just making information public - it requires making that information accessible and actionable. It has to be public and participatory. It needs to be informed by the needs of those participating, and participation needs to be encouraged, enabled, and scaffolded. This also means that the community needs to define what it desires. Are we working on a static piece of information that we want to enable reuse of (e.g., a finalized dataset) or do we want to co-create with people who we still need to invite in? People come and go, and so this community culture has to evolve.\nWorking openly, to me, also means that considerate and empathetic communication is key. It requires articulating our own needs, wants, and uncertainty with honesty. 
It requires accepting that our personal wants are not always aligned with what the community wants. How do we deal with those situations? Do we seek consensus or majorities? These discussions need humility and not getting entitled to things. It means actually listening to each other. There is no room for peacockery or posturing, but lots of room for caring and celebrating each other." }, { - "objectID": "posts/share-cpp/index.html#linking-header-code-between-pacakges", - "href": "posts/share-cpp/index.html#linking-header-code-between-pacakges", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Linking header code between pacakges", - "text": "Linking header code between pacakges\nOnce you have developed your package, you can link to its C++ header code in the same way as you would to any other Rcpp-based package.\nConsider the snippet below which shows how to link the C++ code from {mypackage} in a different package called {yourpackage}.\n\n\nyourpackage/src/hello_world.cpp\n\n// [[Rcpp::depends(mypackage)]] /// specify dependency\n#include <mypackage.h>\n\n// Define and export an Rcpp function\nvoid print_linked_hello() {\n mypackage::hello_world_rcpp();\n}\n\nBe sure to add LinkingTo: mypackage in the DESCRIPTION of the second package {yourpackage}." + "objectID": "posts/chjh-intro-blog/index.html#contributing-to-epiverse", + "href": "posts/chjh-intro-blog/index.html#contributing-to-epiverse", + "title": "New team member: Chris Hartgerink", + "section": "Contributing to Epiverse", + "text": "Contributing to Epiverse\nI am excited to bring this philosophy of working open to Epiverse and vice versa, learn how this community is open in its own unique way. With an international community across three continents there are important questions around what it means to be participatory. After talking to various community members, it is clear this community raises important and practical questions. I am also intrigued to see what is preventing people from joining this community - what barriers are we unconsciously putting up?\nI see my role within Epiverse as a “rush goalie” (also known as a fly goalie). I will own the development and maintainance of specific pieces of software, yet I can leave those to actively support work throughout Epiverse. In short: I go where I am needed, in whatever form that may be. That can be software development, community engagement, and so much more.\nTo that end: Get in touch with me anytime you think I might be of support. Tag me on GitHub liberally (@chartgerink). Send me emails on chris@data.org. DM me wherever. I am here to support Epiverse and all its various packages in whatever shape they may need. This includes code reviews, making fixes, and being part of discussions. I would rather be tagged once too often than once too little.\nEpiverse is also not a standalone community within the R space, so when practical, I will also contribute back to the packages that we use in our work. I already started making pull requests elsewhere, and it is great to see the R community be so welcoming to outside contributions. My hope is that Epiverse will be just as welcoming and a part of that wider ecosystem.\n\nAt the moment, I am still finding my way throughout the various repositories. It takes time to understand new code and start identifying areas to contribute (and it’s not always as easy as can be). I will be checking in on the repositories (and there are a lot!) and may ask around to see whether PRs or issues are still relevant. 
Nonetheless, after several weeks, I am starting to feel more like a part of the community than a newbie.\nThanks for reading along and feel free to leave a comment if you have anything else you’d like to know.\n\n\n\n\n\n\nGetting in touch\n\n\n\nYou can reach me on chris@data.org or on GitHub under @chartgerink. I am also on Mastodon under @chartgerink@akademienl.social." }, { - "objectID": "posts/share-cpp/index.html#c-code-sharing-in-epiverse-trace", - "href": "posts/share-cpp/index.html#c-code-sharing-in-epiverse-trace", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "C++ code sharing in Epiverse-TRACE", - "text": "C++ code sharing in Epiverse-TRACE\nIn Epiverse-TRACE, we have structured the {finalsize} and {epidemics} packages to have manually created headers, following the principles laid out above. We follow some additional principles as well.\n\nHeader-heavy packages\n\nOur packages are header-heavy, so that most of the actual code is defined in the headers. The source files are primarily intended to contain wrappers that expose the header code to R (and our users).\n\nNamespaces to organise header code\n\nOur header code is organised into C++ namespaces, which makes it easier to understand where functions are likely to be defined, and what they might be related to. It also makes it possible to include the package headers (and namespaces) that are relevant to users, rather than including the entire codebase.\n\n\nAs an example, functions related to non-pharmaceutical interventions or vaccination regimes from the {epidemics} package can be used in other packages without also including the compartmental epidemic models contained therein." + "objectID": "posts/epichains_v0.1.0/index.html", + "href": "posts/epichains_v0.1.0/index.html", + "title": "epichains v0.1.0", + "section": "", + "text": "We are very excited to announce the release of a new epichains version v0.1.0. Here is an automatically generated summary of the changes in this version." }, { - "objectID": "posts/share-cpp/index.html#ensuring-the-quality-of-header-code", - "href": "posts/share-cpp/index.html#ensuring-the-quality-of-header-code", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Ensuring the quality of header code", - "text": "Ensuring the quality of header code\nYou can lint and statically check code in a package header using tools for linting C++ code such as Cpplint and Cppcheck. When doing so, it may be important to specify minimum C++ standards, or even the language (C or C++) to avoid linter errors. This is because tools — such as Cppcheck — assume that headers with the extension .h are C headers, which throws errors when encountering C++ features such as the use of namespaces.\nCppcheck’s language and C++ standard can be set using:\ncppcheck --std=c++14 --language=c++ --enable=warning,style --error-exitcode=1 inst/include/*.h\nFurthermore, header code can also be tested independently of the R(cpp) code that eventually wraps it. This can be done using the Catch2 testing framework, which is conveniently available using {testthat} — this is an extensive topic for another post." 
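As a pointer for the Catch2-via-{testthat} route mentioned in the header-quality discussion above, the scaffolding is a single call run from the package root; the exact files it generates are best checked against testthat's own documentation.

```r
# Run once from the package root: scaffolds a Catch2 test runner under src/
# plus a small R wrapper so the C++ tests run as part of the usual test suite.
testthat::use_catch()
```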
+ "objectID": "posts/epichains_v0.1.0/index.html#epichains-0.1.0", + "href": "posts/epichains_v0.1.0/index.html#epichains-0.1.0", + "title": "epichains v0.1.0", + "section": "epichains 0.1.0", + "text": "epichains 0.1.0\nWe are excited to announce the first minor release of {epichains}.\n{epichains} re-implements {bpmodels}, focusing on a unified simulation framework using branching processes to simulate transmission chains data. The framework incorporates susceptible depletion and pre-existing immunity and provides dedicated data structures for handling and analysing transmission chains in both tabular and vector formats. The goal is to provide seamless interoperability with other packages within the Epiverse-TRACE Initiative and the broader epidemiological tool ecosystem." }, { - "objectID": "posts/share-cpp/index.html#conclusion", - "href": "posts/share-cpp/index.html#conclusion", - "title": "Sharing the C++ Code of an Rcpp Package", - "section": "Conclusion", - "text": "Conclusion\nDeveloping an Rcpp-based package with C++ code sharing in mind takes some organisation, or even reorganisation, of the C++ codebase. It is probably a good idea to consider whether your package will implement code that would be of interest to other developers, or to you in related projects. If either of these is true, it may help to structure your package with C++ code sharing in mind from the very beginning of development. This can substantially reduce development overheads and mistakes associated with maintaining multiple copies of the same or similar code in different projects. Fortunately, some great examples of how to do this are among the most-used Rcpp-based packages, providing both a conceptual template to consult for your work, as well as being a demonstration of how beneficial this practice can be in the long run. In Epiverse-TRACE, we intend to continue developing with C++ code sharing as a core principle so that we and other developers can build on our initial work." + "objectID": "posts/epichains_v0.1.0/index.html#new-features", + "href": "posts/epichains_v0.1.0/index.html#new-features", + "title": "epichains v0.1.0", + "section": "New Features", + "text": "New Features\n\nDocumentation\n\nDedicated Website: Explore all features and documentation on the epichains website.\nHelp: Each function comes with extensive documentation. 
We welcome your feedback and suggestions for improvements.\nVignettes: This release comes with five detailed vignettes:\n\nGetting Started: A quick guide to the key functions.\nModelling Disease Control Interventions: Learn how to model various intervention strategies.\nProjecting Infectious Disease Incidence: A case study on projecting COVID-19 incidence.\nLiterature: A curation of literature on branching process applications in epidemiology.\nTheoretical Background: A deep dive into the theoretical background of the functions in the package (Contributor documentation).\nDesign principles: The design principles of {epichains} (Contributor documentation).\n\n\n\n\nSimulation\n\nsimulate_chains(): Simulate independent transmission chains from a specified number of initial cases, incorporating susceptible depletion and pre-existing immunity.\nsimulate_chain_stats(): Generate a vector of chain sizes or lengths from a specified number of initial cases, incorporating susceptible depletion and pre-existing immunity.\n\n\n\nInference\n\nlikelihood(): Estimate the (log)likelihood of transmission chain sizes or lengths, with support for numeric vectors or <epichains> and <epichains_summary> objects.\n\n\n\nTransmission Chain Data Manipulation\n\nsummary(): Extract vectors of chain sizes or lengths from <epichains> objects.\naggregate(): Generate case time series by aggregating by generation or time of infection.\nplot(): Visualize individual transmission chains filtered by their id." }, { - "objectID": "posts/renv-complications/index.html", - "href": "posts/renv-complications/index.html", - "title": "Things that can go wrong when using renv", + "objectID": "posts/linelist_v1.1.0/index.html", + "href": "posts/linelist_v1.1.0/index.html", + "title": "linelist v1.1.0", "section": "", - "text": "Throughout the Epiverse project, we use the renv R package to ensure reproducibility of the training materials and the pipelines we are providing. But we sometimes get reports from users who struggle to rebuild the environment and run the code.\nIn this post, we dissect the source of these issues, explain why in reality renv is not at fault, and how this is caused by the inherent complexity of reproducibility. The renv documentation already includes caveats explaining why some situations are bound to require more complex tools. This blog post reiterates some of these caveats and illustrates them with concrete examples.\nFinally, we mention a couple of more complete (but more complex!) frameworks that can overcome the issues presented here. We do not explore these alternative framework in detail but provide links to more information." + "text": "We are very excited to announce the release of a new linelist version v1.1.0. Here is an automatically generated summary of the changes in this version." }, { - "objectID": "posts/renv-complications/index.html#binaries-vs-building-from-source", - "href": "posts/renv-complications/index.html#binaries-vs-building-from-source", - "title": "Things that can go wrong when using renv", - "section": "Binaries vs building from source", - "text": "Binaries vs building from source\nSoftware, including R packages, can generally be delivered in two forms: as binaries or as source code. If you are building from the source code, you may in some case need a compilation toolchain on your computer. 
If that toolchain is missing, it can lead to errors such as:\n\nld: warning: search path '/opt/gfortran/lib' not found\nld: library 'gfortran' not found\n\nMost of the time, regular users of R will not see these errors because they are installing binaries. Indeed, CRAN provides pre-compiled binaries for Windows and macOS for the last version of the package and R.\nWith renv, you often want to install older versions of the packages, which won’t be available as binaries from CRAN. This means you are more likely to have to compile the package yourself and see this kind of errors, even though renv is not causing them.\n\n\n\n\n\n\ngfortran issues on Apple Silicon computers\n\n\n\nIf you are an Apple Silicon (Mac M1, M2, M3) user and encounter issues with gfortran, we have had success using the macrtools R package and we strongly recommend checking it out." + "objectID": "posts/linelist_v1.1.0/index.html#breaking-changes", + "href": "posts/linelist_v1.1.0/index.html#breaking-changes", + "title": "linelist v1.1.0", + "section": "Breaking changes", + "text": "Breaking changes\n\nmake_linelist() and set_tags() no longer accept a named list of characters as input. Instead, make_linelist(), set_tags() and tags_types() now use rlang’s dynamic dots, which means that you can splice list arguments. This implementation is more robust, better tested, and makes it explicit that users want to splice the list (@Bisaloo, #96).\n\nBefore:\n\nmy_tags <- list(\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\nmake_linelist(obj, my_tags)\n# OR\nmake_linelist(\n obj,\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\n\nNow:\n\nmy_tags <- list(\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\nmake_linelist(obj, !!!my_tags)\n# OR\nmake_linelist(\n obj,\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)" }, { - "objectID": "posts/renv-complications/index.html#beyond-renv-scope-incompatibility-with-system-dependency-versions", - "href": "posts/renv-complications/index.html#beyond-renv-scope-incompatibility-with-system-dependency-versions", - "title": "Things that can go wrong when using renv", - "section": "Beyond renv scope: incompatibility with system dependency versions", - "text": "Beyond renv scope: incompatibility with system dependency versions\nWe discussed previously the topic of system dependencies, and dependencies on specific R versions. These special dependencies can also be a source of headaches when using renv.\nThe heart of the issue is that renv provides a simplified solution to reproducibility: it focuses on R packages and their versions. But other sources of non-reproducibility are outside its scope. In many cases, this will not be a problem, as the main source of non-reproducibility, especially in the relatively short-term, will be R package versions.\nBut sometimes, it is possible that the renv.lock lockfile requires such an old version of an R package that it was written with a syntax that is no longer supported by recent R versions or modern compilers.\nFor example, a recent project (from 2023) was trying to install the version 0.60.1 of the matrixStats package (from 2021). This lead to this compilation error:\n\nerror: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n\n\n\n\nClick to see the full error message\n\n! 
Error installing package 'matrixStats':\n=======================================\n\n* installing *source* package ‘matrixStats’ ...\n** package ‘matrixStats’ successfully unpacked and MD5 sums checked\n** using staged installation\n** libs\nusing C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c 000.init.c -o 000.init.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c allocMatrix2.c -o allocMatrix2.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c anyMissing.c -o anyMissing.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binCounts.c -o binCounts.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binMeans.c -o binMeans.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colCounts.c -o colCounts.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colOrderStats.c -o colOrderStats.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colRanges.c -o colRanges.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c diff2.c -o diff2.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c indexByRow.c -o indexByRow.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c logSumExp.c -o logSumExp.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c mean2.c -o mean2.o\nIn file included from mean2_lowlevel.h:14,\n from mean2.c:9:\nmean2_lowlevel_template.h: In function ‘mean2_int’:\nmean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n 59 | if (sum > DOUBLE_XMAX) {\n | ^~~~~~~~~~~\n | DBL_MAX\nmean2_lowlevel_template.h:59:13: note: each undeclared identifier is reported only once for each function it appears in\nIn file included from mean2_lowlevel.h:18,\n from mean2.c:9:\nmean2_lowlevel_template.h: In function ‘mean2_dbl’:\nmean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n 59 | if (sum > DOUBLE_XMAX) {\n | ^~~~~~~~~~~\n | DBL_MAX\nmake: *** [/usr/lib/R/etc/Makeconf:191: mean2.o] Error 1\nERROR: compilation failed for package ‘matrixStats’\n\nThe explanation for this error can be found in the matrixStats release notes, specifically the section for matrixStats 0.63.0:\n\n\nUpdated native code to use the C99 constant DBL_MAX instead of legacy S constant DOUBLE_XMAX, which is planned to be unsupported in R (>= 4.2.0)." - }, + "objectID": "posts/linelist_v1.1.0/index.html#new-features", + "href": "posts/linelist_v1.1.0/index.html#new-features", + "title": "linelist v1.1.0", + "section": "New features", + "text": "New features\n\nlinelist warnings and errors in the case of a tag loss now have a custom class (@Bisaloo, #109), which means it is easier to silence them specifically, or to catch them programmatically for advanced error handling. One example of a new advanced condition handling that was before not possible is:\nwarning_counter <- 0\n\nwithCallingHandlers({\n x <- linelist::make_linelist(cars, date_onset = \"dist\", age = \"speed\")\n x <- x[, -1]\n x <- x[, -1]\n warning(\"This is not a linelist warning\", call. = FALSE)\n}, linelist_warning = function(w) {\n warning_counter <<- warning_counter + 1\n})\n#> Warning: The following tags have lost their variable:\n#> age:speed\n#> Warning: The following tags have lost their variable:\n#> date_onset:dist\n#> Warning: This is not a linelist warning\n\nwarning(\"This pipeline generated \", warning_counter, \" linelist warnings.\")\n#> Warning: This pipeline generated 2 linelist warnings.\nlinelist objects now have a new custom $<-.linelist() to prevent tag loss when subsetting a linelist object (@Bisaloo, #86). This completes the functionality already provided by the [<-.linelist() and [[<-.linelist() methods.\nx$tagged_column <- NULL\n#> Warning in prune_tags(out, lost_action): The following tags have lost their variable:\n#> tag:tagged_column\nValidation failures in validate_types() now integrate a delayed error mechanism (@Bisaloo, #106). 
This ensures that the error message will return all the invalid tag types at once rather than having to go through multiple trials and runs.\n\nBefore: only the first invalid tag type is returned.\n\n# No warning about age, even though it also has an invalid type\nx <- make_linelist(cars, age = \"speed\", gender = \"dist\")\nvalidate_types(x, ref_types = tags_types(age = \"factor\"))\n#> Error in validate_types(x, ref_types = tags_types(age = \"factor\")) : \n#> Issue when checking class of tag `gender`:\n#> Must inherit from class 'character'/'factor', but has class 'numeric'\n\nNow: the error message returns all the invalid tag types at once.\n\nx <- make_linelist(cars, age = \"speed\", gender = \"dist\")\nvalidate_types(x, ref_types = tags_types(age = \"factor\"))\n#> Some tags have the wrong class:\n#> - gender: Must inherit from class 'character'/'factor', but has class 'numeric'\n#> - age: Must inherit from class 'factor', but has class 'numeric'" + }, { - "objectID": "posts/renv-complications/index.html#some-solutions", - "href": "posts/renv-complications/index.html#some-solutions", - "title": "Things that can go wrong when using renv", - "section": "Some solutions", - "text": "Some solutions\n\nAlternative package managers\nWe discussed how many issues when using renv can arise during the package compilation from source. A potential solution would be to avoid this compilation step and always install pre-compiled binaries.\nThis is not possible while installing from CRAN as CRAN only provides binaries for recent versions of R and for a limited number of platforms.\nBut Posit for example provides a larger collection of binaries, for different package versions, and different platforms, via their Public Posit Package Manager (PPM).\nMaking sure you install from PPM rather than CRAN can be a first simple step to make some of the issues discussed here vanish.\n\n\nExtending the scope of reproducibility\nAnother solution could be to add more complex reproducibility solutions that go beyond the scope of renv.\n\nrenv with rig\nThe R version is specified in renv.lock and to avoid incompatibility of older package versions with newer versions of R, you could run the declared R version. This can be achieved with various means but a convenient solution is the rig tool.\nThere are even some discussions to integrate rig and renv more tightly and let rig detect automatically which R version to use based on the renv.lock file.\n\n\nDocker, Nix and others\nAlternatively, you could use other reproducibility toolkits that focus not just on the R package versions, but on the entire software stack (e.g., including the operating system, the system dependencies). These solutions can be more complex to set up and use, and we won’t detail them in this blog post but you can find more information in:\n\nThe “Using renv with Docker” renv vignette\nthe “An Introduction to Rocker: Docker Containers for R” paper\nBruno Rodrigues’ entire series of blog posts on Nix\n\n\n\n\nConclusion: a final note for developers\nrenv is an elegant solution that focuses on the most immediate source of non-reproducibility. This however means it needs to be complemented by other tools in more complex cases.\nUltimately, reproducibility is a team effort. People who write code can minimise the risk of renv complications by keeping the packages they use close to their CRAN version and regularly updating their code and renv.lock accordingly. 
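A minimal sketch of the "Alternative package managers" suggestion above: pointing the CRAN repository option at Posit Public Package Manager before installing, so binaries are preferred where available. The URL is assumed to be the standard P3M CRAN endpoint at the time of writing; double-check it for your platform or use a date-pinned snapshot.

```r
# Prefer Posit Public Package Manager (P3M), which serves pre-built binaries
# for many package versions and platforms. URL assumed to be the standard
# P3M CRAN endpoint -- verify it for your OS or pin a dated snapshot.
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))
install.packages("matrixStats")
```

renv also records the active repositories in renv.lock when snapshotting, so collaborators restoring the project should resolve packages from the same source.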
Other programming languages have automated tooling to help with this, via, e.g., the dependabot tool which submits pull requests to update dependencies. There is no well established equivalent for R yet, but anyone willing to set this mechanism up can look at the code used by the Carpentries workbench for this task.\nThanks to Pratik Gupte and Chris Hartgerink for their valuable comments on earlier drafts of this post." + "objectID": "posts/linelist_v1.1.0/index.html#internal-changes", + "href": "posts/linelist_v1.1.0/index.html#internal-changes", + "title": "linelist v1.1.0", + "section": "Internal changes", + "text": "Internal changes\n\nInternal duplication in the specification of the tags supported by linelist by default has been removed. This makes it easier to add or remove tags in the future, and reduces the risk of inconsistencies between the different parts of the package (@Bisaloo, #111).\nThe internal tag_variable() function has been replace by a vectorized alternative tag_variable, thus improving performance in make_linelist() and set_tags() about twofold. The error message when tags are specified by position with a number larger than the number of columns in the dataset to tag has also been clarified (@Bisaloo, #110)." }, { - "objectID": "posts/epiparameter_v0.2.0/index.html", - "href": "posts/epiparameter_v0.2.0/index.html", - "title": "epiparameter v0.2.0", + "objectID": "posts/linelist_v1.1.0/index.html#documentation", + "href": "posts/linelist_v1.1.0/index.html#documentation", + "title": "linelist v1.1.0", + "section": "Documentation", + "text": "Documentation\n\nlinelist now provides a design vignette for future contributors or maintainers (@Bisaloo, #112)." + }, + { + "objectID": "posts/system-dependencies/index.html", + "href": "posts/system-dependencies/index.html", + "title": "System Dependencies in R Packages & Automatic Testing", "section": "", - "text": "We are very excited to announce the release of a new epiparameter version v0.2.0. Here is an automatically generated summary of the changes in this version.\nThe second release of the {epiparameter} R package focuses on interoperability with the {epireview} R package. Several functions have been refactored and enhanced.\nThis release benefited from feedback from the participants of an EpiParameter Community workshop hosted by the World Health Organisation." + "text": "This post has been cross-posted on the R-hub blog, and the R-hub blog maintainers have contributed to the review and improvement of this post.\nIn a previous R-hub blog post, we discussed a package dependency that goes slightly beyond the normal R package ecosystem dependency: R itself. Today, we step even further and discuss dependencies outside of R: system dependencies. This happens when packages rely on external software, such as how R packages integrating CUDA GPU computation in R require the CUDA library. In particular, we are going to talk about system dependencies in the context of automated testing: is there anything extra to do when setting continuous integration for your package with system dependencies? In particular, we will focus with the integration with GitHub Actions. How does it work behind the scenes? And how to work with edge cases?" 
}, { - "objectID": "posts/epiparameter_v0.2.0/index.html#new-features", - "href": "posts/epiparameter_v0.2.0/index.html#new-features", - "title": "epiparameter v0.2.0", + "objectID": "posts/system-dependencies/index.html#introduction-specifying-system-dependencies-in-r-packages", + "href": "posts/system-dependencies/index.html#introduction-specifying-system-dependencies-in-r-packages", + "title": "System Dependencies in R Packages & Automatic Testing", + "section": "Introduction: specifying system dependencies in R packages", + "text": "Introduction: specifying system dependencies in R packages\nBefore jumping right into the topic of continuous integration, let’s take a moment to introduce, or remind you, how system dependencies are specified in R packages.\nThe official ‘Writing R Extensions’ guide states 1:\n\nDependencies external to the R system should be listed in the ‘SystemRequirements’ field, possibly amplified in a separate README file.\n\nThis was initially purely designed for humans. No system within R itself makes use of it. One important thing to note is that this field contains free text :scream:. As such, to refer to the same piece of software, you could write either one of the following in the package DESCRIPTION:\nSystemRequirements: ExternalSoftware\nSystemRequirements: ExternalSoftware 0.1\nSystemRequirements: lib-externalsoftware\nHowever, it is probably good practice check what other R packages with similar system dependencies are writing in SystemRequirements, to facilitate the automated identification process we describe below." + }, + { + "objectID": "posts/system-dependencies/index.html#the-general-case-everything-works-automagically", + "href": "posts/system-dependencies/index.html#the-general-case-everything-works-automagically", + "title": "System Dependencies in R Packages & Automatic Testing", + "section": "The general case: everything works automagically", + "text": "The general case: everything works automagically\nIf while reading the previous section, you could already sense the problems linked to the fact SystemRequirements is a free-text field, fret not! In the very large majority of cases, setting up continuous integration in an R package with system dependencies is exactly the same as with any other R package.\nUsing, as often, the supercharged usethis package, you can automatically create the relevant GitHub Actions workflow file in your project 2:\n\nusethis::use_github_action(\"check-standard\")\n\nThe result is:\n# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help\non:\n push:\n branches: [main, master]\n pull_request:\n branches: [main, master]\n\nname: R-CMD-check\n\njobs:\n R-CMD-check:\n runs-on: ${{ matrix.config.os }}\n\n name: ${{ matrix.config.os }} (${{ matrix.config.r }})\n\n strategy:\n fail-fast: false\n matrix:\n config:\n - {os: macos-latest, r: 'release'}\n - {os: windows-latest, r: 'release'}\n - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}\n - {os: ubuntu-latest, r: 'release'}\n - {os: ubuntu-latest, r: 'oldrel-1'}\n\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-pandoc@v2\n\n - uses: r-lib/actions/setup-r@v2\n with:\n r-version: ${{ matrix.config.r }}\n http-user-agent: ${{ matrix.config.http-user-agent }}\n use-public-rspm: true\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\n with:\n upload-snapshots: true\nYou may notice there is no explicit mention of system dependencies in this file. Yet, if we use this workflow in an R package with system dependencies, everything will work out-of-the-box in most cases. So, when are system dependencies installed? And how does the workflow even know which dependencies to install since the SystemRequirements field is free text that may not correspond to the exact name of a library?\nThe magic happens in the r-lib/actions/setup-r-dependencies step. If you want to learn about it, you can read the source code of this step. It is mostly written in R but it contains a lot of bells and whistles to handle messaging within the GitHub Actions context and as such, it would be too long to go through it line by line in this post. However, at a glance, you can notice many mentions of the pak R package.\nIf it’s the first time you’re hearing about the pak package, we strongly recommend you go through the list of the most important pak features. It is packed with many very powerful features. The specific feature we’re interested in here is the automatic install of system dependencies via pak::pkg_sysreqs(), which in turn uses pkgdepends::sysreqs_install_plan().\nWe now understand more precisely where the magic happens but it still doesn’t explain how pak is able to know which precise piece of software to install from the free text SystemRequirements field. As often when you want to increase your understanding, it is helpful to read the source. While browsing pkgdepends source code, we see a call to https://github.com/r-hub/r-system-requirements.\nThis repository contains a set of rules as json files which match unformatted software names via regular expressions to the exact libraries for each major operating system. 
Let’s walk through an example together:\n{\n \"patterns\": [\"\\\\bnvcc\\\\b\", \"\\\\bcuda\\\\b\"],\n \"dependencies\": [\n {\n \"packages\": [\"nvidia-cuda-dev\"],\n \"constraints\": [\n {\n \"os\": \"linux\",\n \"distribution\": \"ubuntu\"\n }\n ]\n }\n ]\n}\nThe regular expression tells that each time a package lists something as SystemRequirements with the word “nvcc” or “cuda”, the corresponding Ubuntu library to install is nvidia-cuda-dev.\nThis interaction between r-system-requirements and pak is also documented in pak’s dev version, with extra information about how the SystemRequirements field is extracted in different situations: https://pak.r-lib.org/dev/reference/sysreqs.html#how-it-works" + }, + { + "objectID": "posts/system-dependencies/index.html#when-its-not-working-out-of-the-box", + "href": "posts/system-dependencies/index.html#when-its-not-working-out-of-the-box", + "title": "System Dependencies in R Packages & Automatic Testing", + "section": "When it’s not working out-of-the-box", + "text": "When it’s not working out-of-the-box\nWe are now realizing that this automagical setup we didn’t pay so much attention to until now actually requires a very heavy machinery under the hood. And it happens, very rarely, that this complex machinery is not able to handle your specific use case. But it doesn’t mean that you cannot use continuous integration in your package. It means that some extra steps might be required to do so. Let’s review these possible solutions together in order of complexity.\n\nFix it for everybody by submitting a pull request\nOne first option might be that the regular expression used by r-system-requirements to convert the free text in SystemRequirements to a library distributed by your operating system does not recognize what is in SystemRequirements.\nTo identify if this is the case, you need to find the file containing the specific rule for the system dependency of interest in r-system-requirements, and test the regular expression on the contents of SystemRequirements.\nIf we re-use the cuda example from the previous section and we are wondering why it is not automatically installed for a package specifying “cudaa”:\n\nstringr::str_match(\"cudaa\", c(\"\\\\bnvcc\\\\b\", \"\\\\bcuda\\\\b\"))\n\n [,1]\n[1,] NA \n[2,] NA \n\n\nThis test confirms that the SystemRequirements field contents are not recognized by the regular expression. Depending on the case, the best course of action might be to:\n\neither edit the contents of SystemRequirements so that it’s picked up by the regular expression\nor submit a pull request to rstudio/r-system-requirements 3 if you believe the regular expression is too restrictive and should be updated (example)\n\nNote however that the first option is likely always the simplest as it doesn’t impact all the rest of the ecosystem (which is why r-system-requirements maintainers might be reluctant to relax a regular expression) and it is often something directly in your control, rather than a third-party who might not immediately be available to review your PR.\n\n\nInstall system dependencies “manually”\nHowever, you might be in a case where you cannot rely on the automated approach. For example, maybe the system dependency to install is not provided by package managers at all. Typically, if you had to compile or install it manually on your local computer, you’re very likely to have to do the same operation in GitHub Actions. 
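To see this machinery from the user side, pak can print the system-requirement install plan it derives for a package before anything is installed; pak::pkg_sysreqs() is the helper mentioned above, and "curl" is simply an arbitrary example of a package with a system dependency (libcurl).

```r
# Preview the system-dependency install plan pak derives from the
# SystemRequirements fields of a package and its dependency tree.
# "curl" is just an example of a package with a known system dependency.
pak::pkg_sysreqs("curl")
```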
There two different, but somewhat equivalent, ways to do so, as detailed below.\n\nDirectly in the GitHub Actions workflow\nYou can insert the installation steps you used locally in the GitHub Actions workflow file. So, instead of having the usual structure, you have an extra step “Install extra system dependencies manually” that may look something like this:\njobs:\n R-CMD-check:\n runs-on: ubuntu-latest\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-r@v2\n with:\n use-public-rspm: true\n\n+ - name: Install extra system dependencies manually\n+ run:\n+ wget ...\n+ make\n+ sudo make install\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\nYou can see a real-life example in the rbi R package.\n\n\nUsing a Docker image in GitHub Actions\nAlternatively, you can do the manual installation in a Docker image and use this image in your GitHub Actions workflow. This is a particularly good solution if there is already a public Docker image or you already wrote a DOCKERFILE for your own local development purposes. If you use a public image, you can follow the steps in the official documentation to integrate it to your GitHub Actions job. If you use a DOCKERFILE, you can follow the answers to this stackoverflow question (in a nutshell, use docker compose in your job or publish the image first and then follow the official documentation).\njobs:\n R-CMD-check:\n runs-on: ubuntu-latest\n+ container: ghcr.io/org/repo:main\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-r@v2\n with:\n use-public-rspm: true\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\nYou can again see a real-life example in the rbi R package." + }, + { + "objectID": "posts/system-dependencies/index.html#conclusion", + "href": "posts/system-dependencies/index.html#conclusion", + "title": "System Dependencies in R Packages & Automatic Testing", + "section": "Conclusion", + "text": "Conclusion\nIn this post, we have provided an overview of how to specify system requirements for R package, how this seemingly innocent task requires a very complex infrastructure so that it can be understood by automated tools and that your dependencies are smoothly installed in a single command. We also gave some pointers on what to do if you’re in one of the rare cases where the automated tools don’t or can’t work.\nOne final note on this topic is that there might be a move from CRAN to start requiring more standardization in the SystemRequirements field. One R package developer has reported being asked to change “Java JRE 8 or higher” to “Java (>= 8)”.\nMany thanks to Maëlle Salmon & Gábor Csárdi for their insights into this topic and their valuable feedback on this post." 
+ }, + { + "objectID": "posts/system-dependencies/index.html#footnotes", + "href": "posts/system-dependencies/index.html#footnotes", + "title": "System Dependencies in R Packages & Automatic Testing", + "section": "Footnotes", + "text": "Footnotes\n\n\nFor R history fans, this has been the case since R 1.7.0, released in April 2003.↩︎\nAlternatively, if you’re not using usethis, you can manually copy-paste the relevant GitHub Actions workflow file from the examples of the r-lib/actions project.↩︎\nIf you are wondering why we are saying to submit PR to rstudio/r-system-requirements when we were previously talking about r-hub/r-system-requirements, you can check out this comment thread.↩︎" + }, + { + "objectID": "posts/mpox-preparedness/index.html", + "href": "posts/mpox-preparedness/index.html", + "title": "2024 mpox outbreak: common analytics tasks and available R tools", + "section": "", + "text": "There are ongoing outbreaks of mpox globally. The Democratic Republic of Congo (DRC) is so far the worst hit with a total of 7,851 cases and 384 deaths reported between January 1 and May 26, 2024 1. Before 2022, there were few reports of sustained mpox transmission globally. However, during the following year (Jan 1, 2022, and Jan 29, 2023), 110 countries in all six WHO Regions had reported a total of 85,473 confirmed cases and 89 deaths between them (Laurenson-Schafer et al. 2023).\nMpox is transmitted through respiratory droplets and direct contact with infected persons. The disease is characterized by fever, cough, and a rash, with the mean incubation period estimated to be about 7.8 days (Ward et al. 2022). Infected individuals may experience severe symptoms leading to hospitalisation or death. There are two genetic clades: clade I and clade II, which also has subclades IIa and IIb (Laurenson-Schafer et al. 2023).\nSeveral analyses of the potential impact of outbreaks at country level have already emerged in 2024. The US CDC, for example, has analysed the potential size of outbreaks resulting from transmission within and between households 2 and the risk of Clade 1 mpox outbreaks among some key populations associated with key transmission routes 3. Another group of researchers have estimated the transmissibility of mpox in the DRC from more recent (2010 - 2019) surveillance data to update existing estimates, which are based on old data (Charniga, McCollum, et al. 2024). However, tackling ongoing outbreaks around the world will require a coordinated response from the global health community.\nThe Epiverse-TRACE team is developing a set of analytical tools that could help support decision-makers during outbreaks. This post provides an overview of the tasks that such tools can be applied to in the context of the ongoing mpox outbreaks." + }, + { + "objectID": "posts/mpox-preparedness/index.html#common-outbreak-analytics-tasks", + "href": "posts/mpox-preparedness/index.html#common-outbreak-analytics-tasks", + "title": "2024 mpox outbreak: common analytics tasks and available R tools", + "section": "Common outbreak analytics tasks", + "text": "Common outbreak analytics tasks\nOutbreak analytics in the context of the ongoing mpox outbreak involves several tasks that can be handled by existing and emerging R tools. 
Some of the tasks include estimating the transmission potential, forecasting infection dynamics, estimating severity, and assessing the impact of interventions.\nHere, we briefly describe some common tasks, data required, and the ready R tools/packages developed by the Epiverse-TRACE team and the wider community.\n\nCleaning and validating data\n\n\n\n\n\n\nData cleaning is often the first task in outbreak analytics. This usually involves identifying and correcting errors in the data, standardizing the format of key variables, and ensuring that the data is in a format that is fit for analysis. Data validation is also important to ensure that the data is accurate.\n\n\n\n{cleanepi} is useful for cleaning individual-level datasets, and {linelist} can be used to tag and validate key variables in datasets that might change over time. The {numberize} package can also be used to convert numbers written as text. It currently has functionality for English, Spanish, and French.\n\n\nEstimating transmission potential\n\n\n\n\n\n\nA key initial question during emerging outbreaks is the transmission potential of the disease. This is typically quantified using parameters such as: the basic reproduction number, \\(R_0\\); the time-varying reproduction number, \\(R_t\\); and \\(k\\), which captures individual heterogeneity in transmission (i.e. “superspreading” potential). These quantities are useful to assess the potential for further spread of the disease and the impact of interventions.\n\n\n\n\nPopulation-level transmissibility (\\(R_0\\) and \\(R_t\\))\nThe basic reproduction number, \\(R_0\\), is the average number of secondary cases produced by a single infected individual in a completely susceptible population. The time-varying reproduction number, \\(R_t\\), on the other hand, is the average number of secondary cases produced by a single infected individual at time \\(t\\) in a partially susceptible population. \\(R_t\\) is a more useful quantity during an outbreak as it accounts for the impact of interventions and changes in population immunity.\nIf data is available on the daily number of reported cases, {EpiNow2} and {EpiEstim} can be used to estimate \\(R_t\\). These packages require data on the time scale of transmission (i.e. the generation time, or the serial interval, which is commonly used as a proxy for this). While {EpiEstim} focuses on retrospective estimation of \\(R_t\\), {EpiNow2} is designed for both retrospective and real-time estimation.\nIn estimating \\(R_t\\), one practical consideration is the impact of various delays (biological and reporting) on the estimates (Charniga, Park, et al. 2024; Park et al. 2024; Katelyn M. Gostic 2020). {EpiNow2} adjusts for these delays in various ways. For example, it accounts for the symptom onset and reporting delays by taking the incubation period and reporting delay as inputs. Moreover, {EpiNow2} can estimate the reporting delay from the data if data on incidence by date of onset and report are available.\nFurthermore, dedicated packages have emerged for estimating epidemiological delays from data using best practices. {epidist} offers the ability to estimate delay distributions, accounting for issues such as truncation (i.e., not all disease outcomes will yet be known in real-time).\nIf delay data are not available, published estimates of the incubation period and serial interval can be used. 
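A minimal sketch of the retrospective \(R_t\) workflow described above using {EpiEstim}, assuming daily case counts are available; the incidence series and serial interval values below are illustrative placeholders, not mpox estimates.

```r
library(EpiEstim)

# Illustrative daily incidence; replace with real case counts.
incidence <- data.frame(
  dates = seq(as.Date("2024-06-01"), by = "day", length.out = 30),
  I     = rpois(30, lambda = 12)
)

# Placeholder serial interval parameters -- not mpox-specific estimates.
rt_estimates <- estimate_R(
  incid  = incidence,
  method = "parametric_si",
  config = make_config(list(mean_si = 8, std_si = 4))
)

plot(rt_estimates)
```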
The {epiparameter} package collates a database of epidemiological distributions from the literature and provides functions for interacting with the database. You can view the database for currently available parameters (more entries are planned). Additionally, if only summary statistics are available (e.g. range and median), {epiparameter} can be used to extract the distribution parameters.\n\n\nIndividual-level transmissibility (superspreading)\n\n\n\n\n\n\nThe individual-level transmission heterogeneity (superspreading), often denoted as \\(k\\), is an important measure for tailoring interventions at the individual level.\n\n\n\nIf we have data on the distribution of sizes of transmission clusters, the {epichains} package provides functions to set up the likelihood function to estimate \\(R_0\\) and \\(k\\). The user inputs the negative binomial offspring distribution, which assumes individuals exhibit heterogeneity in transmission. The parameters of the negative binomial offspring distribution can then be estimated using existing maximum likelihood or Bayesian frameworks.\nFurthermore, if we have individual-level transmission chain data, the {superspreading} package can be used to estimate \\(R_0\\) and \\(k\\) from the offspring distribution. This package also provides functions to estimate the probability that an outbreak will not go extinct in its early stages because of randomness in transmission (e.g. if the primary spillover case(s) does not infect others).\nIf we have data on sexual contacts and the secondary attack rate, then we can also use {superspreading} to calculate \\(R_0\\) accounting for network effects.\n\n\n\nForecasting and nowcasting infection dynamics\n\n\n\n\n\n\nForecasting and nowcasting of infections are crucial for planning and resource allocation during an outbreak. Forecasting is the prediction of future cases, deaths, or other outcomes, while nowcasting is the prediction of the current outbreak situation. These predictions can help public health authorities to anticipate the trajectory of the outbreak and to implement timely interventions.\n\n\n\n{EpiNow2} and {epinowcast} provide functions to forecast and nowcast the number of cases. The data required for {EpiNow2} has already been described in the previous section. The {epinowcast} package similarly requires data on the number of cases reported per date. {epinowcast} does not currently support forecasting but there are plans to add this functionality in future versions.\n\n\nEstimating disease severity\n\n\n\n\n\n\nThe case fatality risk (CFR) is often used to assess the severity of a disease. CFR here refers to the proportion of deaths among confirmed cases.\n\n\n\nWith incidence data on the number of cases reported and the number of deaths reported, the {cfr} package can be used to estimate the case fatality rate and its uncertainty. Importantly, it accounts for the delay between the onset of symptoms and death, which is crucial for accurate estimation of the case fatality rate.\nHere again, {EpiNow2} can be used to estimate the time-varying case fatality ratio using the same data as for the reproduction number. {EpiNow2} can estimate other severity metrics, such as the case hospitalisation ratio, given data on cases and hospitalisations, and the hospitalisation fatality ratio, if data on hospitalisations and associated deaths are available.\n\n\nAssessing the impact of interventions\n\n\n\n\n\n\nMpox can be mitigated with behaviour change, treatment, and vaccination. 
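A small sketch of the delay-adjusted severity estimation discussed above, assuming {cfr}'s documented input format of daily date, cases and deaths columns; the counts and the onset-to-death delay below are placeholders, and the argument names should be checked against the package documentation.

```r
library(cfr)

# Illustrative daily counts in the date/cases/deaths format expected by {cfr}.
outbreak <- data.frame(
  date   = seq(as.Date("2024-06-01"), by = "day", length.out = 60),
  cases  = rpois(60, lambda = 20),
  deaths = rpois(60, lambda = 1)
)

# Placeholder onset-to-death delay distribution; swap in a literature-based
# estimate (e.g. from {epiparameter}) for real analyses.
cfr_static(
  data = outbreak,
  delay_density = function(x) dgamma(x, shape = 2.4, scale = 3.3)
)
```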
Here, a few tools are available to assess the impact of intervention scenarios.\n\n\n\n{epidemics} provides ready compartmental models to estimate the impact of vaccination and non-pharmaceutical interventions like behaviour change, which can conceptually be modelled as a reduction in the transmission rate through changes in the population contact structure.\nIf we want to explore population-level outbreak dynamics, {epidemics} allows for stratifying the population into arbitrary groups, specifying the contact structure between these groups, and rates of interventions. The data required to run these models include: population structure, contact structure, and timing and magnitude of interventions. Data on social contact matrices can be obtained from the {socialmixr} package." + }, + { + "objectID": "posts/mpox-preparedness/index.html#summary", + "href": "posts/mpox-preparedness/index.html#summary", + "title": "2024 mpox outbreak: common analytics tasks and available R tools", + "section": "Summary", + "text": "Summary\nIn this post, we have outlined common outbreak analytics tasks relevant to the mpox outbreak, the data required, and R packages/tools that are currently available to facilitate these tasks. The tools described here are being developed by the Epiverse-TRACE team and the wider community, with the aim of ensuring high standards of research software development, and validation from end users, including epidemiologists, clinicians, and policy makers. The tools are designed to be user-friendly and well integrated, enabling one analysis task to easily feed into another. We would therefore be keen to hear from other groups interested in potentially collaborating or contributing on this growing ecosystem of tools.\nThanks to Karim Mane and Chris Hartgerink for their valuable comments on earlier drafts of this post." + }, + { + "objectID": "posts/mpox-preparedness/index.html#footnotes", + "href": "posts/mpox-preparedness/index.html#footnotes", + "title": "2024 mpox outbreak: common analytics tasks and available R tools", + "section": "Footnotes", + "text": "Footnotes\n\n\nWHO Disease Outbreak News↩︎\nModeling Household Transmission of Clade I Mpox in the United States↩︎\nRisk of Clade 1 Mpox Outbreaks Among Gay, Bisexual, and Other Men Who Have Sex With Men in the United States↩︎" + }, + { + "objectID": "posts/epiparameter_v0.1.0/index.html", + "href": "posts/epiparameter_v0.1.0/index.html", + "title": "epiparameter v0.1.0", + "section": "", + "text": "We are very excited to announce the release of a new epiparameter version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial release of the {epiparameter} R package. {epiparameter} provides:" + }, + { + "objectID": "posts/epiparameter_v0.1.0/index.html#new-features", + "href": "posts/epiparameter_v0.1.0/index.html#new-features", + "title": "epiparameter v0.1.0", "section": "New features", - "text": "New features\n\nThe as_epidist() S3 generic has been added to the package for the coercion of other R objects into <epidist> objects. The as_epidist.data.frame() method is added, as well as the internal functions is_epireview() which determines if a <data.frame> is from {epireview}, and epireview_to_epidist() performs the conversion (#298, #334 & #335)\nThe epireview_core_cols.rda data is added to the package. 
This is used to determine whether the input to as_epidist.data.frame() is a parameter table from {epireview} as these objects do not have a recognisable class attribute (#298).\nA new website only vignette (i.e. article) data_from_epireview.Rmd is added that explains how to use as_epidist() with data from {epireview} (#298 & #335).\nA new vignette database.Rmd is added to the package to provide a web interface to the {epiparameter} library of epidemiological parameters. Contributed by @sbfnk (#311).\nThe plotting method for <epidist> objects (plot.epidist()) has been improved to better differentiate continuous from discrete or discretised distributions (#315).\nepidist_db(..., single_epidist = TRUE) now prioritises parameter entries that account for right truncation (#323).\ncreate_epidist_prob_dist() (previously named create_prob_dist()) is now exported and enables more control of discretisation settings by allowing arguments to be passed to distcrete::distcrete() via ... (#324).\nThe <multi_epidist> print method (print.multi_epidist()) has been improved to provides object information in the print header, the first few elements of the list or all elements if list is short, and some extra links and advice in the print footer. The design of the print method follows the design pattern of {pillar} (#326).\n<epidist> objects and functions that work with <epidist> objects now work with exponential distributions (#333).\nThe package now has an explicit data license: CC0 in the LICENSE file." + "text": "New features\n\nA library of 122 epidemiological parameter set from the epidemiological literature. This is accessible from the package as system data (sysdata.rda, as epiparameter::multi_epidist) or as internal data (inst/extdata/parameters.json).\nThe epidist_db() function loads epidemiological parameters from the library.\nDistribution parameter conversion and extraction functions (convert_params_to_summary_stats() & convert_summary_stats_to_params(), and extract_param()).\nAn S3 class to work with epidemiological parameters <epidist>. This class has S3 methods to aid users easily work with these data structures. These include printing, plotting, and distribution functions for PDF/PMF, CDF, random number generation and distribution quantiles. The <epidist> class has a constructor function, a validator function, accessors (get_*()), checkers (is_*()). There is also a <vb_epidist> S3 class for vector-borne parameters, and an internal <multi_epidist> class for improved printing of lists of <epidist> objects.\nThe package contains a few utility functions. list_distributions() is a helper function to provide information from a list of <epidist> objects in tabular form. calc_disc_dist_quantile() calculates the quantiles of a probability distribution based on the vector of probabilities and time data.\nFive vignettes are included in this initial release. 
One as an introduction to the package (epiparameter.Rmd), one as an tutorial on converting and extracting parameters (extract_convert.Rmd), one on the protocol used to collect entries for the library of epidemiological parameters (data_protocol.Rmd), a design vignette (design_principles.Rmd), and a supplementary vignette which quantifies the bias from using the parameter extraction (extract_param()) from {epiparameter} (extract-bias.Rmd).\nUnit tests (98% coverage) and documentation files.\nContinuous integration workflows for R package checks, rendering the README.md, calculating test coverage, deploying the pkgdown website, updates the package citation, linting package code, checking package or system dependency changes, updating copyright year, and validating the parameter library JSON file." }, { - "objectID": "posts/epiparameter_v0.2.0/index.html#breaking-changes", - "href": "posts/epiparameter_v0.2.0/index.html#breaking-changes", - "title": "epiparameter v0.2.0", + "objectID": "posts/epiparameter_v0.1.0/index.html#breaking-changes", + "href": "posts/epiparameter_v0.1.0/index.html#breaking-changes", + "title": "epiparameter v0.1.0", "section": "Breaking changes", - "text": "Breaking changes\n\nlist_distributions() has been replaced by parameter_tbl() which enhances the printing by leveraging {pillar} (#321).\nThe <vb_epidist> plotting method (plot.vb_epidist()) has been removed from the package. This provided minimal functionality and was unnecessarily complicating the function signature of plot.epidist() (#315)." + "text": "Breaking changes\n\nNone" }, { - "objectID": "posts/epiparameter_v0.2.0/index.html#bug-fixes", - "href": "posts/epiparameter_v0.2.0/index.html#bug-fixes", - "title": "epiparameter v0.2.0", + "objectID": "posts/epiparameter_v0.1.0/index.html#bug-fixes", + "href": "posts/epiparameter_v0.1.0/index.html#bug-fixes", + "title": "epiparameter v0.1.0", "section": "Bug fixes", - "text": "Bug fixes\n\nDOI and PMID are lowercase throughout the package to resolve issues with older versions of R (see issue #301) (#317)." + "text": "Bug fixes\n\nNone" }, { - "objectID": "posts/epiparameter_v0.2.0/index.html#deprecated-and-defunct", - "href": "posts/epiparameter_v0.2.0/index.html#deprecated-and-defunct", - "title": "epiparameter v0.2.0", + "objectID": "posts/epiparameter_v0.1.0/index.html#deprecated-and-defunct", + "href": "posts/epiparameter_v0.1.0/index.html#deprecated-and-defunct", + "title": "epiparameter v0.1.0", "section": "Deprecated and defunct", "text": "Deprecated and defunct\n\nNone" }, { - "objectID": "posts/epiparameter_v0.2.0/index.html#acknowledgements", - "href": "posts/epiparameter_v0.2.0/index.html#acknowledgements", - "title": "epiparameter v0.2.0", + "objectID": "posts/epiparameter_v0.1.0/index.html#acknowledgements", + "href": "posts/epiparameter_v0.1.0/index.html#acknowledgements", + "title": "epiparameter v0.1.0", "section": "Acknowledgements", - "text": "Acknowledgements\n@Bisaloo @CarmenTamayo @DxChen0126 @adamkucharski @chartgerink @chloerice48 @cm401 @jamesmbaazam @jlessler @joshwlambert @papsti @prabasaj @sbfnk @wzmli" + "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @CarmenTamayo, @chartgerink, @jamesmbaazam, @maelle, @pratikunterwegs, @pitmonticone, @sbfnk, and @TimTaylor." 
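The conversion helpers listed above (convert_summary_stats_to_params() and convert_params_to_summary_stats()) rely on closed-form relationships such as the one sketched below. This base-R snippet converts a reported mean and standard deviation of a gamma-distributed delay into shape and scale parameters; it illustrates the idea with placeholder values rather than the {epiparameter} API itself.

# Reported summary statistics for a delay distribution (placeholder values)
delay_mean <- 8.5
delay_sd <- 4.0

# For a gamma distribution: mean = shape * scale, variance = shape * scale^2
scale <- delay_sd^2 / delay_mean
shape <- delay_mean / scale

c(shape = shape, scale = scale)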
}, { - "objectID": "posts/epidemics_v0.1.0/index.html", - "href": "posts/epidemics_v0.1.0/index.html", - "title": "epidemics v0.1.0", + "objectID": "posts/progressive-enhancement/index.html", + "href": "posts/progressive-enhancement/index.html", + "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", "section": "", - "text": "We would like to announce the realease on GitHub of version 0.1.0 of epidemics.\nThis release is aimed at supporting the reproducibility of analyses in this report, which would be subject to breaking changes due to planned package development.\nPlease note that this version has not been released to CRAN. epidemics is still being actively developed, with major changes planned for the near future, which will soon supersede this version.\nepidemics v0.1.0 can be installed using the {pak} package from GitHub using pak::pak(\"epiverse-trace/epidemics@v0.1.0\").\nHere is an automatically generated summary of the changes in this version.\nThis is an initial GitHub release of epidemics, an R package that ships a library of compartmental epidemic model structures that can be used, along with supplied classes that help define population characteristics and epidemic response interventions including vaccinations, to compose and model epidemic scenarios.\nThe sections below describe the contents of this release." + "text": "We are continuing our post series on S3 object orientation and interoperability in R. We have previously discussed what makes a good S3 class and how to choose a good parent for it, as well as when to write or not write a custom method. We have highlighted in particular how classes inheriting from data.frames can simplify user experience because of familiarity, and reduce developer workload due to the pre-existing S3 methods.\nWe have detailed how to improve compatibility with the tidyverse by explaining:\nHere, we are going to explore how to start adding support in the ecosystem for the new S3 classes while minimizing user-facing breaking changes. We have previously delved into this topic with our post “Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations” and this is a wider and higher-level view of the same topic.\nThe strategy presented here is the variation of a common concept in web development and the web ecosystem: progressive enhancement. This philosophy aims to support browsers with a common set of essential features, and even richer features for browser with the most recent updates. It makes sense to think about this philosophy with the prism of introducing new classes to a new software ecosystem as it has the similar constraints of multiple stakeholders with different interests and timelines. The application of progressive enhancement in this context means that users or packages that have not (yet) adopted the new classes are not penalized compared to users or packages that have." 
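As a small reminder of why inheriting from data.frame pays off, an object that keeps data.frame as a parent class continues to work with the large body of existing methods without any extra developer effort. The class name below is hypothetical and only used for illustration.

cases <- data.frame(id = 1:3, age = c(21, 34, 57))
class(cases) <- c("casedata", class(cases)) # hypothetical subclass of data.frame

# Pre-existing data.frame methods keep working out of the box
print(cases)
summary(cases$age)
subset(cases, age > 30)
inherits(cases, "data.frame") # TRUE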
}, { - "objectID": "posts/epidemics_v0.1.0/index.html#model-structures", - "href": "posts/epidemics_v0.1.0/index.html#model-structures", - "title": "epidemics v0.1.0", - "section": "Model structures", - "text": "Model structures\nThis release of epidemics includes four model structures supporting a range of composable elements to modify epidemic trajectories.\n\n“Default” model: A deterministic SEIR-V model allowing heterogeneity in social contacts between demographic groups, with optional, single-dose non-leaky vaccination;\n“Vacamole” model: A deterministic SEI-HRD-V2 implementation of a model allowing heterogeneity in social contacts between demographic groups, with two-dose leaky vaccination (V2), supporting different infection trajectories through the infectious and hospitalised (H) compartments for doubly vaccinated individuals, which tracks deaths (D), and which was initially developed by the Dutch public health agency RIVM for vaccine impact modelling during the Covid-19 pandemic, and published as Ainslie et al. 2022 https://doi.org/10.2807/1560-7917.ES.2022.27.44.2101090;\n“Diphtheria” model: A deterministic SEIHR model tracking outcomes for different demographic groups, but not including heterogeneity in social contacts, adapted from Finger et al. 2019 https://doi.org/10.1186/s12916-019-1288-7 and intended for application to disease outbreaks in a humanitarian camp setting;\n“Ebola” model: A discrete time stochastic SEIHFR model suitable for modelling Ebola virus disease and other haemorrhagic fevers, and which allows varying the efficacy of isolation in a hospital setting (H), and allows modelling transmission in a funeral context (F), as adapted from a consensus Ebola virus disease model in Li et al. 2019 https://doi.org/10.1098/rspb.2019.0774 and using simulation methods from Getz and Dougherty 2018 https://doi.org/10.1080/17513758.2017.1401677." + "objectID": "posts/progressive-enhancement/index.html#adding-class-support-to-function-inputs-via-progressive-enhancement", + "href": "posts/progressive-enhancement/index.html#adding-class-support-to-function-inputs-via-progressive-enhancement", + "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", + "section": "Adding class support to function inputs via progressive enhancement", + "text": "Adding class support to function inputs via progressive enhancement\nThe goal here is to allow functions to accept the new classes as inputs, while keeping the old behaviour unchanged for unclassed objects (or with a different class than the new one).\nThis can conveniently be done in an almost transparent way by converting the old function to the S3 generic, and using the default method to handle the old behaviour. 
The practical steps, and minor caveats, have been previously described in the post “Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations”.\n\n\n\nA before / after type image showing the conversion of a function to a generic with a default method keeping the exisiting behaviour.\n\n\nFor a different, additional, example, we can consider a function working on patient-level data, which previously only accepted a data.frame as input:\n#' Compute length of stay in hospital on a patient-level dataset\n#'\n#' @param data A data.frame containing patient-level data\n#' @param admission_column The name of the column containing the admission date\n#' @param discharge_column The name of the column containing the discharge date\n#'\n#' @returns A numeric vector of hospitalization durations in days\ncompute_hospitalization_duration <- function(data, admission_column, discharge_column) {\n\n difftime(\n data[[discharge_column]],\n data[[admission_column]],\n units = \"days\"\n )\n\n}\nWe want to add support for linelist objects, as defined in the linelist package. linelist objects inherit from data.frame and contain an additional tags attribute. In particular, linelist objects can have a date_admission and date_discharge tag. This means we can use the tags to automatically detect the columns to use.\nBut we want the function to keep working for standard data.frames, tibbles, etc. We can follow the steps described in the previous post to convert the function to a generic, and add a default method to handle the old behaviour:\ncompute_hospitalization_duration <- function(data, ...) {\n\n UseMethod(\"compute_hospitalization_duration\")\n\n}\n\ncompute_hospitalization_duration.default <- function(data, admission_column, discharge_column) {\n\n difftime(\n data[[discharge_column]],\n data[[admission_column]],\n units = \"days\"\n )\n\n}\n\ncompute_hospitalization_duration.linelist <- function(data, ...) {\n\n x <- linelist::tags_df(data)\n\n compute_hospitalization_duration(\n data = x,\n admission_column = \"date_admission\",\n discharge_column = \"date_discharge\"\n )\n\n}\nIf the function was already a generic, then a new method for the new class should be added, leaving everything else unchanged." }, { - "objectID": "posts/epidemics_v0.1.0/index.html#solving-ode-systems-using-boost-odeint", - "href": "posts/epidemics_v0.1.0/index.html#solving-ode-systems-using-boost-odeint", - "title": "epidemics v0.1.0", - "section": "Solving ODE systems using Boost odeint", - "text": "Solving ODE systems using Boost odeint\nepidemics uses Boost’s odeint https://www.boost.org/doc/libs/1_84_0/libs/numeric/odeint/doc/html/boost_numeric_odeint/getting_started/overview.html to treat the deterministic models’ ordinary differential equations (ODEs) as initial value problems and solve them.\nModel ODEs are defined as structs with operators in the package headers, and exposed to R as internal Rcpp functions. The ‘default’, ‘Vacamole’, and ‘diphtheria’ models are ODE models defined in this way. 
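A short usage sketch of the generic defined above, assuming those definitions have been run (column names and dates are made up): plain data frames keep the original calling convention, while tagged linelist objects no longer need explicit column names.

hosp <- data.frame(
  admission = as.Date(c("2024-01-02", "2024-01-05")),
  discharge = as.Date(c("2024-01-10", "2024-01-07"))
)

# Default method: unchanged behaviour for plain data.frames
compute_hospitalization_duration(hosp, "admission", "discharge")

# linelist method: the relevant columns are found through their tags
hosp_ll <- linelist::make_linelist(
  hosp,
  date_admission = "admission",
  date_discharge = "discharge"
)
compute_hospitalization_duration(hosp_ll)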
This is intended to help reduce overheads associated with passing ODE systems written in R back and forth from a solver (such as those provided by {deSolve}), and is an easier way to define feature-rich models than writing C code for solvers provided by {deSolve} that accept compiled code.\nepidemics headers include tools for handling the C++ representations of R objects used in the package (see below), and can be imported by other Rcpp packages.\nThe ‘default’ and ‘Vacamole’ models have equivalent R-only implementations as well which use the {deSolve} package; these are intended to be made unavailable in future releases." + "objectID": "posts/progressive-enhancement/index.html#adding-class-support-to-function-outputs-via-progressive-enhancement", + "href": "posts/progressive-enhancement/index.html#adding-class-support-to-function-outputs-via-progressive-enhancement", + "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", + "section": "Adding class support to function outputs via progressive enhancement", + "text": "Adding class support to function outputs via progressive enhancement\nAdding class support to function outputs is often more challenging. A common option is to add a new argument to the function, which would be a boolean indicating whether the output should be of the new class or not. But this doesn’t fit in the view of progressive enhancement, as it would require users to change their code to benefit from the new classes, or to suffer from breaking changes.\nWhile the new argument approach is sometimes indeed the only possible method, there are some situations where we can have an approach truly following the progressive enhancement philosophy.\nIn particular, this is the case when the old output was already inheriting from the parent of the new class (hence the importance of carefully choosing the parent class). In this situation, the new attributes from the new class should not interfere with existing code for downstream analysis.\nIn this case, let’s consider a function that was previously returning an unclassed data.frame with patient-level data:\ncreate_patient_dataset <- function(n_patients = 10) {\n\n data <- data.frame(\n patient_id = seq_len(n_patients),\n age = sample(18:99, n_patients, replace = TRUE)\n )\n\n return(data)\n\n}\nWe want to start returning a linelist object. Because linelist objects are data.frames (or tibbles) with an extra attr, it can be done in a transparent way:\ncreate_patient_dataset <- function(n_patients = 10) {\n\n data <- data.frame(\n patient_id = seq_len(n_patients),\n age = sample(18:99, n_patients, replace = TRUE)\n )\n\n data <- linelist::make_linelist(\n data,\n id = \"patient_id\",\n age = \"age\"\n )\n\n return(data)\n\n}\n\ninherits(data, \"data.frame\")\nFor a more realistic example, you can also see the work in progress to integrate the new contactmatrix standard format for social contact data to the contactdata package.\nThis is however only true if code in downstream analysis follows good practices in checking for the class of an object 1. If existing code was testing equality of the class to a certain value, it will break when the new class value is appended. This is described in a post on the R developer blog, when base R was adding a new array class value to matrix objects. Class inheritance should never be tested via class(x) == \"some_class\". 
Instead, inherits(x, \"some_class\") or is(x, \"some_class\") should be used to future-proof the code and allow appending an additional in the future." }, { - "objectID": "posts/epidemics_v0.1.0/index.html#composable-elements-as-classes", - "href": "posts/epidemics_v0.1.0/index.html#composable-elements-as-classes", - "title": "epidemics v0.1.0", - "section": "Composable elements as classes", - "text": "Composable elements as classes\nepidemics provides classes that help to organise the components of an epidemic scenario model.\n\n<population>: An S3 class to store population characteristics including the size of demographic groups, a social contacts matrix, and initial conditions for a model;\n<intervention>: An S3 abstract class and super-class that allows the definition of events that modify the epidemic trajectory:\n\n<rate_intervention>: A sub-class of <intervention> that allows the reduction of transition rates between model compartments to simulate the effect of policy interventions over a specific period;\n<contacts_intervention>: A sub-class of <intervention> that allows the reduction of social contacts to simulate the effect of policy interventions over a specific period;\n\n<vaccination>: An S3 class that holds the intervals and group-specific rates at which individuals transition into the ‘vaccinated’ compartment(s) of a model, if available;" + "objectID": "posts/progressive-enhancement/index.html#conclusion", + "href": "posts/progressive-enhancement/index.html#conclusion", + "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", + "section": "Conclusion", + "text": "Conclusion\nObject oriented programming and S3 classes offer a convenient way to iteratively add interoperability in the ecosystem in a way that is minimally disruptive to users and developers. Newly classed input support can be added via custom methods (after converting the existing function to a generic if necessary). Newly classed output support can be added via progressive enhancement, by ensuring that the new class is a subclass of the old one and that downstream code uses good practices to test class inheritance.\nThanks to James Azam and Tim Taylor for their very valuable feedback on this post." }, { - "objectID": "posts/epidemics_v0.1.0/index.html#other-composable-elements", - "href": "posts/epidemics_v0.1.0/index.html#other-composable-elements", - "title": "epidemics v0.1.0", - "section": "Other composable elements", - "text": "Other composable elements\nepidemics allows models to include elements that affect an epidemic trajectory, but which are not custom classes.\n\nTime-dependence: All models can be passed a list of functions with two arguments, time and x which are expected to return x as a function of time, and which may be used to model the effect of seasonality in model parameters;\nPopulation changes: Applicable only to the diphtheria model, a two element list of time and values, which allow the definition of changes to the number of susceptible individuals in the model, and which may be used to model influxes and evacuations of individuals from humanitarian camps." 
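To illustrate the class-checking advice above with a concrete case: once an object carries more than one class, comparing class(x) against a single string becomes unreliable, whereas inherits() keeps working. This is a base-R sketch with an arbitrary extra class appended.

x <- data.frame(id = 1:2)
class(x) <- c("linelist", "data.frame") # an additional class has been appended

class(x) == "data.frame"  # c(FALSE, TRUE): a length-2 result, easy to misuse in if()
inherits(x, "data.frame") # TRUE: robust to extra classes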
+ "objectID": "posts/progressive-enhancement/index.html#footnotes", + "href": "posts/progressive-enhancement/index.html#footnotes", + "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", + "section": "Footnotes", + "text": "Footnotes\n\n\nThis is now enforced in R packages by R CMD check, and via the class_equals_linter() in the lintr package.↩︎" }, { - "objectID": "posts/epidemics_v0.1.0/index.html#output-processing-functions", - "href": "posts/epidemics_v0.1.0/index.html#output-processing-functions", - "title": "epidemics v0.1.0", - "section": "Output processing functions", - "text": "Output processing functions\nepidemics provides functions to help process the output of an epidemic model run, to calculate the size of the epidemic in each demographic group at any stage (epidemic_size()), and to calculate the number of new infections in each demographic group at each timepoint in the model (new_infections())." + "objectID": "posts/readepi_v0.1.0/index.html", + "href": "posts/readepi_v0.1.0/index.html", + "title": "readepi v0.1.0", + "section": "", + "text": "We are very excited to announce the release of a new readepi version v0.1.0. Here is an automatically generated summary of the changes in this version.\nThis release contains the initial implementation of the package where data import from health information systems (HIS) depends on existing R packages designed specifically to retrieve data from a given HIS.\n\n\n\nReuseCC BY 4.0CitationBibTeX citation:@online{epiverse-trace_development_team2024,\n author = {Epiverse-TRACE development team, The},\n title = {Readepi V0.1.0},\n date = {2024-06-10},\n url = {https://epiverse-trace.github.io/posts/readepi_v0.1.0/},\n langid = {en}\n}\nFor attribution, please cite this work as:\nEpiverse-TRACE development team, The. 2024. “Readepi\nV0.1.0.” June 10, 2024. https://epiverse-trace.github.io/posts/readepi_v0.1.0/." }, { - "objectID": "posts/epidemics_v0.1.0/index.html#usage-vignettes", - "href": "posts/epidemics_v0.1.0/index.html#usage-vignettes", - "title": "epidemics v0.1.0", - "section": "Usage vignettes", - "text": "Usage vignettes\nepidemics includes a range of usage vignettes that demonstrate how to:\n\nGet started with the package;\nGet started with modelling interventions on social contacts to control outbreaks;\nModel overlapping and sequential interventions on social contacts;\nModel interventions that modify transition rates between model compartments;\nGet started with modelling a vaccination campaign;\nModel time-dependence and seasonality in disease transmission dynamics;\nGenerate and model uncertainty in model parameters;\nReduce the number of parameters required for final size estimation;\nUse the ‘Vacamole’ model for scenarios of leaky vaccination and vaccine impact on hospitalisation;\nUse the ‘Ebola’ model for scenarios of responses to an Ebola virus disease outbreak;\nUse the ‘diphtheria’ model for scenarios of outbreaks in a humanitarian camp setting." + "objectID": "resources.html", + "href": "resources.html", + "title": "Resources", + "section": "", + "text": "On this page, you can find resources shared on our discussion forum. 
Submit your own and upvote the ones you like on GtiHub!\n\n\n\n\n\n\n \n \n \n Order By\n Default\n \n Date - Oldest\n \n \n Date - Newest\n \n \n upvotes (Low to High)\n \n \n upvotes (High to Low)\n \n \n \n \n \n \n \n\n\n \n \n \n The True ROI of Doing It Right the First Time in Software Projects\n \n \n \n \n 👍 3\n \n \n \n \n Best Practices\n \n \n \n \n \n Jun 3, 2024\n \n \n \n \n \n \n \n Painless introduction to object-oriented programming using S3 classes\n \n \n \n \n 👍 2\n \n \n \n \n documentation\n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Tidyverse book on code review\n \n \n \n \n 👍 2\n \n \n \n \n GitHub\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n How to Run GitHub Actions Locally Using the act CLI Tool\n \n \n \n \n 👍 2\n \n \n \n \n GitHub\n \n Tutorial\n \n \n \n \n \n May 6, 2024\n \n \n \n \n \n \n \n Quick example of how to use `tryCatch()`\n \n \n \n \n 👍 2\n \n \n \n \n Advanced R\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Using data.table\n \n \n \n \n 👍 2\n \n \n \n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Deep R Programming\n \n \n \n \n 👍 2\n \n \n \n \n Advanced R\n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n Rcpp for everyone\n \n \n \n \n 👍 2\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Tutorial on the types of tests to write for an R function\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n shinystan - Interactive diagnostics and posterior analysis for Bayesian models\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Oct 16, 2024\n \n \n \n \n \n \n \n Explore about 2728 R color palettes\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n Difference between the `apply()` family of functions in R\n \n \n \n \n 👍 1\n \n \n \n \n Advanced R\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Why You Should (or Shouldn’t) Build an API Client?\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Jun 26, 2024\n \n \n \n \n \n \n \n Why I don’t add style options to my ggplot2 functions\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n Jul 29, 2024\n \n \n \n \n \n \n \n Request and handle API response from R\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Epimodelac2023 - Outbreak Analysis and Modeling in Public Health\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Rcpp gallery\n \n \n \n \n 👍 1\n \n \n \n \n documentation\n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Checks and automation early feedback via local commit hooks and the precommit package\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Stan Playground - A web-based platform for writing and compiling stan programs\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Oct 16, 2024\n \n \n \n \n \n \n \n CVE-2024-27322 Should Never Have Been Assigned And R Data Files Are Still Super Risky Even In R 4.4.0\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Epirecipes cookbook\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n\n\nNo matching items" }, { - "objectID": "posts/epidemics_v0.1.0/index.html#miscellaneous", - "href": "posts/epidemics_v0.1.0/index.html#miscellaneous", - "title": "epidemics v0.1.0", - "section": "Miscellaneous", - "text": "Miscellaneous\n\nWorkflows to render the vignettes and README as a website;\nTest code 
coverage of 93%." + "objectID": "getting-started.html", + "href": "getting-started.html", + "title": "Getting started", + "section": "", + "text": "Our aim is to help change how analytics are used in the global infectious disease response, moving towards integrated, generalisable and scalable community-driven software." }, { - "objectID": "posts/epidemics_v0.4.0/index.html", - "href": "posts/epidemics_v0.4.0/index.html", - "title": "epidemics v0.4.0", + "objectID": "getting-started.html#roadmap", + "href": "getting-started.html#roadmap", + "title": "Getting started", + "section": "Roadmap", + "text": "Roadmap" + }, + { + "objectID": "getting-started.html#epiverse-trace-packages", + "href": "getting-started.html#epiverse-trace-packages", + "title": "Getting started", + "section": "Epiverse-TRACE packages", + "text": "Epiverse-TRACE packages\nThese are built on methods and analysis that have informed epidemic response to infections including cholera, COVID, dengue, diphtheria, Ebola, influenza, and Zika.\n\nEarly tasks\nMiddle tasks\nLate tasks\n\n\n\n\n\nTools for reading data\n\n\n\n\n\nOpen data for Colombia\n\n\n\n\n\nTools for cleaning data\n\n\n\n\n\nSimulate outbreak data\n\n\n\n\n\nTagging and validating individual data\n\n\n\n\n\nTools for epidemiological parameters\n\n\n\n\n\nAutomated report for SIVIGILA data\n\n\n\n\n\nPipelines for common tasks\n\n\n\n\n\nSeverity estimation\n\n\n\n\n\nEstimate force of infection from serology\n\n\n\n\n\nTools for vector-borne infections\n\n\n\n\n\nSimulate vector-borne outbreaks\n\n\n\n\n\nSimulation and analysis of transmission chains\n\n\n\n\n\nAnalysis of transmission variation\n\n\n\n\n\nCalculate epidemic final size\n\n\n\n\n\nSimulate epidemic dynamics\n\n\n\n\n\nAnalyze vaccine efficacy" + }, + { + "objectID": "posts/statistical-correctness/index.html", + "href": "posts/statistical-correctness/index.html", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", "section": "", - "text": "We are very excited to announce the release of a new epidemics version v0.4.0. Here is an automatically generated summary of the changes in this version.\nMaintainer is changing to @rozeggo." + "text": "We’re evolving in an increasingly data-driven world. And since critical decisions are taken based on results produced by data scientists and data analysts, they need to be be able to trust the tools they use. It is now increasingly common to add continuous integration to software packages and libraries, to ensure the code is not crashing, and that future updates don’t change your code output (snapshot tests). But one type of test still remains uncommon: tests for statistical correctness. That is, tests that ensure the algorithm implemented in your package actually produce the correct results.\nIt is likely that most statistical package authors run some tests on their own during development but there doesn’t seem to be guidelines on how to test statistical correctness in a solid and standard way 1.\nIn this blog post, we explore various methods to ensure the statistical correctness of your software. We argue that these tests should be part of your continuous integration system, to ensure your tools remains valid throughout its life, and to let users verify how you validate your package. Finally, we show how these principles are implemented in the Epiverse TRACE tools.\nThe approaches presented here are non-exclusive and should ideally all be added to your tests. 
However, they are presented in order of stringency and priority to implement. We also take a example of a function computing the centroid of a list of points to demonstrate how you would integrate the recommendations from this post with the {testthat} R package, often used from unit testing:\n#' Compute the centroid of a set of points\n#'\n#' @param coords Coordinates of the points as a list of vectors. Each element of the \n#' list is a point.\n#'\n#' @returns A vector of coordinates of the same length of each element of \n#' `coords`\n#' \n#' @examples\n#' centroid(list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)))\n#' \ncentroid <- function(coords) {\n\n # ...\n # Skip all the necessary input checking for the purpose of this demo\n # ...\n\n coords_mat <- do.call(rbind, coords)\n \n return(colMeans(coords_mat))\n \n}" }, { - "objectID": "posts/epidemics_v0.4.0/index.html#model-functions", - "href": "posts/epidemics_v0.4.0/index.html#model-functions", - "title": "epidemics v0.4.0", - "section": "Model functions", - "text": "Model functions\n\nInternal model functions for the models which allow vaccination have been corrected to prevent vaccination introducing negative values of susceptibles; tests added to check for this (#235, initially reported by @avallecam)." + "objectID": "posts/statistical-correctness/index.html#compare-your-results-to-the-reference-implementation", + "href": "posts/statistical-correctness/index.html#compare-your-results-to-the-reference-implementation", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "section": "Compare your results to the reference implementation", + "text": "Compare your results to the reference implementation\nThe most straightforward and most solid way to ensure your implementation is valid is to compare your results to the results of the reference implementation. The reference implementation can be a package in another language, an example with toy data in the scientific article introducing the method, etc.\nFor example, the {gemma2} R package, which re-implements the methods from the GEMMA tool written in C++, verifies that values produced by both tools match:\ntest_that(\"Results of gemma2 equal those of GEMMA v 0.97\", {\n expect_equal(Sigma_ee, diag(c(18.559, 12.3672)), tolerance = 0.0001)\n expect_equal(Sigma_uu, diag(c(82.2973, 41.9238)), tolerance = 0.0001)\n})\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\nlibrary(testthat)\n\ntest_that(\"centroid() in 1D produces the same results as mean()\", {\n\n x <- list(1, 5, 3, 10, 5)\n\n expect_identical(centroid(x), mean(unlist(x)))\n \n})\n\nTest passed 🎊\n\n\n\n\nNote that even if a reference implementation doesn’t exist, it is still good practice to compare your implementation to competing ones. Discrepancies might reveal a bug in your implementation or theirs but in any case, finding it out is beneficial to the community.\nHowever, this approach cannot be used in all cases. Indeed, there may not be a reference implementation in your case. Or it might be difficult to replicate identical computations in the case of algorithm with stochasticity 2." 
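When exact replication is out of reach because of stochasticity, as noted above, one pragmatic fallback (a sketch added here for illustration, not taken from any particular package) is to compare summary statistics of the stochastic output against the reference value with an explicit tolerance:

library(testthat)

test_that("stochastic draws recover the reference mean", {
  set.seed(1)
  draws <- rnorm(1e4, mean = 2, sd = 1) # stand-in for a stochastic algorithm
  expect_equal(mean(draws), 2, tolerance = 0.05)
})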
}, { - "objectID": "posts/epidemics_v0.4.0/index.html#helper-functions", - "href": "posts/epidemics_v0.4.0/index.html#helper-functions", - "title": "epidemics v0.4.0", - "section": "Helper functions", - "text": "Helper functions\n\nAdded the epidemic_peak() function to calculate the timing and size of the largest peak in each compartment in an scenario model (#240) by @bahadzie.\nAdded the outcomes_averted() function to compare epidemic scenarios (e.g. with and without interventions or vaccination) (#225, #230)." + "objectID": "posts/statistical-correctness/index.html#compare-to-a-theoretical-upper-or-lower-bound", + "href": "posts/statistical-correctness/index.html#compare-to-a-theoretical-upper-or-lower-bound", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "section": "Compare to a theoretical upper or lower bound", + "text": "Compare to a theoretical upper or lower bound\nAn alternative strategy is to compare your result to theoretical upper or lower bound. This offers a weaker guarantee that your implementation and your results are correct but it can still allow you to detect important mistakes.\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\ntest_that(\"centroid() is inside the hypercube containing the data points\", {\n \n x <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))\n\n expect_true(all(centroid(x) <= Reduce(pmax, x)))\n expect_true(all(centroid(x) >= Reduce(pmin, x)))\n \n})\n\nTest passed 😀\n\n\n\n\nYou can see a real-life example of this kind of test in the {finalsize} R package. {finalsize} computes the final proportion of infected in a heterogeneous population according to an SIR model. Theory predicts that the number of infections is maximal in a well-mixed population:\n# Calculates the upper limit of final size given the r0\n# The upper limit is given by a well mixed population\nupper_limit <- function(r0) {\n f <- function(par) {\n abs(1 - exp(-r0 * par[1]) - par[1])\n }\n opt <- optim(\n par = 0.5, fn = f,\n lower = 0, upper = 1,\n method = \"Brent\"\n )\n opt\n}" }, { - "objectID": "posts/epidemics_v0.4.0/index.html#documentation", - "href": "posts/epidemics_v0.4.0/index.html#documentation", - "title": "epidemics v0.4.0", - "section": "Documentation", - "text": "Documentation\n\nAdds a developer-focused vignette on how to modify epidemics and model structures to address potential modelling requests or tasks (#210).\nSplits up the ‘Modelling uncertainty and scenarios’ vignette into separate vignettes on uncertainty and scenario comparisons (#225).\nRemoved unnecessary plots from the vignette on modelling vaccination (#235).\nFixed link to socialmixr package in the ‘Get started’ and ‘Modelling interventions’ vignettes.\nUpdated and added documentation for all new or modified functions.\nUpdated references JSON file." + "objectID": "posts/statistical-correctness/index.html#verify-that-output-is-changing-as-expected-when-a-single-parameter-varies", + "href": "posts/statistical-correctness/index.html#verify-that-output-is-changing-as-expected-when-a-single-parameter-varies", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "section": "Verify that output is changing as expected when a single parameter varies", + "text": "Verify that output is changing as expected when a single parameter varies\nAn even looser way to test statistical correctness would be to control that output varies as expected when you update some parameters. 
This could be for example, checking that the values you return increase when you increase or decrease one of your input parameters.\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\ntest_that(\"centroid() increases when coordinates from one point increase\", {\n \n x <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))\n \n y <- x\n y[[1]] <- y[[1]] + 1 \n\n expect_true(all(centroid(x) < centroid(y)))\n \n})\n\nTest passed 😀\n\n\n\n\nAn example of this test in an actual R package can again be found in the finalsize package:\nr0_low <- 1.3\nr0_high <- 3.3\n\nepi_outcome_low <- final_size(\n r0 = r0_low,\n <...>\n)\nepi_outcome_high <- final_size(\n r0 = r0_high,\n <...>\n)\n\ntest_that(\"Higher values of R0 result in a higher number of infectious in all groups\", {\n expect_true(\n all(epi_outcome_high$p_infected > epi_outcome_low$p_infected)\n )\n})" }, { - "objectID": "posts/epidemics_v0.4.0/index.html#package", - "href": "posts/epidemics_v0.4.0/index.html#package", - "title": "epidemics v0.4.0", - "section": "Package", - "text": "Package\n\nUpdated Codecov GitHub Actions workflow to restore code coverage reporting.\nUpdated package title and citation file.\nUpdated _pkgdown.yaml with new vignette and updated section titles.\nUpdated WORDLIST." + "objectID": "posts/statistical-correctness/index.html#conclusion-automated-validation-vs-peer-review", + "href": "posts/statistical-correctness/index.html#conclusion-automated-validation-vs-peer-review", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "section": "Conclusion: automated validation vs peer-review", + "text": "Conclusion: automated validation vs peer-review\nIn this post, we’ve presented different methods to automatically verify the statistical correctness of your statistical software. We would like to highlight one more time that it’s important to run these tests are part of your regular integration system, instead of running them just once at the start of the development. This will prevent the addition of possible errors in the code and show users what specific checks you are doing. By doing so, you are transparently committing to the highest quality.\nMultiple voices in the community are pushing more towards peer-review as a proxy for quality and validity:\n\nWe would like to highlight that automated validation and peer review are not mutually exclusive and answer slightly different purposes.\nOn the one hand, automated validation fails to catch more obscure bugs and edge cases. For example, a bug that would be difficult to detect via automated approach is the use of bad Random Number Generators when running in parallel.\nBut on the other hand, peer-review is less scalable, and journals usually have some editorial policy that might not make your package a good fit. Additionally, peer-review usually happens at one point in time while automated validation can, and should, be part of the continuous integration system.\nIdeally, peer-review and automated validation should work hand-in-hand, with review informing the addition of new automated validation tests." 
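The parallel random-number-generation pitfall mentioned in the conclusion above is easy to reproduce. In this minimal sketch, two workers silently generate identical "random" streams because each one re-seeds with the same value; it is one illustrative instance of the wider class of RNG bugs, not an exhaustive test.

library(parallel)

cl <- makeCluster(2)
draws <- parLapply(cl, 1:2, function(i) {
  set.seed(42) # the same seed on every worker: the streams are not independent
  rnorm(3)
})
stopCluster(cl)

identical(draws[[1]], draws[[2]]) # TRUE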
+ }, + { + "objectID": "posts/statistical-correctness/index.html#footnotes", + "href": "posts/statistical-correctness/index.html#footnotes", + "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "section": "Footnotes", + "text": "Footnotes\n\n\nBut see the “testing statistical software” post from Alex Hayes where he presents his process to determine if he deems a statistical package trustworthy or not, and rOpenSci Statistical Software Peer Review book.↩︎\nSetting the random seed is not enough to compare implementations across programming languages because different languages use different kind of Random Number Generators.↩︎" }, { "objectID": "posts/for-vs-apply/index.html", @@ -834,633 +932,563 @@ }, { "objectID": "posts/for-vs-apply/index.html#conclusion", - "href": "posts/for-vs-apply/index.html#conclusion", - "title": "Lesser-known reasons to prefer apply() over for loops", - "section": "Conclusion", - "text": "Conclusion\nThis post hopefully convinced you why it’s better to use apply() functions rather than for loops where possible (i.e., for iteration). Contrary to common misconception, the real reason is not performance, but code robustness and readability.\nThanks to Jaime Pavlich-Mariscal, James Azam, Tim Taylor, and Pratik Gupte for their thoughtful comments and suggestions on earlier drafts of this post.\n\n\n\n\n\n\nBeyond R\n\n\n\nThis post focused on R, but the same principles generally apply to other functional languages. In Python for example, you would use list comprehensions or the map() function.\n\n\n\n\n\n\n\n\nFurther reading\n\n\n\nIf you liked the code patterns recommended in this post and want to use functional programming in more situations, including recursion, I recommend you check out the “Functionals” chapter of the Advanced R book by Hadley Wickham" - }, - { - "objectID": "posts/for-vs-apply/index.html#footnotes", - "href": "posts/for-vs-apply/index.html#footnotes", - "title": "Lesser-known reasons to prefer apply() over for loops", - "section": "Footnotes", - "text": "Footnotes\n\n\nThere are a handful of rare corner cases where apply() is not the best method for iteration. These are cases that make use of match.call() or sys.call(). More details are available in lapply() documentation and in this GitHub comment by Tim Taylor during the review of this post.↩︎" - }, - { - "objectID": "posts/statistical-correctness/index.html", - "href": "posts/statistical-correctness/index.html", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", - "section": "", - "text": "We’re evolving in an increasingly data-driven world. And since critical decisions are taken based on results produced by data scientists and data analysts, they need to be be able to trust the tools they use. It is now increasingly common to add continuous integration to software packages and libraries, to ensure the code is not crashing, and that future updates don’t change your code output (snapshot tests). But one type of test still remains uncommon: tests for statistical correctness. That is, tests that ensure the algorithm implemented in your package actually produce the correct results.\nIt is likely that most statistical package authors run some tests on their own during development but there doesn’t seem to be guidelines on how to test statistical correctness in a solid and standard way 1.\nIn this blog post, we explore various methods to ensure the statistical correctness of your software. 
We argue that these tests should be part of your continuous integration system, to ensure your tools remains valid throughout its life, and to let users verify how you validate your package. Finally, we show how these principles are implemented in the Epiverse TRACE tools.\nThe approaches presented here are non-exclusive and should ideally all be added to your tests. However, they are presented in order of stringency and priority to implement. We also take a example of a function computing the centroid of a list of points to demonstrate how you would integrate the recommendations from this post with the {testthat} R package, often used from unit testing:\n#' Compute the centroid of a set of points\n#'\n#' @param coords Coordinates of the points as a list of vectors. Each element of the \n#' list is a point.\n#'\n#' @returns A vector of coordinates of the same length of each element of \n#' `coords`\n#' \n#' @examples\n#' centroid(list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)))\n#' \ncentroid <- function(coords) {\n\n # ...\n # Skip all the necessary input checking for the purpose of this demo\n # ...\n\n coords_mat <- do.call(rbind, coords)\n \n return(colMeans(coords_mat))\n \n}" - }, - { - "objectID": "posts/statistical-correctness/index.html#compare-your-results-to-the-reference-implementation", - "href": "posts/statistical-correctness/index.html#compare-your-results-to-the-reference-implementation", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", - "section": "Compare your results to the reference implementation", - "text": "Compare your results to the reference implementation\nThe most straightforward and most solid way to ensure your implementation is valid is to compare your results to the results of the reference implementation. The reference implementation can be a package in another language, an example with toy data in the scientific article introducing the method, etc.\nFor example, the {gemma2} R package, which re-implements the methods from the GEMMA tool written in C++, verifies that values produced by both tools match:\ntest_that(\"Results of gemma2 equal those of GEMMA v 0.97\", {\n expect_equal(Sigma_ee, diag(c(18.559, 12.3672)), tolerance = 0.0001)\n expect_equal(Sigma_uu, diag(c(82.2973, 41.9238)), tolerance = 0.0001)\n})\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\nlibrary(testthat)\n\ntest_that(\"centroid() in 1D produces the same results as mean()\", {\n\n x <- list(1, 5, 3, 10, 5)\n\n expect_identical(centroid(x), mean(unlist(x)))\n \n})\n\nTest passed 🎊\n\n\n\n\nNote that even if a reference implementation doesn’t exist, it is still good practice to compare your implementation to competing ones. Discrepancies might reveal a bug in your implementation or theirs but in any case, finding it out is beneficial to the community.\nHowever, this approach cannot be used in all cases. Indeed, there may not be a reference implementation in your case. Or it might be difficult to replicate identical computations in the case of algorithm with stochasticity 2." 
- }, - { - "objectID": "posts/statistical-correctness/index.html#compare-to-a-theoretical-upper-or-lower-bound", - "href": "posts/statistical-correctness/index.html#compare-to-a-theoretical-upper-or-lower-bound", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", - "section": "Compare to a theoretical upper or lower bound", - "text": "Compare to a theoretical upper or lower bound\nAn alternative strategy is to compare your result to theoretical upper or lower bound. This offers a weaker guarantee that your implementation and your results are correct but it can still allow you to detect important mistakes.\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\ntest_that(\"centroid() is inside the hypercube containing the data points\", {\n \n x <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))\n\n expect_true(all(centroid(x) <= Reduce(pmax, x)))\n expect_true(all(centroid(x) >= Reduce(pmin, x)))\n \n})\n\nTest passed 😀\n\n\n\n\nYou can see a real-life example of this kind of test in the {finalsize} R package. {finalsize} computes the final proportion of infected in a heterogeneous population according to an SIR model. Theory predicts that the number of infections is maximal in a well-mixed population:\n# Calculates the upper limit of final size given the r0\n# The upper limit is given by a well mixed population\nupper_limit <- function(r0) {\n f <- function(par) {\n abs(1 - exp(-r0 * par[1]) - par[1])\n }\n opt <- optim(\n par = 0.5, fn = f,\n lower = 0, upper = 1,\n method = \"Brent\"\n )\n opt\n}" - }, - { - "objectID": "posts/statistical-correctness/index.html#verify-that-output-is-changing-as-expected-when-a-single-parameter-varies", - "href": "posts/statistical-correctness/index.html#verify-that-output-is-changing-as-expected-when-a-single-parameter-varies", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", - "section": "Verify that output is changing as expected when a single parameter varies", - "text": "Verify that output is changing as expected when a single parameter varies\nAn even looser way to test statistical correctness would be to control that output varies as expected when you update some parameters. 
This could be for example, checking that the values you return increase when you increase or decrease one of your input parameters.\n\n\n\n\n\n\nExample with centroid()\n\n\n\n\ntest_that(\"centroid() increases when coordinates from one point increase\", {\n \n x <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))\n \n y <- x\n y[[1]] <- y[[1]] + 1 \n\n expect_true(all(centroid(x) < centroid(y)))\n \n})\n\nTest passed 😀\n\n\n\n\nAn example of this test in an actual R package can again be found in the finalsize package:\nr0_low <- 1.3\nr0_high <- 3.3\n\nepi_outcome_low <- final_size(\n r0 = r0_low,\n <...>\n)\nepi_outcome_high <- final_size(\n r0 = r0_high,\n <...>\n)\n\ntest_that(\"Higher values of R0 result in a higher number of infectious in all groups\", {\n expect_true(\n all(epi_outcome_high$p_infected > epi_outcome_low$p_infected)\n )\n})" - }, - { - "objectID": "posts/statistical-correctness/index.html#conclusion-automated-validation-vs-peer-review", - "href": "posts/statistical-correctness/index.html#conclusion-automated-validation-vs-peer-review", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", - "section": "Conclusion: automated validation vs peer-review", - "text": "Conclusion: automated validation vs peer-review\nIn this post, we’ve presented different methods to automatically verify the statistical correctness of your statistical software. We would like to highlight one more time that it’s important to run these tests are part of your regular integration system, instead of running them just once at the start of the development. This will prevent the addition of possible errors in the code and show users what specific checks you are doing. By doing so, you are transparently committing to the highest quality.\nMultiple voices in the community are pushing more towards peer-review as a proxy for quality and validity:\n\nWe would like to highlight that automated validation and peer review are not mutually exclusive and answer slightly different purposes.\nOn the one hand, automated validation fails to catch more obscure bugs and edge cases. For example, a bug that would be difficult to detect via automated approach is the use of bad Random Number Generators when running in parallel.\nBut on the other hand, peer-review is less scalable, and journals usually have some editorial policy that might not make your package a good fit. Additionally, peer-review usually happens at one point in time while automated validation can, and should, be part of the continuous integration system.\nIdeally, peer-review and automated validation should work hand-in-hand, with review informing the addition of new automated validation tests." + "href": "posts/for-vs-apply/index.html#conclusion", + "title": "Lesser-known reasons to prefer apply() over for loops", + "section": "Conclusion", + "text": "Conclusion\nThis post hopefully convinced you why it’s better to use apply() functions rather than for loops where possible (i.e., for iteration). Contrary to common misconception, the real reason is not performance, but code robustness and readability.\nThanks to Jaime Pavlich-Mariscal, James Azam, Tim Taylor, and Pratik Gupte for their thoughtful comments and suggestions on earlier drafts of this post.\n\n\n\n\n\n\nBeyond R\n\n\n\nThis post focused on R, but the same principles generally apply to other functional languages. 
In Python for example, you would use list comprehensions or the map() function.\n\n\n\n\n\n\n\n\nFurther reading\n\n\n\nIf you liked the code patterns recommended in this post and want to use functional programming in more situations, including recursion, I recommend you check out the “Functionals” chapter of the Advanced R book by Hadley Wickham" }, { - "objectID": "posts/statistical-correctness/index.html#footnotes", - "href": "posts/statistical-correctness/index.html#footnotes", - "title": "Ensuring & Showcasing the Statistical Correctness of your R Package", + "objectID": "posts/for-vs-apply/index.html#footnotes", + "href": "posts/for-vs-apply/index.html#footnotes", + "title": "Lesser-known reasons to prefer apply() over for loops", "section": "Footnotes", - "text": "Footnotes\n\n\nBut see the “testing statistical software” post from Alex Hayes where he presents his process to determine if he deems a statistical package trustworthy or not, and rOpenSci Statistical Software Peer Review book.↩︎\nSetting the random seed is not enough to compare implementations across programming languages because different languages use different kind of Random Number Generators.↩︎" + "text": "Footnotes\n\n\nThere are a handful of rare corner cases where apply() is not the best method for iteration. These are cases that make use of match.call() or sys.call(). More details are available in lapply() documentation and in this GitHub comment by Tim Taylor during the review of this post.↩︎" }, { - "objectID": "getting-started.html", - "href": "getting-started.html", - "title": "Getting started", + "objectID": "posts/epidemics_v0.4.0/index.html", + "href": "posts/epidemics_v0.4.0/index.html", + "title": "epidemics v0.4.0", "section": "", - "text": "Our aim is to help change how analytics are used in the global infectious disease response, moving towards integrated, generalisable and scalable community-driven software." + "text": "We are very excited to announce the release of a new epidemics version v0.4.0. Here is an automatically generated summary of the changes in this version.\nMaintainer is changing to @rozeggo." }, { - "objectID": "getting-started.html#roadmap", - "href": "getting-started.html#roadmap", - "title": "Getting started", - "section": "Roadmap", - "text": "Roadmap" + "objectID": "posts/epidemics_v0.4.0/index.html#model-functions", + "href": "posts/epidemics_v0.4.0/index.html#model-functions", + "title": "epidemics v0.4.0", + "section": "Model functions", + "text": "Model functions\n\nInternal model functions for the models which allow vaccination have been corrected to prevent vaccination introducing negative values of susceptibles; tests added to check for this (#235, initially reported by @avallecam)." 
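The apply()-versus-for-loop conclusion summarised above can be made concrete with a short, self-contained R sketch. The example below is not taken from the post itself: the input list and the choice of vapply() are illustrative assumptions.

# A for loop needs an output container created and filled by hand.
x <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))
means_loop <- numeric(length(x))
for (i in seq_along(x)) {
  means_loop[i] <- mean(x[[i]])
}

# vapply() expresses the same iteration in one call and enforces that each
# result is a single numeric value, which is part of the robustness argument.
means_vapply <- vapply(x, mean, FUN.VALUE = numeric(1))

identical(means_loop, means_vapply)
#> [1] TRUE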
}, { - "objectID": "getting-started.html#epiverse-trace-packages", - "href": "getting-started.html#epiverse-trace-packages", - "title": "Getting started", - "section": "Epiverse-TRACE packages", - "text": "Epiverse-TRACE packages\nThese are built on methods and analysis that have informed epidemic response to infections including cholera, COVID, dengue, diphtheria, Ebola, influenza, and Zika.\n\nEarly tasks\nMiddle tasks\nLate tasks\n\n\n\n\n\nTools for reading data\n\n\n\n\n\nOpen data for Colombia\n\n\n\n\n\nTools for cleaning data\n\n\n\n\n\nSimulate outbreak data\n\n\n\n\n\nTagging and validating individual data\n\n\n\n\n\nTools for epidemiological parameters\n\n\n\n\n\nAutomated report for SIVIGILA data\n\n\n\n\n\nPipelines for common tasks\n\n\n\n\n\nSeverity estimation\n\n\n\n\n\nEstimate force of infection from serology\n\n\n\n\n\nTools for vector-borne infections\n\n\n\n\n\nSimulate vector-borne outbreaks\n\n\n\n\n\nSimulation and analysis of transmission chains\n\n\n\n\n\nAnalysis of transmission variation\n\n\n\n\n\nCalculate epidemic final size\n\n\n\n\n\nSimulate epidemic dynamics\n\n\n\n\n\nAnalyze vaccine efficacy" + "objectID": "posts/epidemics_v0.4.0/index.html#helper-functions", + "href": "posts/epidemics_v0.4.0/index.html#helper-functions", + "title": "epidemics v0.4.0", + "section": "Helper functions", + "text": "Helper functions\n\nAdded the epidemic_peak() function to calculate the timing and size of the largest peak in each compartment in an scenario model (#240) by @bahadzie.\nAdded the outcomes_averted() function to compare epidemic scenarios (e.g. with and without interventions or vaccination) (#225, #230)." }, { - "objectID": "resources.html", - "href": "resources.html", - "title": "Resources", - "section": "", - "text": "On this page, you can find resources shared on our discussion forum. 
Submit your own and upvote the ones you like on GtiHub!\n\n\n\n\n\n\n \n \n \n Order By\n Default\n \n Date - Oldest\n \n \n Date - Newest\n \n \n upvotes (Low to High)\n \n \n upvotes (High to Low)\n \n \n \n \n \n \n \n\n\n \n \n \n The True ROI of Doing It Right the First Time in Software Projects\n \n \n \n \n 👍 3\n \n \n \n \n Best Practices\n \n \n \n \n \n Jun 3, 2024\n \n \n \n \n \n \n \n Painless introduction to object-oriented programming using S3 classes\n \n \n \n \n 👍 2\n \n \n \n \n documentation\n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Tidyverse book on code review\n \n \n \n \n 👍 2\n \n \n \n \n GitHub\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n How to Run GitHub Actions Locally Using the act CLI Tool\n \n \n \n \n 👍 2\n \n \n \n \n GitHub\n \n Tutorial\n \n \n \n \n \n May 6, 2024\n \n \n \n \n \n \n \n Quick example of how to use `tryCatch()`\n \n \n \n \n 👍 2\n \n \n \n \n Advanced R\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Using data.table\n \n \n \n \n 👍 2\n \n \n \n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Deep R Programming\n \n \n \n \n 👍 2\n \n \n \n \n Advanced R\n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n Rcpp for everyone\n \n \n \n \n 👍 2\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Tutorial on the types of tests to write for an R function\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n shinystan - Interactive diagnostics and posterior analysis for Bayesian models\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Oct 16, 2024\n \n \n \n \n \n \n \n Explore about 2728 R color palettes\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Jun 18, 2024\n \n \n \n \n \n \n \n Difference between the `apply()` family of functions in R\n \n \n \n \n 👍 1\n \n \n \n \n Advanced R\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Why You Should (or Shouldn’t) Build an API Client?\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Jun 26, 2024\n \n \n \n \n \n \n \n Why I don’t add style options to my ggplot2 functions\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n Jul 29, 2024\n \n \n \n \n \n \n \n Request and handle API response from R\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Epimodelac2023 - Outbreak Analysis and Modeling in Public Health\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Rcpp gallery\n \n \n \n \n 👍 1\n \n \n \n \n documentation\n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Checks and automation early feedback via local commit hooks and the precommit package\n \n \n \n \n 👍 1\n \n \n \n \n Best Practices\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Stan Playground - A web-based platform for writing and compiling stan programs\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n Oct 16, 2024\n \n \n \n \n \n \n \n CVE-2024-27322 Should Never Have Been Assigned And R Data Files Are Still Super Risky Even In R 4.4.0\n \n \n \n \n 👍 1\n \n \n \n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n \n \n Epirecipes cookbook\n \n \n \n \n 👍 1\n \n \n \n \n Tutorial\n \n \n \n \n \n May 7, 2024\n \n \n \n \n \n\n\nNo matching items" + "objectID": "posts/epidemics_v0.4.0/index.html#documentation", + "href": "posts/epidemics_v0.4.0/index.html#documentation", + "title": "epidemics v0.4.0", + "section": "Documentation", + "text": "Documentation\n\nAdds a developer-focused vignette on how to modify epidemics and model 
structures to address potential modelling requests or tasks (#210).\nSplits up the ‘Modelling uncertainty and scenarios’ vignette into separate vignettes on uncertainty and scenario comparisons (#225).\nRemoved unnecessary plots from the vignette on modelling vaccination (#235).\nFixed link to socialmixr package in the ‘Get started’ and ‘Modelling interventions’ vignettes.\nUpdated and added documentation for all new or modified functions.\nUpdated references JSON file." }, { - "objectID": "posts/readepi_v0.1.0/index.html", - "href": "posts/readepi_v0.1.0/index.html", - "title": "readepi v0.1.0", - "section": "", - "text": "We are very excited to announce the release of a new readepi version v0.1.0. Here is an automatically generated summary of the changes in this version.\nThis release contains the initial implementation of the package where data import from health information systems (HIS) depends on existing R packages designed specifically to retrieve data from a given HIS.\n\n\n\nReuseCC BY 4.0CitationBibTeX citation:@online{epiverse-trace_development_team2024,\n author = {Epiverse-TRACE development team, The},\n title = {Readepi V0.1.0},\n date = {2024-06-10},\n url = {https://epiverse-trace.github.io/posts/readepi_v0.1.0/},\n langid = {en}\n}\nFor attribution, please cite this work as:\nEpiverse-TRACE development team, The. 2024. “Readepi\nV0.1.0.” June 10, 2024. https://epiverse-trace.github.io/posts/readepi_v0.1.0/." + "objectID": "posts/epidemics_v0.4.0/index.html#package", + "href": "posts/epidemics_v0.4.0/index.html#package", + "title": "epidemics v0.4.0", + "section": "Package", + "text": "Package\n\nUpdated Codecov GitHub Actions workflow to restore code coverage reporting.\nUpdated package title and citation file.\nUpdated _pkgdown.yaml with new vignette and updated section titles.\nUpdated WORDLIST." }, { - "objectID": "posts/progressive-enhancement/index.html", - "href": "posts/progressive-enhancement/index.html", - "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", + "objectID": "posts/epidemics_v0.1.0/index.html", + "href": "posts/epidemics_v0.1.0/index.html", + "title": "epidemics v0.1.0", "section": "", - "text": "We are continuing our post series on S3 object orientation and interoperability in R. We have previously discussed what makes a good S3 class and how to choose a good parent for it, as well as when to write or not write a custom method. We have highlighted in particular how classes inheriting from data.frames can simplify user experience because of familiarity, and reduce developer workload due to the pre-existing S3 methods.\nWe have detailed how to improve compatibility with the tidyverse by explaining:\nHere, we are going to explore how to start adding support in the ecosystem for the new S3 classes while minimizing user-facing breaking changes. We have previously delved into this topic with our post “Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations” and this is a wider and higher-level view of the same topic.\nThe strategy presented here is the variation of a common concept in web development and the web ecosystem: progressive enhancement. This philosophy aims to support browsers with a common set of essential features, and even richer features for browser with the most recent updates. 
It makes sense to think about this philosophy with the prism of introducing new classes to a new software ecosystem as it has the similar constraints of multiple stakeholders with different interests and timelines. The application of progressive enhancement in this context means that users or packages that have not (yet) adopted the new classes are not penalized compared to users or packages that have." + "text": "We would like to announce the realease on GitHub of version 0.1.0 of epidemics.\nThis release is aimed at supporting the reproducibility of analyses in this report, which would be subject to breaking changes due to planned package development.\nPlease note that this version has not been released to CRAN. epidemics is still being actively developed, with major changes planned for the near future, which will soon supersede this version.\nepidemics v0.1.0 can be installed using the {pak} package from GitHub using pak::pak(\"epiverse-trace/epidemics@v0.1.0\").\nHere is an automatically generated summary of the changes in this version.\nThis is an initial GitHub release of epidemics, an R package that ships a library of compartmental epidemic model structures that can be used, along with supplied classes that help define population characteristics and epidemic response interventions including vaccinations, to compose and model epidemic scenarios.\nThe sections below describe the contents of this release." }, { - "objectID": "posts/progressive-enhancement/index.html#adding-class-support-to-function-inputs-via-progressive-enhancement", - "href": "posts/progressive-enhancement/index.html#adding-class-support-to-function-inputs-via-progressive-enhancement", - "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", - "section": "Adding class support to function inputs via progressive enhancement", - "text": "Adding class support to function inputs via progressive enhancement\nThe goal here is to allow functions to accept the new classes as inputs, while keeping the old behaviour unchanged for unclassed objects (or with a different class than the new one).\nThis can conveniently be done in an almost transparent way by converting the old function to the S3 generic, and using the default method to handle the old behaviour. The practical steps, and minor caveats, have been previously described in the post “Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations”.\n\n\n\nA before / after type image showing the conversion of a function to a generic with a default method keeping the exisiting behaviour.\n\n\nFor a different, additional, example, we can consider a function working on patient-level data, which previously only accepted a data.frame as input:\n#' Compute length of stay in hospital on a patient-level dataset\n#'\n#' @param data A data.frame containing patient-level data\n#' @param admission_column The name of the column containing the admission date\n#' @param discharge_column The name of the column containing the discharge date\n#'\n#' @returns A numeric vector of hospitalization durations in days\ncompute_hospitalization_duration <- function(data, admission_column, discharge_column) {\n\n difftime(\n data[[discharge_column]],\n data[[admission_column]],\n units = \"days\"\n )\n\n}\nWe want to add support for linelist objects, as defined in the linelist package. linelist objects inherit from data.frame and contain an additional tags attribute. 
In particular, linelist objects can have a date_admission and date_discharge tag. This means we can use the tags to automatically detect the columns to use.\nBut we want the function to keep working for standard data.frames, tibbles, etc. We can follow the steps described in the previous post to convert the function to a generic, and add a default method to handle the old behaviour:\ncompute_hospitalization_duration <- function(data, ...) {\n\n UseMethod(\"compute_hospitalization_duration\")\n\n}\n\ncompute_hospitalization_duration.default <- function(data, admission_column, discharge_column) {\n\n difftime(\n data[[discharge_column]],\n data[[admission_column]],\n units = \"days\"\n )\n\n}\n\ncompute_hospitalization_duration.linelist <- function(data, ...) {\n\n x <- linelist::tags_df(data)\n\n compute_hospitalization_duration(\n data = x,\n admission_column = \"date_admission\",\n discharge_column = \"date_discharge\"\n )\n\n}\nIf the function was already a generic, then a new method for the new class should be added, leaving everything else unchanged." + "objectID": "posts/epidemics_v0.1.0/index.html#model-structures", + "href": "posts/epidemics_v0.1.0/index.html#model-structures", + "title": "epidemics v0.1.0", + "section": "Model structures", + "text": "Model structures\nThis release of epidemics includes four model structures supporting a range of composable elements to modify epidemic trajectories.\n\n“Default” model: A deterministic SEIR-V model allowing heterogeneity in social contacts between demographic groups, with optional, single-dose non-leaky vaccination;\n“Vacamole” model: A deterministic SEI-HRD-V2 implementation of a model allowing heterogeneity in social contacts between demographic groups, with two-dose leaky vaccination (V2), supporting different infection trajectories through the infectious and hospitalised (H) compartments for doubly vaccinated individuals, which tracks deaths (D), and which was initially developed by the Dutch public health agency RIVM for vaccine impact modelling during the Covid-19 pandemic, and published as Ainslie et al. 2022 https://doi.org/10.2807/1560-7917.ES.2022.27.44.2101090;\n“Diphtheria” model: A deterministic SEIHR model tracking outcomes for different demographic groups, but not including heterogeneity in social contacts, adapted from Finger et al. 2019 https://doi.org/10.1186/s12916-019-1288-7 and intended for application to disease outbreaks in a humanitarian camp setting;\n“Ebola” model: A discrete time stochastic SEIHFR model suitable for modelling Ebola virus disease and other haemorrhagic fevers, and which allows varying the efficacy of isolation in a hospital setting (H), and allows modelling transmission in a funeral context (F), as adapted from a consensus Ebola virus disease model in Li et al. 2019 https://doi.org/10.1098/rspb.2019.0774 and using simulation methods from Getz and Dougherty 2018 https://doi.org/10.1080/17513758.2017.1401677." }, { - "objectID": "posts/progressive-enhancement/index.html#adding-class-support-to-function-outputs-via-progressive-enhancement", - "href": "posts/progressive-enhancement/index.html#adding-class-support-to-function-outputs-via-progressive-enhancement", - "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", - "section": "Adding class support to function outputs via progressive enhancement", - "text": "Adding class support to function outputs via progressive enhancement\nAdding class support to function outputs is often more challenging. 
A common option is to add a new argument to the function, which would be a boolean indicating whether the output should be of the new class or not. But this doesn’t fit in the view of progressive enhancement, as it would require users to change their code to benefit from the new classes, or to suffer from breaking changes.\nWhile the new argument approach is sometimes indeed the only possible method, there are some situations where we can have an approach truly following the progressive enhancement philosophy.\nIn particular, this is the case when the old output was already inheriting from the parent of the new class (hence the importance of carefully choosing the parent class). In this situation, the new attributes from the new class should not interfere with existing code for downstream analysis.\nIn this case, let’s consider a function that was previously returning an unclassed data.frame with patient-level data:\ncreate_patient_dataset <- function(n_patients = 10) {\n\n data <- data.frame(\n patient_id = seq_len(n_patients),\n age = sample(18:99, n_patients, replace = TRUE)\n )\n\n return(data)\n\n}\nWe want to start returning a linelist object. Because linelist objects are data.frames (or tibbles) with an extra attr, it can be done in a transparent way:\ncreate_patient_dataset <- function(n_patients = 10) {\n\n data <- data.frame(\n patient_id = seq_len(n_patients),\n age = sample(18:99, n_patients, replace = TRUE)\n )\n\n data <- linelist::make_linelist(\n data,\n id = \"patient_id\",\n age = \"age\"\n )\n\n return(data)\n\n}\n\ninherits(data, \"data.frame\")\nFor a more realistic example, you can also see the work in progress to integrate the new contactmatrix standard format for social contact data to the contactdata package.\nThis is however only true if code in downstream analysis follows good practices in checking for the class of an object 1. If existing code was testing equality of the class to a certain value, it will break when the new class value is appended. This is described in a post on the R developer blog, when base R was adding a new array class value to matrix objects. Class inheritance should never be tested via class(x) == \"some_class\". Instead, inherits(x, \"some_class\") or is(x, \"some_class\") should be used to future-proof the code and allow appending an additional in the future." + "objectID": "posts/epidemics_v0.1.0/index.html#solving-ode-systems-using-boost-odeint", + "href": "posts/epidemics_v0.1.0/index.html#solving-ode-systems-using-boost-odeint", + "title": "epidemics v0.1.0", + "section": "Solving ODE systems using Boost odeint", + "text": "Solving ODE systems using Boost odeint\nepidemics uses Boost’s odeint https://www.boost.org/doc/libs/1_84_0/libs/numeric/odeint/doc/html/boost_numeric_odeint/getting_started/overview.html to treat the deterministic models’ ordinary differential equations (ODEs) as initial value problems and solve them.\nModel ODEs are defined as structs with operators in the package headers, and exposed to R as internal Rcpp functions. The ‘default’, ‘Vacamole’, and ‘diphtheria’ models are ODE models defined in this way. 
This is intended to help reduce overheads associated with passing ODE systems written in R back and forth from a solver (such as those provided by {deSolve}), and is an easier way to define feature-rich models than writing C code for solvers provided by {deSolve} that accept compiled code.\nepidemics headers include tools for handling the C++ representations of R objects used in the package (see below), and can be imported by other Rcpp packages.\nThe ‘default’ and ‘Vacamole’ models have equivalent R-only implementations as well which use the {deSolve} package; these are intended to be made unavailable in future releases." }, { - "objectID": "posts/progressive-enhancement/index.html#conclusion", - "href": "posts/progressive-enhancement/index.html#conclusion", - "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", - "section": "Conclusion", - "text": "Conclusion\nObject oriented programming and S3 classes offer a convenient way to iteratively add interoperability in the ecosystem in a way that is minimally disruptive to users and developers. Newly classed input support can be added via custom methods (after converting the existing function to a generic if necessary). Newly classed output support can be added via progressive enhancement, by ensuring that the new class is a subclass of the old one and that downstream code uses good practices to test class inheritance.\nThanks to James Azam and Tim Taylor for their very valuable feedback on this post." + "objectID": "posts/epidemics_v0.1.0/index.html#composable-elements-as-classes", + "href": "posts/epidemics_v0.1.0/index.html#composable-elements-as-classes", + "title": "epidemics v0.1.0", + "section": "Composable elements as classes", + "text": "Composable elements as classes\nepidemics provides classes that help to organise the components of an epidemic scenario model.\n\n<population>: An S3 class to store population characteristics including the size of demographic groups, a social contacts matrix, and initial conditions for a model;\n<intervention>: An S3 abstract class and super-class that allows the definition of events that modify the epidemic trajectory:\n\n<rate_intervention>: A sub-class of <intervention> that allows the reduction of transition rates between model compartments to simulate the effect of policy interventions over a specific period;\n<contacts_intervention>: A sub-class of <intervention> that allows the reduction of social contacts to simulate the effect of policy interventions over a specific period;\n\n<vaccination>: An S3 class that holds the intervals and group-specific rates at which individuals transition into the ‘vaccinated’ compartment(s) of a model, if available;" }, { - "objectID": "posts/progressive-enhancement/index.html#footnotes", - "href": "posts/progressive-enhancement/index.html#footnotes", - "title": "Improving Ecosystem Interoperability Iteratively via Progressive Enhancement", - "section": "Footnotes", - "text": "Footnotes\n\n\nThis is now enforced in R packages by R CMD check, and via the class_equals_linter() in the lintr package.↩︎" + "objectID": "posts/epidemics_v0.1.0/index.html#other-composable-elements", + "href": "posts/epidemics_v0.1.0/index.html#other-composable-elements", + "title": "epidemics v0.1.0", + "section": "Other composable elements", + "text": "Other composable elements\nepidemics allows models to include elements that affect an epidemic trajectory, but which are not custom classes.\n\nTime-dependence: All models can be passed a list of 
functions with two arguments, time and x which are expected to return x as a function of time, and which may be used to model the effect of seasonality in model parameters;\nPopulation changes: Applicable only to the diphtheria model, a two element list of time and values, which allow the definition of changes to the number of susceptible individuals in the model, and which may be used to model influxes and evacuations of individuals from humanitarian camps." }, { - "objectID": "posts/epiparameter_v0.1.0/index.html", - "href": "posts/epiparameter_v0.1.0/index.html", - "title": "epiparameter v0.1.0", + "objectID": "posts/epidemics_v0.1.0/index.html#output-processing-functions", + "href": "posts/epidemics_v0.1.0/index.html#output-processing-functions", + "title": "epidemics v0.1.0", + "section": "Output processing functions", + "text": "Output processing functions\nepidemics provides functions to help process the output of an epidemic model run, to calculate the size of the epidemic in each demographic group at any stage (epidemic_size()), and to calculate the number of new infections in each demographic group at each timepoint in the model (new_infections())." + }, + { + "objectID": "posts/epidemics_v0.1.0/index.html#usage-vignettes", + "href": "posts/epidemics_v0.1.0/index.html#usage-vignettes", + "title": "epidemics v0.1.0", + "section": "Usage vignettes", + "text": "Usage vignettes\nepidemics includes a range of usage vignettes that demonstrate how to:\n\nGet started with the package;\nGet started with modelling interventions on social contacts to control outbreaks;\nModel overlapping and sequential interventions on social contacts;\nModel interventions that modify transition rates between model compartments;\nGet started with modelling a vaccination campaign;\nModel time-dependence and seasonality in disease transmission dynamics;\nGenerate and model uncertainty in model parameters;\nReduce the number of parameters required for final size estimation;\nUse the ‘Vacamole’ model for scenarios of leaky vaccination and vaccine impact on hospitalisation;\nUse the ‘Ebola’ model for scenarios of responses to an Ebola virus disease outbreak;\nUse the ‘diphtheria’ model for scenarios of outbreaks in a humanitarian camp setting." + }, + { + "objectID": "posts/epidemics_v0.1.0/index.html#miscellaneous", + "href": "posts/epidemics_v0.1.0/index.html#miscellaneous", + "title": "epidemics v0.1.0", + "section": "Miscellaneous", + "text": "Miscellaneous\n\nWorkflows to render the vignettes and README as a website;\nTest code coverage of 93%." + }, + { + "objectID": "posts/epiparameter_v0.2.0/index.html", + "href": "posts/epiparameter_v0.2.0/index.html", + "title": "epiparameter v0.2.0", "section": "", - "text": "We are very excited to announce the release of a new epiparameter version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial release of the {epiparameter} R package. {epiparameter} provides:" + "text": "We are very excited to announce the release of a new epiparameter version v0.2.0. Here is an automatically generated summary of the changes in this version.\nThe second release of the {epiparameter} R package focuses on interoperability with the {epireview} R package. Several functions have been refactored and enhanced.\nThis release benefited from feedback from the participants of an EpiParameter Community workshop hosted by the World Health Organisation." 
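The two-argument time-dependence interface described in the epidemics v0.1.0 notes above (functions of time and x that return x as a function of time) can be sketched as follows. The function name, the yearly period and the 10% amplitude are illustrative assumptions, not values from the release.

# Illustrative time-dependence function: takes the current time and a
# parameter value x, and returns x rescaled by an assumed yearly cycle.
seasonal_value <- function(time, x) {
  x * (1 + 0.1 * sin(2 * pi * time / 365))
}

seasonal_value(time = 0, x = 1.3)   # baseline value of the parameter
seasonal_value(time = 91, x = 1.3)  # close to the assumed seasonal peak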
}, { - "objectID": "posts/epiparameter_v0.1.0/index.html#new-features", - "href": "posts/epiparameter_v0.1.0/index.html#new-features", - "title": "epiparameter v0.1.0", + "objectID": "posts/epiparameter_v0.2.0/index.html#new-features", + "href": "posts/epiparameter_v0.2.0/index.html#new-features", + "title": "epiparameter v0.2.0", "section": "New features", - "text": "New features\n\nA library of 122 epidemiological parameter set from the epidemiological literature. This is accessible from the package as system data (sysdata.rda, as epiparameter::multi_epidist) or as internal data (inst/extdata/parameters.json).\nThe epidist_db() function loads epidemiological parameters from the library.\nDistribution parameter conversion and extraction functions (convert_params_to_summary_stats() & convert_summary_stats_to_params(), and extract_param()).\nAn S3 class to work with epidemiological parameters <epidist>. This class has S3 methods to aid users easily work with these data structures. These include printing, plotting, and distribution functions for PDF/PMF, CDF, random number generation and distribution quantiles. The <epidist> class has a constructor function, a validator function, accessors (get_*()), checkers (is_*()). There is also a <vb_epidist> S3 class for vector-borne parameters, and an internal <multi_epidist> class for improved printing of lists of <epidist> objects.\nThe package contains a few utility functions. list_distributions() is a helper function to provide information from a list of <epidist> objects in tabular form. calc_disc_dist_quantile() calculates the quantiles of a probability distribution based on the vector of probabilities and time data.\nFive vignettes are included in this initial release. One as an introduction to the package (epiparameter.Rmd), one as an tutorial on converting and extracting parameters (extract_convert.Rmd), one on the protocol used to collect entries for the library of epidemiological parameters (data_protocol.Rmd), a design vignette (design_principles.Rmd), and a supplementary vignette which quantifies the bias from using the parameter extraction (extract_param()) from {epiparameter} (extract-bias.Rmd).\nUnit tests (98% coverage) and documentation files.\nContinuous integration workflows for R package checks, rendering the README.md, calculating test coverage, deploying the pkgdown website, updates the package citation, linting package code, checking package or system dependency changes, updating copyright year, and validating the parameter library JSON file." + "text": "New features\n\nThe as_epidist() S3 generic has been added to the package for the coercion of other R objects into <epidist> objects. The as_epidist.data.frame() method is added, as well as the internal functions is_epireview() which determines if a <data.frame> is from {epireview}, and epireview_to_epidist() performs the conversion (#298, #334 & #335)\nThe epireview_core_cols.rda data is added to the package. This is used to determine whether the input to as_epidist.data.frame() is a parameter table from {epireview} as these objects do not have a recognisable class attribute (#298).\nA new website only vignette (i.e. article) data_from_epireview.Rmd is added that explains how to use as_epidist() with data from {epireview} (#298 & #335).\nA new vignette database.Rmd is added to the package to provide a web interface to the {epiparameter} library of epidemiological parameters. 
Contributed by @sbfnk (#311).\nThe plotting method for <epidist> objects (plot.epidist()) has been improved to better differentiate continuous from discrete or discretised distributions (#315).\nepidist_db(..., single_epidist = TRUE) now prioritises parameter entries that account for right truncation (#323).\ncreate_epidist_prob_dist() (previously named create_prob_dist()) is now exported and enables more control of discretisation settings by allowing arguments to be passed to distcrete::distcrete() via ... (#324).\nThe <multi_epidist> print method (print.multi_epidist()) has been improved to provides object information in the print header, the first few elements of the list or all elements if list is short, and some extra links and advice in the print footer. The design of the print method follows the design pattern of {pillar} (#326).\n<epidist> objects and functions that work with <epidist> objects now work with exponential distributions (#333).\nThe package now has an explicit data license: CC0 in the LICENSE file." }, { - "objectID": "posts/epiparameter_v0.1.0/index.html#breaking-changes", - "href": "posts/epiparameter_v0.1.0/index.html#breaking-changes", - "title": "epiparameter v0.1.0", + "objectID": "posts/epiparameter_v0.2.0/index.html#breaking-changes", + "href": "posts/epiparameter_v0.2.0/index.html#breaking-changes", + "title": "epiparameter v0.2.0", "section": "Breaking changes", - "text": "Breaking changes\n\nNone" + "text": "Breaking changes\n\nlist_distributions() has been replaced by parameter_tbl() which enhances the printing by leveraging {pillar} (#321).\nThe <vb_epidist> plotting method (plot.vb_epidist()) has been removed from the package. This provided minimal functionality and was unnecessarily complicating the function signature of plot.epidist() (#315)." }, { - "objectID": "posts/epiparameter_v0.1.0/index.html#bug-fixes", - "href": "posts/epiparameter_v0.1.0/index.html#bug-fixes", - "title": "epiparameter v0.1.0", + "objectID": "posts/epiparameter_v0.2.0/index.html#bug-fixes", + "href": "posts/epiparameter_v0.2.0/index.html#bug-fixes", + "title": "epiparameter v0.2.0", "section": "Bug fixes", - "text": "Bug fixes\n\nNone" + "text": "Bug fixes\n\nDOI and PMID are lowercase throughout the package to resolve issues with older versions of R (see issue #301) (#317)." }, { - "objectID": "posts/epiparameter_v0.1.0/index.html#deprecated-and-defunct", - "href": "posts/epiparameter_v0.1.0/index.html#deprecated-and-defunct", - "title": "epiparameter v0.1.0", + "objectID": "posts/epiparameter_v0.2.0/index.html#deprecated-and-defunct", + "href": "posts/epiparameter_v0.2.0/index.html#deprecated-and-defunct", + "title": "epiparameter v0.2.0", "section": "Deprecated and defunct", "text": "Deprecated and defunct\n\nNone" }, { - "objectID": "posts/epiparameter_v0.1.0/index.html#acknowledgements", - "href": "posts/epiparameter_v0.1.0/index.html#acknowledgements", - "title": "epiparameter v0.1.0", + "objectID": "posts/epiparameter_v0.2.0/index.html#acknowledgements", + "href": "posts/epiparameter_v0.2.0/index.html#acknowledgements", + "title": "epiparameter v0.2.0", "section": "Acknowledgements", - "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @CarmenTamayo, @chartgerink, @jamesmbaazam, @maelle, @pratikunterwegs, @pitmonticone, @sbfnk, and @TimTaylor." 
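The kind of conversion performed by the convert_summary_stats_to_params() and extract_param() helpers listed above can be sketched from first principles for a lognormal distribution. The sketch below uses base R only and hypothetical summary statistics; it is not the {epiparameter} API.

# Method-of-moments conversion: mean and standard deviation of a lognormal
# distribution to its meanlog/sdlog parameters.
summary_stats_to_lnorm <- function(mean, sd) {
  sdlog   <- sqrt(log(1 + (sd / mean)^2))
  meanlog <- log(mean) - sdlog^2 / 2
  list(meanlog = meanlog, sdlog = sdlog)
}

# e.g. a delay distribution reported as mean 8.5 days, standard deviation 5 days
params <- summary_stats_to_lnorm(mean = 8.5, sd = 5)
# round trip: the implied mean of the fitted lognormal recovers the input
exp(params$meanlog + params$sdlog^2 / 2)
#> [1] 8.5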
+ "text": "Acknowledgements\n@Bisaloo @CarmenTamayo @DxChen0126 @adamkucharski @chartgerink @chloerice48 @cm401 @jamesmbaazam @jlessler @joshwlambert @papsti @prabasaj @sbfnk @wzmli" }, { - "objectID": "posts/mpox-preparedness/index.html", - "href": "posts/mpox-preparedness/index.html", - "title": "2024 mpox outbreak: common analytics tasks and available R tools", + "objectID": "posts/renv-complications/index.html", + "href": "posts/renv-complications/index.html", + "title": "Things that can go wrong when using renv", "section": "", - "text": "There are ongoing outbreaks of mpox globally. The Democratic Republic of Congo (DRC) is so far the worst hit with a total of 7,851 cases and 384 deaths reported between January 1 and May 26, 2024 1. Before 2022, there were few reports of sustained mpox transmission globally. However, during the following year (Jan 1, 2022, and Jan 29, 2023), 110 countries in all six WHO Regions had reported a total of 85,473 confirmed cases and 89 deaths between them (Laurenson-Schafer et al. 2023).\nMpox is transmitted through respiratory droplets and direct contact with infected persons. The disease is characterized by fever, cough, and a rash, with the mean incubation period estimated to be about 7.8 days (Ward et al. 2022). Infected individuals may experience severe symptoms leading to hospitalisation or death. There are two genetic clades: clade I and clade II, which also has subclades IIa and IIb (Laurenson-Schafer et al. 2023).\nSeveral analyses of the potential impact of outbreaks at country level have already emerged in 2024. The US CDC, for example, has analysed the potential size of outbreaks resulting from transmission within and between households 2 and the risk of Clade 1 mpox outbreaks among some key populations associated with key transmission routes 3. Another group of researchers have estimated the transmissibility of mpox in the DRC from more recent (2010 - 2019) surveillance data to update existing estimates, which are based on old data (Charniga, McCollum, et al. 2024). However, tackling ongoing outbreaks around the world will require a coordinated response from the global health community.\nThe Epiverse-TRACE team is developing a set of analytical tools that could help support decision-makers during outbreaks. This post provides an overview of the tasks that such tools can be applied to in the context of the ongoing mpox outbreaks." + "text": "Throughout the Epiverse project, we use the renv R package to ensure reproducibility of the training materials and the pipelines we are providing. But we sometimes get reports from users who struggle to rebuild the environment and run the code.\nIn this post, we dissect the source of these issues, explain why in reality renv is not at fault, and how this is caused by the inherent complexity of reproducibility. The renv documentation already includes caveats explaining why some situations are bound to require more complex tools. This blog post reiterates some of these caveats and illustrates them with concrete examples.\nFinally, we mention a couple of more complete (but more complex!) frameworks that can overcome the issues presented here. We do not explore these alternative framework in detail but provide links to more information." 
}, { - "objectID": "posts/mpox-preparedness/index.html#common-outbreak-analytics-tasks", - "href": "posts/mpox-preparedness/index.html#common-outbreak-analytics-tasks", - "title": "2024 mpox outbreak: common analytics tasks and available R tools", - "section": "Common outbreak analytics tasks", - "text": "Common outbreak analytics tasks\nOutbreak analytics in the context of the ongoing mpox outbreak involves several tasks that can be handled by existing and emerging R tools. Some of the tasks include estimating the transmission potential, forecasting infection dynamics, estimating severity, and assessing the impact of interventions.\nHere, we briefly describe some common tasks, data required, and the ready R tools/packages developed by the Epiverse-TRACE team and the wider community.\n\nCleaning and validating data\n\n\n\n\n\n\nData cleaning is often the first task in outbreak analytics. This usually involves identifying and correcting errors in the data, standardizing the format of key variables, and ensuring that the data is in a format that is fit for analysis. Data validation is also important to ensure that the data is accurate.\n\n\n\n{cleanepi} is useful for cleaning individual-level datasets, and {linelist} can be used to tag and validate key variables in datasets that might change over time. The {numberize} package can also be used to convert numbers written as text. It currently has functionality for English, Spanish, and French.\n\n\nEstimating transmission potential\n\n\n\n\n\n\nA key initial question during emerging outbreaks is the transmission potential of the disease. This is typically quantified using parameters such as: the basic reproduction number, \\(R_0\\); the time-varying reproduction number, \\(R_t\\); and \\(k\\), which captures individual heterogeneity in transmission (i.e. “superspreading” potential). These quantities are useful to assess the potential for further spread of the disease and the impact of interventions.\n\n\n\n\nPopulation-level transmissibility (\\(R_0\\) and \\(R_t\\))\nThe basic reproduction number, \\(R_0\\), is the average number of secondary cases produced by a single infected individual in a completely susceptible population. The time-varying reproduction number, \\(R_t\\), on the other hand, is the average number of secondary cases produced by a single infected individual at time \\(t\\) in a partially susceptible population. \\(R_t\\) is a more useful quantity during an outbreak as it accounts for the impact of interventions and changes in population immunity.\nIf data is available on the daily number of reported cases, {EpiNow2} and {EpiEstim} can be used to estimate \\(R_t\\). These packages require data on the time scale of transmission (i.e. the generation time, or the serial interval, which is commonly used as a proxy for this). While {EpiEstim} focuses on retrospective estimation of \\(R_t\\), {EpiNow2} is designed for both retrospective and real-time estimation.\nIn estimating \\(R_t\\), one practical consideration is the impact of various delays (biological and reporting) on the estimates (Charniga, Park, et al. 2024; Park et al. 2024; Katelyn M. Gostic 2020). {EpiNow2} adjusts for these delays in various ways. For example, it accounts for the symptom onset and reporting delays by taking the incubation period and reporting delay as inputs. 
Moreover, {EpiNow2} can estimate the reporting delay from the data if data on incidence by date of onset and report are available.\nFurthermore, dedicated packages have emerged for estimating epidemiological delays from data using best practices. {epidist} offers the ability to estimate delay distributions, accounting for issues such as truncation (i.e., not all disease outcomes will yet be known in real-time).\nIf delay data are not available, published estimates of the incubation period and serial interval can be used. The {epiparameter} package collates a database of epidemiological distributions from the literature and provides functions for interacting with the database. You can view the database for currently available parameters (more entries are planned). Additionally, if only summary statistics are available (e.g. range and median), {epiparameter} can be used to extract the distribution parameters.\n\n\nIndividual-level transmissibility (superspreading)\n\n\n\n\n\n\nThe individual-level transmission heterogeneity (superspreading), often denoted as \\(k\\), is an important measure for tailoring interventions at the individual level.\n\n\n\nIf we have data on the distribution of sizes of transmission clusters, the {epichains} package provides functions to set up the likelihood function to estimate \\(R_0\\) and \\(k\\). The user inputs the negative binomial offspring, which assumes individuals exhibit heterogeneity in transmission. The parameters of the negative offspring distribution can then be estimated using existing maximum likelihood or bayesian frameworks.\nFurthermore, if we have individual-level transmission chain data, the {superspreading} package can be used to estimate \\(R_0\\) and \\(k\\) from the offspring distribution. This package also provides functions to estimate the probability that an outbreak will not go extinct in its early stages because of randomness in transmission (e.g. if the primary spillover case(s) does not infect others).\nIf we have data on sexual contacts and the secondary attack rate, then we can also use {superspreading} to calculate \\(R_0\\) accounting for network effects.\n\n\n\nForecasting and nowcasting infection dynamics\n\n\n\n\n\n\nForecasting and nowcasting of infections are crucial for planning and resource allocation during an outbreak. Forecasting is the prediction of future cases, deaths, or other outcomes, while nowcasting is the prediction of the current outbreak situation. These predictions can help public health authorities to anticipate the trajectory of the outbreak and to implement timely interventions.\n\n\n\n{EpiNow2} and {epinowcast} provide functions to forecast and nowcast the number of cases. The data required for {EpiNow2} has already been described in the previous section. The {epinowcast} package similarly requires data on the number of cases reported per date. {epinowcast} does not currently support forecasting but there are plans to add this functionality in future versions.\n\n\nEstimating disease severity\n\n\n\n\n\n\nThe case fatality risk (CFR) is often used to assess the severity of a disease. CFR here refers to the proportion of deaths among confirmed cases.\n\n\n\nWith incidence data on the number of cases reported and the number of deaths reported, the {cfr} package can be used to estimate the case fatality rate and its uncertainty. 
Importantly, it accounts for the delay between the onset of symptoms and death, which is crucial for accurate estimation of the case fatality rate.\nHere again, {EpiNow2} can be used to estimate the time-varying case fatality ratio using the same data as for the reproduction number. {EpiNow2} can estimate other severity metrics, such as the case hospitalisation ratio, given data on cases and hospitalisations, and the hospitalisation fatality ratio, if data on hospitalisations and associated deaths are available.\n\n\nAssessing the impact of interventions\n\n\n\n\n\n\nmpox can be mitigated with behaviour change, treatment, and vaccination. Here, a few tools are available to assess the impact of intervention scenarios.\n\n\n\n{epidemics} provides ready compartmental models to estimate the impact of vaccination and non-pharmaceutical interventions like behaviour change, which can conceptually be modelled as a reduction in the transmission rate through changes in the population contact structure.\nIf we want to explore population-level outbreak dynamics, {epidemics} allows for stratifying the population into arbitrary groups, specifying the contact structure between these groups, and rates of interventions. The data required to run these models include: population structure, contact structure, and timing and magnitude of interventions. Data on social contact matrices can be obtained from the {socialmixr} package." + "objectID": "posts/renv-complications/index.html#binaries-vs-building-from-source", + "href": "posts/renv-complications/index.html#binaries-vs-building-from-source", + "title": "Things that can go wrong when using renv", + "section": "Binaries vs building from source", + "text": "Binaries vs building from source\nSoftware, including R packages, can generally be delivered in two forms: as binaries or as source code. If you are building from the source code, you may in some case need a compilation toolchain on your computer. If that toolchain is missing, it can lead to errors such as:\n\nld: warning: search path '/opt/gfortran/lib' not found\nld: library 'gfortran' not found\n\nMost of the time, regular users of R will not see these errors because they are installing binaries. Indeed, CRAN provides pre-compiled binaries for Windows and macOS for the last version of the package and R.\nWith renv, you often want to install older versions of the packages, which won’t be available as binaries from CRAN. This means you are more likely to have to compile the package yourself and see this kind of errors, even though renv is not causing them.\n\n\n\n\n\n\ngfortran issues on Apple Silicon computers\n\n\n\nIf you are an Apple Silicon (Mac M1, M2, M3) user and encounter issues with gfortran, we have had success using the macrtools R package and we strongly recommend checking it out." }, { - "objectID": "posts/mpox-preparedness/index.html#summary", - "href": "posts/mpox-preparedness/index.html#summary", - "title": "2024 mpox outbreak: common analytics tasks and available R tools", - "section": "Summary", - "text": "Summary\nIn this post, we have outlined common outbreak analytics tasks relevant to the mpox outbreak, the data required, and R packages/tools that are currently available to facilitate these tasks. The tools described here are being developed by the Epiverse-TRACE team and the wider community, with the aim of ensuring high standards of research software development, and validation from end users, including epidemiologists, clinicians, and policy makers. 
The tools are designed to be user-friendly and well integrated, enabling one analysis task to easily feed into another. We would therefore be keen to hear from other groups interested in potentially collaborating or contributing on this growing ecosystem of tools.\nThanks to Karim Mane and Chris Hartgerink for their valuable comments on earlier drafts of this post." + "objectID": "posts/renv-complications/index.html#beyond-renv-scope-incompatibility-with-system-dependency-versions", + "href": "posts/renv-complications/index.html#beyond-renv-scope-incompatibility-with-system-dependency-versions", + "title": "Things that can go wrong when using renv", + "section": "Beyond renv scope: incompatibility with system dependency versions", + "text": "Beyond renv scope: incompatibility with system dependency versions\nWe discussed previously the topic of system dependencies, and dependencies on specific R versions. These special dependencies can also be a source of headaches when using renv.\nThe heart of the issue is that renv provides a simplified solution to reproducibility: it focuses on R packages and their versions. But other sources of non-reproducibility are outside its scope. In many cases, this will not be a problem, as the main source of non-reproducibility, especially in the relatively short-term, will be R package versions.\nBut sometimes, it is possible that the renv.lock lockfile requires such an old version of an R package that it was written with a syntax that is no longer supported by recent R versions or modern compilers.\nFor example, a recent project (from 2023) was trying to install the version 0.60.1 of the matrixStats package (from 2021). This lead to this compilation error:\n\nerror: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n\n\n\n\nClick to see the full error message\n\n! Error installing package 'matrixStats':\n=======================================\n\n* installing *source* package ‘matrixStats’ ...\n** package ‘matrixStats’ successfully unpacked and MD5 sums checked\n** using staged installation\n** libs\nusing C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c 000.init.c -o 000.init.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c allocMatrix2.c -o allocMatrix2.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c anyMissing.c -o anyMissing.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binCounts.c -o binCounts.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c binMeans.c -o binMeans.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colCounts.c -o colCounts.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colOrderStats.c -o colOrderStats.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c colRanges.c -o colRanges.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c diff2.c -o diff2.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c indexByRow.c -o indexByRow.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c logSumExp.c -o logSumExp.o\ngcc -I\"/usr/share/R/include\" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c mean2.c -o mean2.o\nIn file included from mean2_lowlevel.h:14,\n from mean2.c:9:\nmean2_lowlevel_template.h: In function ‘mean2_int’:\nmean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n 59 | if (sum > DOUBLE_XMAX) {\n | ^~~~~~~~~~~\n | DBL_MAX\nmean2_lowlevel_template.h:59:13: note: each undeclared identifier is reported only once for each function it appears in\nIn file included from mean2_lowlevel.h:18,\n from mean2.c:9:\nmean2_lowlevel_template.h: In function ‘mean2_dbl’:\nmean2_lowlevel_template.h:59:13: error: ‘DOUBLE_XMAX’ undeclared (first use in this function); did you mean ‘DBL_MAX’?\n 59 | if (sum > DOUBLE_XMAX) {\n | ^~~~~~~~~~~\n | DBL_MAX\nmake: *** [/usr/lib/R/etc/Makeconf:191: mean2.o] Error 1\nERROR: compilation failed for package ‘matrixStats’\n\nThe explanation for this error can be found in the matrixStats release notes, specifically the section for matrixStats 0.63.0:\n\n\nUpdated native code to use the C99 constant DBL_MAX instead of legacy S constant DOUBLE_XMAX, which is planned to be unsupported in R (>= 4.2.0)." }, { - "objectID": "posts/mpox-preparedness/index.html#footnotes", - "href": "posts/mpox-preparedness/index.html#footnotes", - "title": "2024 mpox outbreak: common analytics tasks and available R tools", - "section": "Footnotes", - "text": "Footnotes\n\n\nWHO Disease Outbreak News↩︎\nModeling Household Transmission of Clade I Mpox in the United States↩︎\nRisk of Clade 1 Mpox Outbreaks Among Gay, Bisexual, and Other Men Who Have Sex With Men in the United States↩︎" + "objectID": "posts/renv-complications/index.html#some-solutions", + "href": "posts/renv-complications/index.html#some-solutions", + "title": "Things that can go wrong when using renv", + "section": "Some solutions", + "text": "Some solutions\n\nAlternative package managers\nWe discussed how many issues when using renv can arise during the package compilation from source. 
A potential solution would be to avoid this compilation step and always install pre-compiled binaries.\nThis is not possible while installing from CRAN as CRAN only provides binaries for recent versions of R and for a limited number of platforms.\nBut Posit for example provides a larger collection of binaries, for different package versions, and different platforms, via their Public Posit Package Manager (PPM).\nMaking sure you install from PPM rather than CRAN can be a first simple step to make some of the issues discussed here vanish.\n\n\nExtending the scope of reproducibility\nAnother solution could be to add more complex reproducibility solutions that go beyond the scope of renv.\n\nrenv with rig\nThe R version is specified in renv.lock and to avoid incompatibility of older package versions with newer versions of R, you could run the declared R version. This can be achieved with various means but a convenient solution is the rig tool.\nThere are even some discussions to integrate rig and renv more tightly and let rig detect automatically which R version to use based on the renv.lock file.\n\n\nDocker, Nix and others\nAlternatively, you could use other reproducibility toolkits that focus not just on the R package versions, but on the entire software stack (e.g., including the operating system, the system dependencies). These solutions can be more complex to set up and use, and we won’t detail them in this blog post but you can find more information in:\n\nThe “Using renv with Docker” renv vignette\nthe “An Introduction to Rocker: Docker Containers for R” paper\nBruno Rodrigues’ entire series of blog posts on Nix\n\n\n\n\nConclusion: a final note for developers\nrenv is an elegant solution that focuses on the most immediate source of non-reproducibility. This however means it needs to be complemented by other tools in more complex cases.\nUltimately, reproducibility is a team effort. People who write code can minimise the risk of renv complications by keeping the packages they use close to their CRAN version and regularly updating their code and renv.lock accordingly. Other programming languages have automated tooling to help with this, via, e.g., the dependabot tool which submits pull requests to update dependencies. There is no well established equivalent for R yet, but anyone willing to set this mechanism up can look at the code used by the Carpentries workbench for this task.\nThanks to Pratik Gupte and Chris Hartgerink for their valuable comments on earlier drafts of this post." }, { - "objectID": "posts/system-dependencies/index.html", - "href": "posts/system-dependencies/index.html", - "title": "System Dependencies in R Packages & Automatic Testing", + "objectID": "posts/share-cpp/index.html", + "href": "posts/share-cpp/index.html", + "title": "Sharing the C++ Code of an Rcpp Package", "section": "", - "text": "This post has been cross-posted on the R-hub blog, and the R-hub blog maintainers have contributed to the review and improvement of this post.\nIn a previous R-hub blog post, we discussed a package dependency that goes slightly beyond the normal R package ecosystem dependency: R itself. Today, we step even further and discuss dependencies outside of R: system dependencies. This happens when packages rely on external software, such as how R packages integrating CUDA GPU computation in R require the CUDA library. 
In particular, we are going to talk about system dependencies in the context of automated testing: is there anything extra to do when setting up continuous integration for your package with system dependencies? Specifically, we will focus on the integration with GitHub Actions. How does it work behind the scenes? And how do you handle edge cases?"
  },
  {
    "objectID": "posts/system-dependencies/index.html#introduction-specifying-system-dependencies-in-r-packages",
    "href": "posts/system-dependencies/index.html#introduction-specifying-system-dependencies-in-r-packages",
    "title": "System Dependencies in R Packages & Automatic Testing",
    "section": "Introduction: specifying system dependencies in R packages",
    "text": "Introduction: specifying system dependencies in R packages\nBefore jumping right into the topic of continuous integration, let’s take a moment to introduce, or remind you, how system dependencies are specified in R packages.\nThe official ‘Writing R Extensions’ guide states 1:\n\nDependencies external to the R system should be listed in the ‘SystemRequirements’ field, possibly amplified in a separate README file.\n\nThis was initially purely designed for humans. No system within R itself makes use of it. One important thing to note is that this field contains free text :scream:. As such, to refer to the same piece of software, you could write either one of the following in the package DESCRIPTION:\nSystemRequirements: ExternalSoftware\nSystemRequirements: ExternalSoftware 0.1\nSystemRequirements: lib-externalsoftware\nHowever, it is probably good practice to check what other R packages with similar system dependencies are writing in SystemRequirements, to facilitate the automated identification process we describe below."
  },
  {
    "objectID": "posts/system-dependencies/index.html#the-general-case-everything-works-automagically",
    "href": "posts/system-dependencies/index.html#the-general-case-everything-works-automagically",
    "title": "System Dependencies in R Packages & Automatic Testing",
    "section": "The general case: everything works automagically",
    "text": "The general case: everything works automagically\nIf while reading the previous section, you could already sense the problems linked to the fact that SystemRequirements is a free-text field, fret not! In the very large majority of cases, setting up continuous integration in an R package with system dependencies is exactly the same as with any other R package.\nUsing, as often, the supercharged usethis package, you can automatically create the relevant GitHub Actions workflow file in your project 2:\n\nusethis::use_github_action(\"check-standard\")\n\nThe result is:\n# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help\non:\n push:\n branches: [main, master]\n pull_request:\n branches: [main, master]\n\nname: R-CMD-check\n\njobs:\n R-CMD-check:\n runs-on: ${{ matrix.config.os }}\n\n name: ${{ matrix.config.os }} (${{ matrix.config.r }})\n\n strategy:\n fail-fast: false\n matrix:\n config:\n - {os: macos-latest, r: 'release'}\n - {os: windows-latest, r: 'release'}\n - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}\n - {os: ubuntu-latest, r: 'release'}\n - {os: ubuntu-latest, r: 'oldrel-1'}\n\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-pandoc@v2\n\n - uses: r-lib/actions/setup-r@v2\n with:\n r-version: ${{ matrix.config.r }}\n http-user-agent: ${{ matrix.config.http-user-agent }}\n use-public-rspm: true\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\n with:\n upload-snapshots: true\nYou may notice there is no explicit mention of system dependencies in this file. Yet, if we use this workflow in an R package with system dependencies, everything will work out-of-the-box in most cases. So, when are system dependencies installed? And how the workflow does even know which dependencies to install since the SystemRequirements is free text that may not correspond to the exact name of a library?\nThe magic happens in the r-lib/actions/setup-r-dependencies step. If you want to learn about it, you can read the source code of this step. It is mostly written in R but it contains a lot of bells and whistles to handle messaging within the GitHub Actions context and as such, it would be too long to go through it line by line in this post. However, at a glance, you can notice many mentions of the pak R package.\nIf it’s the first time you’re hearing about the pak package, we strongly recommend we go through the list of the most important pak features. It is paked packed with many very powerful features. The specific feature we’re interested in here is the automatic install of system dependencies via pak::pkg_sysreqs(), which in turn uses pkgdepends::sysreqs_install_plan().\nWe now understand more precisely where the magic happens but it still doesn’t explain how pak is able to know which precise piece of software to install from the free text SystemRequirements field. As often when you want to increase your understanding, it is helpful to read the source. While browsing pkgdepends source code, we see a call to https://github.com/r-hub/r-system-requirements.\nThis repository contains a set of rules as json files which match unformatted software name via regular expressions to the exact libraries for each major operating system. 
Let’s walk through an example together:\n{\n \"patterns\": [\"\\\\bnvcc\\\\b\", \"\\\\bcuda\\\\b\"],\n \"dependencies\": [\n {\n \"packages\": [\"nvidia-cuda-dev\"],\n \"constraints\": [\n {\n \"os\": \"linux\",\n \"distribution\": \"ubuntu\"\n }\n ]\n }\n ]\n}\nThe regular expression tells that each time a package lists something as SystemRequirements with the word “nvcc” or “cuda”, the corresponding Ubuntu library to install is nvidia-cuda-dev.\nThis interaction between r-system-requirements and pak is also documented in pak’s dev version, with extra information about how the SystemRequirements field is extracted in different situations: https://pak.r-lib.org/dev/reference/sysreqs.html#how-it-works" - }, - { - "objectID": "posts/system-dependencies/index.html#when-its-not-working-out-of-the-box", - "href": "posts/system-dependencies/index.html#when-its-not-working-out-of-the-box", - "title": "System Dependencies in R Packages & Automatic Testing", - "section": "When it’s not working out-of-the-box", - "text": "When it’s not working out-of-the-box\nWe are now realizing that this automagical setup we didn’t pay so much attention to until now actually requires a very heavy machinery under the hood. And it happens, very rarely, that this complex machinery is not able to handle your specific use case. But it doesn’t mean that you cannot use continuous integration in your package. It means that some extra steps might be required to do so. Let’s review these possible solutions together in order of complexity.\n\nFix it for everybody by submitting a pull request\nOne first option might be that the regular expression used by r-system-requirements to convert the free text in SystemRequirements to a library distributed by your operating system does not recognize what is in SystemRequirements.\nTo identify if this is the case, you need to find the file containing the specific rule for the system dependency of interest in r-system-requirements, and test the regular expression on the contents of SystemRequirements.\nIf we re-use the cuda example from the previous section and we are wondering why it is not automatically installed for a package specifying “cudaa”:\n\nstringr::str_match(\"cudaa\", c(\"\\\\bnvcc\\\\b\", \"\\\\bcuda\\\\b\"))\n\n [,1]\n[1,] NA \n[2,] NA \n\n\nThis test confirms that the SystemRequirements field contents are not recognized by the regular expression. Depending on the case, the best course of action might be to:\n\neither edit the contents of SystemRequirements so that it’s picked up by the regular expression\nor submit a pull request to rstudio/r-system-requirements 3 if you believe the regular expression is too restrictive and should be updated (example)\n\nNote however that the first option is likely always the simplest as it doesn’t impact all the rest of the ecosystem (which is why r-system-requirements maintainers might be reluctant to relax a regular expression) and it is often something directly in your control, rather than a third-party who might not immediately be available to review your PR.\n\n\nInstall system dependencies “manually”\nHowever, you might be in a case where you cannot rely on the automated approach. For example, maybe the system dependency to install is not provided by package managers at all. Typically, if you had to compile or install it manually on your local computer, you’re very likely to have to do the same operation in GitHub Actions. 
There are two different, but somewhat equivalent, ways to do so, as detailed below.\n\nDirectly in the GitHub Actions workflow\nYou can insert the installation steps you used locally in the GitHub Actions workflow file. So, instead of having the usual structure, you have an extra step “Install extra system dependencies manually” that may look something like this:\njobs:\n R-CMD-check:\n runs-on: ubuntu-latest\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-r@v2\n with:\n use-public-rspm: true\n\n+ - name: Install extra system dependencies manually\n+ run:\n+ wget ...\n+ make\n+ sudo make install\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\nYou can see a real-life example in the rbi R package.\n\n\nUsing a Docker image in GitHub Actions\nAlternatively, you can do the manual installation in a Docker image and use this image in your GitHub Actions workflow. This is a particularly good solution if there is already a public Docker image or you already wrote a DOCKERFILE for your own local development purposes. If you use a public image, you can follow the steps in the official documentation to integrate it into your GitHub Actions job. If you use a DOCKERFILE, you can follow the answers to this stackoverflow question (in a nutshell, use docker compose in your job or publish the image first and then follow the official documentation).\njobs:\n R-CMD-check:\n runs-on: ubuntu-latest\n+ container: ghcr.io/org/repo:main\n env:\n GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n R_KEEP_PKG_SOURCE: yes\n steps:\n - uses: actions/checkout@v3\n\n - uses: r-lib/actions/setup-r@v2\n with:\n use-public-rspm: true\n\n - uses: r-lib/actions/setup-r-dependencies@v2\n with:\n extra-packages: any::rcmdcheck\n needs: check\n\n - uses: r-lib/actions/check-r-package@v2\nYou can again see a real-life example in the rbi R package."
  },
  {
    "objectID": "posts/system-dependencies/index.html#conclusion",
    "href": "posts/system-dependencies/index.html#conclusion",
    "title": "System Dependencies in R Packages & Automatic Testing",
    "section": "Conclusion",
    "text": "Conclusion\nIn this post, we have provided an overview of how to specify system requirements for R packages, how this seemingly innocent task requires a very complex infrastructure so that it can be understood by automated tools and that your dependencies are smoothly installed in a single command. We also gave some pointers on what to do if you’re in one of the rare cases where the automated tools don’t or can’t work.\nOne final note on this topic is that there might be a move from CRAN to start requiring more standardization in the SystemRequirements field. One R package developer has reported being asked to change “Java JRE 8 or higher” to “Java (>= 8)”.\nMany thanks to Maëlle Salmon & Gábor Csárdi for their insights into this topic and their valuable feedback on this post."
- }, - { - "objectID": "posts/system-dependencies/index.html#footnotes", - "href": "posts/system-dependencies/index.html#footnotes", - "title": "System Dependencies in R Packages & Automatic Testing", - "section": "Footnotes", - "text": "Footnotes\n\n\nFor R history fans, this has been the case since R 1.7.0, released in April 2003.↩︎\nAlternatively, if you’re not using usethis, you can manually copy-paste the relevant GitHub Actions workflow file from the examples of the r-lib/actions project.↩︎\nIf you are wondering why we are saying to submit PR to rstudio/r-system-requirements when we were previously talking about r-hub/r-system-requirements, you can check out this comment thread.↩︎" + "text": "Using the {Rcpp} package is the dominant method for linking the usability of R with the speed of C++, and can be used to write R packages that are fast and easy to use for both end-users and developers.\nFrom the point of view of developers, it’s very easy to export R code such as functions and classes from an R(cpp) package, but the guidance in the Rcpp documentation does not detail how to export the C++ code so that it can be shared with your other Rcpp packages.\nAllowing C++ code to be shared can be very beneficial for the same reasons that sharing R code is — packaging code is a reliable way to reuse it.\nSome widely used examples of this practice are the {RcppEigen}, {RcppArmadillo}, {RcppGSL}, and Boost Headers {BH} packages. Indeed, in the Epiverse-TRACE team, {RcppEigen} underpins the {finalsize} and upcoming {epidemics} packages." }, { - "objectID": "posts/linelist_v1.1.0/index.html", - "href": "posts/linelist_v1.1.0/index.html", - "title": "linelist v1.1.0", + "objectID": "posts/share-cpp/index.html#why-share-c-code-from-an-rcpp-package", + "href": "posts/share-cpp/index.html#why-share-c-code-from-an-rcpp-package", + "title": "Sharing the C++ Code of an Rcpp Package", "section": "", - "text": "We are very excited to announce the release of a new linelist version v1.1.0. Here is an automatically generated summary of the changes in this version." + "text": "Using the {Rcpp} package is the dominant method for linking the usability of R with the speed of C++, and can be used to write R packages that are fast and easy to use for both end-users and developers.\nFrom the point of view of developers, it’s very easy to export R code such as functions and classes from an R(cpp) package, but the guidance in the Rcpp documentation does not detail how to export the C++ code so that it can be shared with your other Rcpp packages.\nAllowing C++ code to be shared can be very beneficial for the same reasons that sharing R code is — packaging code is a reliable way to reuse it.\nSome widely used examples of this practice are the {RcppEigen}, {RcppArmadillo}, {RcppGSL}, and Boost Headers {BH} packages. Indeed, in the Epiverse-TRACE team, {RcppEigen} underpins the {finalsize} and upcoming {epidemics} packages." }, { - "objectID": "posts/linelist_v1.1.0/index.html#breaking-changes", - "href": "posts/linelist_v1.1.0/index.html#breaking-changes", - "title": "linelist v1.1.0", - "section": "Breaking changes", - "text": "Breaking changes\n\nmake_linelist() and set_tags() no longer accept a named list of characters as input. Instead, make_linelist(), set_tags() and tags_types() now use rlang’s dynamic dots, which means that you can splice list arguments. 
This implementation is more robust, better tested, and makes it explicit that users want to splice the list (@Bisaloo, #96).\n\nBefore:\n\nmy_tags <- list(\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\nmake_linelist(obj, my_tags)\n# OR\nmake_linelist(\n obj,\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\n\nNow:\n\nmy_tags <- list(\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)\nmake_linelist(obj, !!!my_tags)\n# OR\nmake_linelist(\n obj,\n id = \"case_ID\",\n date_onset = \"date_of_prodrome\",\n age = \"age\",\n gender = \"gender\"\n)" + "objectID": "posts/share-cpp/index.html#two-ways-to-share-c-code", + "href": "posts/share-cpp/index.html#two-ways-to-share-c-code", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Two ways to share C++ code?", + "text": "Two ways to share C++ code?\nDevelopers searching for a way to make the C++ code of their Rcpp-based packages shareable will likely find two seemingly distinct ways of doing so.\n\nDevelopers reading the Rcpp Attributes documentation will find that package C++ code can be shared by generating a C++ interface for functions that also have an R interface, using Rcpp attributes.\nDevelopers instead scanning widely used Rcpp packages such as {RcppEigen} will notice that C++ code can also be shared by defining the majority of the C++ functions in a package header, to which other Rcpp packages can be linked.\n\nThese are simply different pathways to the writing and export of an R(cpp) package header, which allows Rcpp to link the package’s C++ code to other packages.\nThis blog post explores different ways of doing this, and explains how the Rcpp packages from Epiverse-TRACE implement C++ code sharing." }, { - "objectID": "posts/linelist_v1.1.0/index.html#new-features", - "href": "posts/linelist_v1.1.0/index.html#new-features", - "title": "linelist v1.1.0", - "section": "New features", - "text": "New features\n\nlinelist warnings and errors in the case of a tag loss now have a custom class (@Bisaloo, #109), which means it is easier to silence them specifically, or to catch them programmatically for advanced error handling. One example of a new advanced condition handling that was before not possible is:\nwarning_counter <- 0\n\nwithCallingHandlers({\n x <- linelist::make_linelist(cars, date_onset = \"dist\", age = \"speed\")\n x <- x[, -1]\n x <- x[, -1]\n warning(\"This is not a linelist warning\", call. = FALSE)\n}, linelist_warning = function(w) {\n warning_counter <<- warning_counter + 1\n})\n#> Warning: The following tags have lost their variable:\n#> age:speed\n#> Warning: The following tags have lost their variable:\n#> date_onset:dist\n#> Warning: This is not a linelist warning\n\nwarning(\"This pipeline generated \", warning_counter, \" linelist warnings.\")\n#> Warning: This pipeline generated 2 linelist warnings.\nlinelist objects now have a new custom $<-.linelist() to prevent tag loss when subsetting a linelist object (@Bisaloo, #86). This completes the functionality already provided by the [<-.linelist() and [[<-.linelist() methods.\nx$tagged_column <- NULL\n#> Warning in prune_tags(out, lost_action): The following tags have lost their variable:\n#> tag:tagged_column\nValidation failures in validate_types() now integrate a delayed error mechanism (@Bisaloo, #106). 
This ensures that the error message will return all the invalid tag types at once rather than having to go through multiple trials and runs.\n\nBefore: only the first invalid tag type is returned.\n\n# No warning about age, even though it also has an invalid type\nx <- make_linelist(cars, age = \"speed\", gender = \"dist\")\nvalidate_types(x, ref_types = tags_types(age = \"factor\"))\n#> Error in validate_types(x, ref_types = tags_types(age = \"factor\")) : \n#> Issue when checking class of tag `gender`:\n#> Must inherit from class 'character'/'factor', but has class 'numeric'\n\nNow: the error message returns all the invalid tag types at once.\n\nx <- make_linelist(cars, age = \"speed\", gender = \"dist\")\nvalidate_types(x, ref_types = tags_types(age = \"factor\"))\n#> Some tags have the wrong class:\n#> - gender: Must inherit from class 'character'/'factor', but has class 'numeric'\n#> - age: Must inherit from class 'factor', but has class 'numeric'" + "objectID": "posts/share-cpp/index.html#the-package-header", + "href": "posts/share-cpp/index.html#the-package-header", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "The package header", + "text": "The package header\nThe package header of the package {mypackage} is a file of the name mypackage.h under inst/include. Defining this header is the key step in making (R)C++ code shareable.\n# conceptual organisation of an Rcpp package with a package header\n.\n├── DESCRIPTION\n├── NAMESPACE\n├── R\n│   └── RcppExports.R\n├── inst\n│   └── include\n│   └── mypackage.h # <= the package header\n└── src\n ├── RcppExports.cpp\n └── rcpp_hello.cpp # <= code from which RcppExports.cpp generates\n\nAutogenerating the package header\nThe package header is autogenerated when the attributes of an Rcpp function are edited to also generate a C++ interface.\nConsider the Rcpp function below which is exposed to R and exported from the package. The line // [[Rcpp::interfaces(cpp)]] instructs Rcpp to autogenerate two header files under inst/include:\n\nA package header, called mypackage.h, and\nA helper header called mypackage_RcppExports.h with ‘automagic’ C++ bindings for the function hello_world_rcpp().\n\n\n\nsrc/rcpp_hello.cpp\n\n#include <Rcpp.h>\n\n// [[Rcpp::interfaces(cpp)]]\n\n//' @title Test Rcpp function\n//' \n//' @export\n//[[Rcpp::export]]\nvoid hello_world_rcpp() {\n Rcpp::Rcout << \"hello world!\\n\";\n}\n\n\n\nManually creating the package header\nThe package header can also be created manually, as mypackage.h under inst/include. In this case, the helper file mypackage_RcppExports.h is not generated.\nExamples of this are the widely used {RcppEigen} and {RcppArmadillo} packages, while this demonstration package by James Balamuta is a minimal example that is a good place to get started to understand how this approach works.\nThe manually defined package header can initially be empty, and is populated by the developer — more on header contents below.\n\n\n\n\n\n\nWarning\n\n\n\nIt is possible to edit an autogenerated package header to include manually created header files in addition to mypackage_RcppExports.h. To do this, remove the generator tag (see below) to prevent this file from being overwritten by Rcpp::compileAttributes(). Then include any extra header files as usual.\nWe would however recommend not autogenerating headers from Rcpp functions, but rather writing a header-heavy package — this is the approach used by {RcppEigen} etc. 
(see more below on how we organise our packages).\n\n\n\n\nContents of the package header\nWe found it difficult to get information on the content of the package header.\nAutogenerated package headers contain an autogeneration message and a generator token, similar to that present in RcppExports files. Package headers should contain a header include guard.\n\n\n\n\n\n\nTip\n\n\n\nThe style of the header name in the include guard for autogenerated headers is RCPP_mypackage_H_GEN_. Package headers from the Rcpp core team, such as {RcppEigen} and {RcppArmadillo}, are manually defined and follow the convention mypackage__mypackage__h. In examples, such as this bare-bones demonstration package by James Balamuta, you might also encounter a single underscore (_) and a capital H (mypackage_mypackage_H).\nIf you are linting your Rcpp package’s C++ code with Cpplint, all three are incompatible with Cpplint’s preference, which is DIR_SUBDIR_FILE_H. Exclude the package header from linting to avoid this warning if you wish to follow an Rcpp community style instead.\n\n\nThe package header must also link to the code you want to export, and there are at least three ways of doing this.\n\nInclude the autogenerated file mypackage_RcppExports.h; this is already done as part of the package header generation.\nDirectly write C++ code in the package header. This is technically possible, but unlikely to be a good option as your package’s C++ codebase grows.\nManually include any other C++ header files in the package header. This last option might lead to a package header such as that shown below.\n\n\n\ninst/include/mypackage.h\n\n// Manually created package header with manual code inclusion\n#ifndef mypackage_mypackage_H\n#define mypackage_mypackage_H\n\n// include files using paths relative to inst/include\n#include \"header_01.h\"\n#include \"header_02.h\"\n\n#endif // mypackage_mypackage_H\n\nHere, the header files might contain code that you wish to make available to other packages, such as a C++ function, struct, or class, and indeed in the current package as well — more on how to do this below." }, { - "objectID": "posts/linelist_v1.1.0/index.html#internal-changes", - "href": "posts/linelist_v1.1.0/index.html#internal-changes", - "title": "linelist v1.1.0", - "section": "Internal changes", - "text": "Internal changes\n\nInternal duplication in the specification of the tags supported by linelist by default has been removed. This makes it easier to add or remove tags in the future, and reduces the risk of inconsistencies between the different parts of the package (@Bisaloo, #111).\nThe internal tag_variable() function has been replace by a vectorized alternative tag_variable, thus improving performance in make_linelist() and set_tags() about twofold. The error message when tags are specified by position with a number larger than the number of columns in the dataset to tag has also been clarified (@Bisaloo, #110)." 
+ "objectID": "posts/share-cpp/index.html#using-rcpp-in-header-code", + "href": "posts/share-cpp/index.html#using-rcpp-in-header-code", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Using Rcpp in header code", + "text": "Using Rcpp in header code\nUsing {Rcpp}’s C++ functionality, such as the Rcpp classes DataFrame or List, or classes and functions of Rcpp-based packages such as {RcppEigen}, is as simple as including those headers in the appropriate location, just as one would in a source file — see the example below.\n\n\ninst/include/header_01.h\n\n// In a manually created header file, say, header_01.h\n// which is included in mypackage.h\n\n// to use Rcpp\n#include <Rcpp.h>\n\n// note the use of inline, more on this later\ninline void hello_world_rcpp() {\n Rcpp::Rcout << \"hello world!\\n\";\n}\n\nThe appropriate headers are automatically included in autogenerated package headers’ helper files, and the developer need not do anything more.\n\n\n\n\n\n\nTip\n\n\n\nDon’t forget to link to {Rcpp} or similar packages to the package under development by adding the package names under Imports, Depends, or LinkingTo as appropriate.\nThis can often be handled by functions in the {usethis} package such as usethis::use_rcpp_eigen(). You might also need to add // [[Rcpp::depends(<package>)]] in your package’s C++ source files, with a suitable package dependency specified.\n\n\nThe same principles apply to using C++ code from this package ({mypackage}) in future packages." }, { - "objectID": "posts/linelist_v1.1.0/index.html#documentation", - "href": "posts/linelist_v1.1.0/index.html#documentation", - "title": "linelist v1.1.0", - "section": "Documentation", - "text": "Documentation\n\nlinelist now provides a design vignette for future contributors or maintainers (@Bisaloo, #112)." + "objectID": "posts/share-cpp/index.html#using-header-code-in-the-package", + "href": "posts/share-cpp/index.html#using-header-code-in-the-package", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Using header code in the package", + "text": "Using header code in the package\nThere are some considerations when seeking to use header code from {mypackage} within {mypackage} itself.\nAny functions defined in the package headers must be inline functions (see the example above). This prevents compilation errors related to multiple definitions.\nC++ source files should include the package header, using #include mypackage.h. Functions, structs, or classes defined in header files will be available from the namespace mypackage, as shown in the example below.\nThe code in header files will usually need to be wrapped in (R)C++ code that is exposed to R to make functions from the headers available in R — see the snippet below.\n\n\nmypackage/src/hello_world.cpp\n\n// #include <Rcpp.h> // include Rcpp if necessary\n#include <mypackage.h> // include package header\n\n// Function exposed to R\n//' @title Rcpp function wrapping a header function\n//'\n//' @export\n// [[Rcpp::export]]\nvoid print_hello_world() {\n mypackage::hello_world_rcpp(); // note the namespacing\n}\n\n\n\n\n\n\n\nTip\n\n\n\nRemember to add PKG_CPPFLAGS += -I../inst/include/ to both Makevars and Makevars.win under src/. Furthermore, as noted in the Rcpp attributes documentation, the package will not automatically cause a rebuild when headers are modified — this needs to be done manually." 
}, { - "objectID": "posts/epichains_v0.1.0/index.html", - "href": "posts/epichains_v0.1.0/index.html", - "title": "epichains v0.1.0", - "section": "", - "text": "We are very excited to announce the release of a new epichains version v0.1.0. Here is an automatically generated summary of the changes in this version." + "objectID": "posts/share-cpp/index.html#linking-header-code-between-pacakges", + "href": "posts/share-cpp/index.html#linking-header-code-between-pacakges", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Linking header code between pacakges", + "text": "Linking header code between pacakges\nOnce you have developed your package, you can link to its C++ header code in the same way as you would to any other Rcpp-based package.\nConsider the snippet below which shows how to link the C++ code from {mypackage} in a different package called {yourpackage}.\n\n\nyourpackage/src/hello_world.cpp\n\n// [[Rcpp::depends(mypackage)]] /// specify dependency\n#include <mypackage.h>\n\n// Define and export an Rcpp function\nvoid print_linked_hello() {\n mypackage::hello_world_rcpp();\n}\n\nBe sure to add LinkingTo: mypackage in the DESCRIPTION of the second package {yourpackage}." }, { - "objectID": "posts/epichains_v0.1.0/index.html#epichains-0.1.0", - "href": "posts/epichains_v0.1.0/index.html#epichains-0.1.0", - "title": "epichains v0.1.0", - "section": "epichains 0.1.0", - "text": "epichains 0.1.0\nWe are excited to announce the first minor release of {epichains}.\n{epichains} re-implements {bpmodels}, focusing on a unified simulation framework using branching processes to simulate transmission chains data. The framework incorporates susceptible depletion and pre-existing immunity and provides dedicated data structures for handling and analysing transmission chains in both tabular and vector formats. The goal is to provide seamless interoperability with other packages within the Epiverse-TRACE Initiative and the broader epidemiological tool ecosystem." + "objectID": "posts/share-cpp/index.html#c-code-sharing-in-epiverse-trace", + "href": "posts/share-cpp/index.html#c-code-sharing-in-epiverse-trace", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "C++ code sharing in Epiverse-TRACE", + "text": "C++ code sharing in Epiverse-TRACE\nIn Epiverse-TRACE, we have structured the {finalsize} and {epidemics} packages to have manually created headers, following the principles laid out above. We follow some additional principles as well.\n\nHeader-heavy packages\n\nOur packages are header-heavy, so that most of the actual code is defined in the headers. The source files are primarily intended to contain wrappers that expose the header code to R (and our users).\n\nNamespaces to organise header code\n\nOur header code is organised into C++ namespaces, which makes it easier to understand where functions are likely to be defined, and what they might be related to. It also makes it possible to include the package headers (and namespaces) that are relevant to users, rather than including the entire codebase.\n\n\nAs an example, functions related to non-pharmaceutical interventions or vaccination regimes from the {epidemics} package can be used in other packages without also including the compartmental epidemic models contained therein." 
}, { - "objectID": "posts/epichains_v0.1.0/index.html#new-features", - "href": "posts/epichains_v0.1.0/index.html#new-features", - "title": "epichains v0.1.0", - "section": "New Features", - "text": "New Features\n\nDocumentation\n\nDedicated Website: Explore all features and documentation on the epichains website.\nHelp: Each function comes with extensive documentation. We welcome your feedback and suggestions for improvements.\nVignettes: This release comes with five detailed vignettes:\n\nGetting Started: A quick guide to the key functions.\nModelling Disease Control Interventions: Learn how to model various intervention strategies.\nProjecting Infectious Disease Incidence: A case study on projecting COVID-19 incidence.\nLiterature: A curation of literature on branching process applications in epidemiology.\nTheoretical Background: A deep dive into the theoretical background of the functions in the package (Contributor documentation).\nDesign principles: The design principles of {epichains} (Contributor documentation).\n\n\n\n\nSimulation\n\nsimulate_chains(): Simulate independent transmission chains from a specified number of initial cases, incorporating susceptible depletion and pre-existing immunity.\nsimulate_chain_stats(): Generate a vector of chain sizes or lengths from a specified number of initial cases, incorporating susceptible depletion and pre-existing immunity.\n\n\n\nInference\n\nlikelihood(): Estimate the (log)likelihood of transmission chain sizes or lengths, with support for numeric vectors or <epichains> and <epichains_summary> objects.\n\n\n\nTransmission Chain Data Manipulation\n\nsummary(): Extract vectors of chain sizes or lengths from <epichains> objects.\naggregate(): Generate case time series by aggregating by generation or time of infection.\nplot(): Visualize individual transmission chains filtered by their id." + "objectID": "posts/share-cpp/index.html#ensuring-the-quality-of-header-code", + "href": "posts/share-cpp/index.html#ensuring-the-quality-of-header-code", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Ensuring the quality of header code", + "text": "Ensuring the quality of header code\nYou can lint and statically check code in a package header using tools for linting C++ code such as Cpplint and Cppcheck. When doing so, it may be important to specify minimum C++ standards, or even the language (C or C++) to avoid linter errors. This is because tools — such as Cppcheck — assume that headers with the extension .h are C headers, which throws errors when encountering C++ features such as the use of namespaces.\nCppcheck’s language and C++ standard can be set using:\ncppcheck --std=c++14 --language=c++ --enable=warning,style --error-exitcode=1 inst/include/*.h\nFurthermore, header code can also be tested independently of the R(cpp) code that eventually wraps it. This can be done using the Catch2 testing framework, which is conveniently available using {testthat} — this is an extensive topic for another post." }, { - "objectID": "posts/chjh-intro-blog/index.html", - "href": "posts/chjh-intro-blog/index.html", - "title": "New team member: Chris Hartgerink", - "section": "", - "text": "In January, I joined Epiverse as part of my new role as Senior Software Engineer at data.org. In the weeks since, I have been able to already meet a lot of members of the Epiverse community. Talking to you and hearing about what drives you, further excites me to be part of this community! 
In order to build on that and keep the energy flowing, I wanted to introduce myself here more broadly.\nIn this blog post, I share some of my background in R, my interest in open source, and how I look to contribute to this community. I want to already highlight that I love finishing things, from peanut butter jars to pull requests — so if you have anything that you’d like to pull over the finish line, I am more than happy to support in doing that!" + "objectID": "posts/share-cpp/index.html#conclusion", + "href": "posts/share-cpp/index.html#conclusion", + "title": "Sharing the C++ Code of an Rcpp Package", + "section": "Conclusion", + "text": "Conclusion\nDeveloping an Rcpp-based package with C++ code sharing in mind takes some organisation, or even reorganisation, of the C++ codebase. It is probably a good idea to consider whether your package will implement code that would be of interest to other developers, or to you in related projects. If either of these is true, it may help to structure your package with C++ code sharing in mind from the very beginning of development. This can substantially reduce development overheads and mistakes associated with maintaining multiple copies of the same or similar code in different projects. Fortunately, some great examples of how to do this are among the most-used Rcpp-based packages, providing both a conceptual template to consult for your work, as well as being a demonstration of how beneficial this practice can be in the long run. In Epiverse-TRACE, we intend to continue developing with C++ code sharing as a core principle so that we and other developers can build on our initial work." }, { - "objectID": "posts/chjh-intro-blog/index.html#background-in-r", - "href": "posts/chjh-intro-blog/index.html#background-in-r", - "title": "New team member: Chris Hartgerink", - "section": "Background in R", - "text": "Background in R\nMy first real introduction to programming was with R back in 2013, during a course on statistical programming. This was more about scripting than engineering, but it was where I dipped my toes. During my PhD at a methodology and statistics department, I ended up becoming the go to person for getting mundane things done faster. I ended up writing R and shell scripts left and right, with most of them ending up being used only once. I wasted a lot of time and had tons of fun doing this! I ended up writing scripts for my research as well, leading to some legal troubles with big publishers (see this old blog post). Oops!\nFrom scripting came contributing to R packages like statcheck, from which came developing my own packages like retractcheck. Before I knew it I was helping others learn R and build their own packages in small workshops. This was back when the devtools experience was pretty decent but in no way as complete as it is today.\nAround 2018, I ended up leaving academia and stopped needing to develop in R. I was not upset because R development is rather quirky and arbitrary at times. Since then, I have designed, built, and maintained production web applications in NodeJS, with my biggest achievement being ResearchEquals. I’m excited to bring the best of different languages together in this new position. Coming from NodeJS engineering, I already see some avenues for building a better developer experience in R. I look forward to not just contribute packages but also more general tooling during my time working on Epiverse." 
+ "objectID": "posts/bpmodels_v0.3.1/index.html", + "href": "posts/bpmodels_v0.3.1/index.html", + "title": "bpmodels v0.3.1", + "section": "", + "text": "We are very excited to announce the release of a new bpmodels version v0.3.1. Here is an automatically generated summary of the changes in this version." }, { - "objectID": "posts/chjh-intro-blog/index.html#working-openly", - "href": "posts/chjh-intro-blog/index.html#working-openly", - "title": "New team member: Chris Hartgerink", - "section": "Working openly", - "text": "Working openly\nI come to open source through open science, which is all about reducing the “insiders” and “outsiders” of research. Can people access publications equitably? Can people contribute to the work that’s happening equitably? Are funds distributed equitably? There are still big barriers to accessing research, its institutions, and as a result there remain inequities in the knowledge that is produced.\nAs a natural extension of open science, I started getting involved with open source communities around 2015-2016. I started with smaller open source projects springboarding to bigger projects over time. I documented all my own research projects on GitHub and started building websites (e.g., one about copyright in research publishing). All this practice developed my philosophy of working openly, which culminated in my time as a Mozilla Open Science Fellow in 2017.\nWith over a decade of practice, I approach working openly as community work. Open source is not just making information public - it requires making that information accessible and actionable. It has to be public and participatory. It needs to be informed by the needs of those participating, and participation needs to be encouraged, enabled, and scaffolded. This also means that the community needs to define what it desires. Are we working on a static piece of information that we want to enable reuse of (e.g., a finalized dataset) or do we want to co-create with people who we still need to invite in? People come and go, and so this community culture has to evolve.\nWorking openly, to me, also means that considerate and empathetic communication is key. It requires articulating our own needs, wants, and uncertainty with honesty. It requires accepting that our personal wants are not always aligned with what the community wants. How do we deal with those situations? Do we seek consensus or majorities? These discussions need humility and not getting entitled to things. It means actually listening to each other. There is no room for peacockery or posturing, but lots of room for caring and celebrating each other." + "objectID": "posts/bpmodels_v0.3.1/index.html#input-validation", + "href": "posts/bpmodels_v0.3.1/index.html#input-validation", + "title": "bpmodels v0.3.1", + "section": "Input validation", + "text": "Input validation\n\nThe following internal functions now have input validation: rborel(), dborel(), complementary_logprob(), and rnbinom_mean_disp()." }, { - "objectID": "posts/chjh-intro-blog/index.html#contributing-to-epiverse", - "href": "posts/chjh-intro-blog/index.html#contributing-to-epiverse", - "title": "New team member: Chris Hartgerink", - "section": "Contributing to Epiverse", - "text": "Contributing to Epiverse\nI am excited to bring this philosophy of working open to Epiverse and vice versa, learn how this community is open in its own unique way. With an international community across three continents there are important questions around what it means to be participatory. 
After talking to various community members, it is clear this community raises important and practical questions. I am also intrigued to see what is preventing people from joining this community - what barriers are we unconsciously putting up?\nI see my role within Epiverse as a “rush goalie” (also known as a fly goalie). I will own the development and maintenance of specific pieces of software, yet I can leave those to actively support work throughout Epiverse. In short: I go where I am needed, in whatever form that may be. That can be software development, community engagement, and so much more.\nTo that end: Get in touch with me anytime you think I might be of support. Tag me on GitHub liberally (@chartgerink). Send me emails on chris@data.org. DM me wherever. I am here to support Epiverse and all its various packages in whatever shape they may need. This includes code reviews, making fixes, and being part of discussions. I would rather be tagged once too often than once too little.\nEpiverse is also not a standalone community within the R space, so when practical, I will also contribute back to the packages that we use in our work. I already started making pull requests elsewhere, and it is great to see the R community be so welcoming to outside contributions. My hope is that Epiverse will be just as welcoming and a part of that wider ecosystem.\n\nAt the moment, I am still finding my way throughout the various repositories. It takes time to understand new code and start identifying areas to contribute (and it’s not always as easy as can be). I will be checking in on the repositories (and there are a lot!) and may ask around to see whether PRs or issues are still relevant. Nonetheless, after several weeks, I am starting to feel more like a part of the community than a newbie.\nThanks for reading along and feel free to leave a comment if you have anything else you’d like to know.\n\n\n\n\n\n\nGetting in touch\n\n\n\nYou can reach me on chris@data.org or on GitHub under @chartgerink. I am also on Mastodon under @chartgerink@akademienl.social."
  },
  {
    "objectID": "posts/parent-class/index.html",
    "href": "posts/parent-class/index.html",
    "title": "Choosing the Right Parent for R Object Classes",
    "section": "",
    "text": "I have recently published a series of blog posts on the reasons why one may want to start using object-oriented programming (and more specifically R S3 classes) to improve interoperability with other tools from the ecosystem.\nBut there are still questions I have not addressed directly, even if they may have been implicitly included sometimes: what makes a good object class? What good practices in class & function design can improve interoperability?\nAs you can expect from these questions, this post will present a subjective view on S3 class and method design. I will argue that it is often a good strategy to inherit from existing standard classes, and to leverage this inheritance relationship as much as possible."
+ "text": "We are very excited to announce the release of a new epiparameter version v0.3.0. Here is an automatically generated summary of the changes in this version.\nThe third minor release of the {epiparameter} R package contains a range of updates and improvements to the package. The principal aim of this release is to simplify, clarify and enhance the classes and class methods when working with epidemiological parameters in R.\nThe are a large number of breaking changes in this release, primarily functions and function arguments being renamed or restructured, see the Breaking changes section for an overview." }, { - "objectID": "posts/parent-class/index.html#inherit-from-standard-classes", - "href": "posts/parent-class/index.html#inherit-from-standard-classes", - "title": "Choosing the Right Parent for R Object Classes", - "section": "Inherit from standard classes", - "text": "Inherit from standard classes\nA unique feature of R is the availability and centrality of data.frames in the base language, whereas you need extra libraries for a similar functionality in most other languages (e.g., pandas in Python).\ndata.frame is one of the first “complex” (in the sense of non-atomic) object most R learners will be exposed to and will develop a familiarity with. A good way to leverage this familiarity is to make your subclass a thin wrapper around data.frames.\nThis means that not only will users be able to get started with your package faster because of this familiarity, but you will also immediately benefit from the huge ecosystem of functions and packages working on data.frames, such as the tidyverse. If you want some examples, this is what collaborators and I did in the linelist, pavo, scoringutils, epichains, and vaccineff R packages.\nIn some cases, the output is too complex to fit into a data.frame. Even in this situation, I would recommend inheriting from existing, well-established, classes for the same two reasons: familiarity and ecosystem. For example, for the serofoi R package, we have made the decision to inherit from stanfit objects, rather than a custom structure." + "objectID": "posts/epiparameter_v0.3.0/index.html#new-features", + "href": "posts/epiparameter_v0.3.0/index.html#new-features", + "title": "epiparameter v0.3.0", + "section": "New features", + "text": "New features\n\nThe library of epidemiological parameters has been updated to include 3 new Chikungunya parameter entries. The Mpox parameters previously missing from the Guzzetta et al. entry have been added (#346 & #374).\nA c() method has been added for <epiparameter> and <multi_epiparameter> objects (#368).\nAn aggregate() method has been added for <multi_epiparameter> to enable consensus distributions to be built utilising the mixture distribution class from {distributional} (#388).\nInfrastructure has been added to the package to allow translations of messages/warnings/errors printed to the console. 
(@Bisaloo, #367).\nconvert_summary_stats_to_params() can now convert from median and dispersion for a lognormal distribution (#378).\nThe data_dictionary.json has been enhanced to improve validation of the library of epidemiological parameters (parameters.json) (#379).\nAn interactive network showing all of the <epiparameter> S3 methods has been added to the design_principles.Rmd vignette (#383).\nThe data_from_epireview.Rmd article has been improved and updated with new changes from {epireview} (@CarmenTamayo & @cm401 & @kellymccain28, #305 & #373).\nParameter units have been added to every entry in the {epiparameter} library (parameters.json) and to the $metadata element in <epiparameter> objects. The create_metadata() function now has a units argument to construct metadata lists (#391).\nImproved database.Rmd vignette by adding a short citation to the reference column (@jamesmbaazam, #348).\nThe family() method for <epiparameter> has been improved to allow access to distribution names for transformed (e.g. mixture and truncated distributions) and untransformed (e.g. gamma and lognormal) distributions with a new argument base_dist and a new internal function .distributional_family() (#398).\nas_epiparameter() can now work with the SARS parameters from {epireview} (#407)." }, { - "objectID": "posts/parent-class/index.html#rely-on-parent-methods-as-much-as-possible", - "href": "posts/parent-class/index.html#rely-on-parent-methods-as-much-as-possible", - "title": "Choosing the Right Parent for R Object Classes", - "section": "Rely on parent methods as much as possible", - "text": "Rely on parent methods as much as possible\nA follow up recommendation from inheriting from standard classes is to leverage their methods wherever possible.\nOne of the first changes I made when becoming maintainer of the linelist package was to remove the rename.linelist() and select.linelist() methods. Indeed, they were, or could easily be, behaving identically as the parent rename.data.frame() and select.data.frame() methods. Rather than burdening the codebase and maintenance with an extra unnecessary method, it is much simpler and more robust to rely on the well-tested parent method. In fact, the dplyr documentation explicitly recommends only writing methods for a couple of standard functions (including [.subclass() and names<-.subclass()), which will enable the use of parent methods directly, rather than writing custom methods for each dplyr function.\nSimilarly, many developers have the reflex to write a custom print.subclass() method as part of the method implementation. While it may be justified in some cases, it is sometimes unnecessary. My recommendation would be to evaluate carefully what benefits the custom method brings over the default parent method." 
+ "objectID": "posts/epiparameter_v0.3.0/index.html#breaking-changes", + "href": "posts/epiparameter_v0.3.0/index.html#breaking-changes", + "title": "epiparameter v0.3.0", + "section": "Breaking changes", + "text": "Breaking changes\n\nThe <epidist> class has been renamed <epiparameter> to avoid confusion with the similar R package {epidist} (#360).\nMany functions that used epidist in their names have been renamed to use epiparameter due to the renaming of the class (#360).\nThe function signatures of epiparameter() and new_epiparameter() functions (previously epidist() and new_epidist()) have been updated to collapse the prob_dist, prob_dist_params, discretise and truncation arguments into prob_distribution, which accepts the output of create_prob_distribution() (#381).\nThe epi_dist argument has been renamed epi_name. This is to clarify that {epiparameter} can work with epidemiological parameters that take a variety of forms (e.g. point estimates, ranges, probability distributions, etc.) (#390).\nThe <vb_epidist> class and it’s methods have been removed from the package. It was not being used and was increasing the complexity and maintenance load of the package (#359).\ncreate_prob_dist() has been renamed to create_prob_distribution() (#381).\nvalidate_epiparameter() (previously validate_epidist()) has been renamed assert_epiparameter(), and test_epiparameter() has been added, with the aim to harmonise design with {contactmatrix} and the messages and errors have been improved (#366 & #402).\nThe minimum version of R required by the package is now 4.1.0 due to the use of the base R pipe (|>) in dependencies, and the R-CMD-check workflow on GitHub actions now explicitly runs on the minimum version of R stated in the DESCRIPTION (#384 & #405)." }, { - "objectID": "posts/parent-class/index.html#enable-conversion-to-standard-classes", - "href": "posts/parent-class/index.html#enable-conversion-to-standard-classes", - "title": "Choosing the Right Parent for R Object Classes", - "section": "Enable conversion to standard classes", - "text": "Enable conversion to standard classes\nIf after careful consideration, extra metadata makes it too difficult to fit your new class into an existing class, you may sometimes have to define your own class from “scratch” (i.e., often list() in R).\nBut even in this case, you can still apply some of the ideas proposed earlier. As much as possible, you should provide helpers or methods to enable the streamlined conversion of your method to a standard class.\nA good example here is the epiparameter package, which provides a complex S3 class built on lists, including extensive metadata about probability distribution of epidemiological parameters. As such, this custom class cannot be used out of the box in most functions from other packages. But an as.function() method is conveniently provided to enable the conversion of this probability distribution parameters into a density distribution, which can then be used in functions which expect a function object." + "objectID": "posts/epiparameter_v0.3.0/index.html#bug-fixes", + "href": "posts/epiparameter_v0.3.0/index.html#bug-fixes", + "title": "epiparameter v0.3.0", + "section": "Bug fixes", + "text": "Bug fixes\n\nEpidemiological parameter entries in the library stored with lognormal distributions, parameterised as median and dispersion are now converted to meanlog and sdlog correctly when creating an <epiparameter> (when auto_calc_params = TRUE) (#381)." 
}, { - "objectID": "posts/parent-class/index.html#conclusion", - "href": "posts/parent-class/index.html#conclusion", - "title": "Choosing the Right Parent for R Object Classes", - "section": "Conclusion", - "text": "Conclusion\nIn summary, I recommend relying on well-established parent classes such as data.frames or at least providing direct conversion functions to these standard classes, and using parent methods wherever possible rather than writing custom dedicated methods. This should help produce a package:\n\nmore easily accessible for users because it uses objects that feel familiar\nmore maintainable because a lot of method writing is offloaded to the parent class\nmore likely to be interoperable because standard classes are a good way to pass data between functions or packages\n\nThanks to Chris Hartgerink, James Azam and Josh Lambert, for their very valuable feedback on this post." + "objectID": "posts/epiparameter_v0.3.0/index.html#deprecated-and-defunct", + "href": "posts/epiparameter_v0.3.0/index.html#deprecated-and-defunct", + "title": "epiparameter v0.3.0", + "section": "Deprecated and defunct", + "text": "Deprecated and defunct\n\nepidist_db() has been deprecated. It is replaced by epiparameter_db() (#360 & #399)." }, { - "objectID": "posts/cfr_v0.1.0/index.html", - "href": "posts/cfr_v0.1.0/index.html", - "title": "cfr v0.1.0", - "section": "", - "text": "We are very excited to announce the release of a new cfr version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial CRAN submission of cfr, an R package to estimate the severity of a disease and ascertainment of cases while correcting for delays in outcomes of reported cases being known.\nThis release includes:\n\nFunctions for the overall severity of an outbreak, the overall severity of an outbreak estimated with an expanding time series of data, and the time-varying severity of an outbreak,\nA function to estimate the number of outcomes to be expected from a given number of cases assuming a user-specified distribution of delays between cases and outcomes being known,\nA function to estimate the overall (static) ascertainment of cases in an outbreak by comparing the relevant severity measures against a user-specified baseline severity (note that functionality for time-varying ascertainment is expected to be included in future versions, and an older implementation of this functionality was removed just prior to release),\nA data preparation generic with an S3 method for the <incidence2> class from the incidence2 package,\nExample daily case and death data from the 1976 Ebola Virus Disease outbreak as reported in Camacho et al. (2014). 
https://doi.org/10.1016/j.epidem.2014.09.003,\nExample daily case and death data from the Covid-19 pandemic over the range 2020-01-01 to 2022-12-31 from the 19 countries with over 100,00 deaths over this period, as taken from the covidregionaldata package which is no longer on CRAN,\nVignettes describing how to get started with severity estimation, and more detailed workflows on different kinds of severity estimation,\nA vignette on working with data from the incidence2 package, and a vignette on working with delay distributions,\n100% code coverage,\nWorkflows to render the vignettes and README as a website.\n\n\n\n\nReuseCC BY 4.0CitationBibTeX citation:@online{epiverse-trace_development_team2023,\n author = {Epiverse-TRACE development team, The},\n title = {Cfr V0.1.0},\n date = {2023-11-22},\n url = {https://epiverse-trace.github.io/posts/cfr_v0.1.0/},\n langid = {en}\n}\nFor attribution, please cite this work as:\nEpiverse-TRACE development team, The. 2023. “Cfr V0.1.0.”\nNovember 22, 2023. https://epiverse-trace.github.io/posts/cfr_v0.1.0/." + "objectID": "posts/epiparameter_v0.3.0/index.html#acknowledgements", + "href": "posts/epiparameter_v0.3.0/index.html#acknowledgements", + "title": "epiparameter v0.3.0", + "section": "Acknowledgements", + "text": "Acknowledgements\n@Bisaloo @CarmenTamayo @adamkucharski @avallecam @chartgerink @jamesmbaazam @joshwlambert" }, { - "objectID": "posts/simulist_v0.3.0/index.html", - "href": "posts/simulist_v0.3.0/index.html", - "title": "simulist v0.3.0", + "objectID": "posts/simulist_v0.1.0/index.html", + "href": "posts/simulist_v0.1.0/index.html", + "title": "simulist v0.1.0", "section": "", - "text": "We are very excited to announce the release of a new simulist version v0.3.0. Here is an automatically generated summary of the changes in this version.\nThe third release of the simulist R package contains a range of new features, enhancements, documentation and unit tests.\nThe headline changes to the package are:" + "text": "We are very excited to announce the release of a new simulist version v0.1.0. Here is an automatically generated summary of the changes in this version.\nInitial release of simulist, an R package containing tools to simulate epidemiological data such as line lists and contact tables." }, { - "objectID": "posts/simulist_v0.3.0/index.html#new-features", - "href": "posts/simulist_v0.3.0/index.html#new-features", - "title": "simulist v0.3.0", + "objectID": "posts/simulist_v0.1.0/index.html#new-features", + "href": "posts/simulist_v0.1.0/index.html#new-features", + "title": "simulist v0.1.0", "section": "New features", - "text": "New features\n\nExternal\n\nonset_to_hosp and onset_to_death arguments can now take NA as input and will return a column of NAs in the line list columns date_admission and date_outcome (#98).\nAn onset_to_recovery argument has been added to the simulation functions, sim_linelist() and sim_outbreak(), and so the recovery date can be explicitly given in the line list data (#99).\nThe line list simulation can now use a time-varying case fatality risk. 
The create_config() function now returns a $time_varying_death_risk list element, which is NULL by default but can take an R function to enable the fatality risk of cases to change over the epidemic (#101).\nA new vignette, time-varying-cfr.Rmd, has been added to the package to describe how to use the time-varying case fatality risk functionality and describe a few different time-varying functions that can be used (#101).\nA new vignette, wrangling-linelist.Rmd, has been added to go over some of the common post-processing steps that might be required after simulating line list or contact tracing data. This vignette is short and currently only contains a single post-processing example, more examples will be added over time (#104).\nThe README now has a section on related projects to provide an overview of packages that simulate line list data, or are related to simulist. This section contains a disclosure widget containing a feature table providing a high-level description of the features and development status of each related package (#110).\nA Key features section and Complimentary R packages section has been added to the README (#134).\nUpdated package architecture diagram in the design-principles.Rmd vignette (#113).\n\n\n\nInternal\n\nThe .add_deaths() function has been replaced by the .add_outcome() function which can simulate death and recovery times (#99).\n.cross_check_sim_input() function has been added to the package to ensure user input is coherent across arguments (#98).\n.anonymise() function has been added to convert individual’s names into alphanumeric codes to anonymise individuals in line list and contact tracing data (#106)." + "text": "New features\n\nA set of simulation functions to simulate epidemiological data. These simulation functions contain arguments that allow for the simulation to be flexibly parameterised to be generated under specified epidemiological or demographic conditions.\n\nsim_linelist(): simulate line list data\nsim_contacts(): simulate contacts data\nsim_outbreak(): simulate both line list and contacts data\n\nFour general vignettes\n\nGet Started vignette (simulist.Rmd)\nAge-stratified hospitalisation and death risks vignette (age-strat-risks.Rmd)\nAge structured population vignette (age-struct-pop.Rmd)\nVisualising simulated data (vis-linelist.Rmd)\n\nOne developer focused vignette\n\nDesign Principles for {simulist} (design-principles.Rmd)\n\nUnit tests (100% code coverage) and documentation files.\nContinuous integration workflows for R package checks, rendering the README.md, calculating test coverage, deploying the pkgdown website, updating the citation file, and checking new package or system dependencies." }, { - "objectID": "posts/simulist_v0.3.0/index.html#breaking-changes", - "href": "posts/simulist_v0.3.0/index.html#breaking-changes", - "title": "simulist v0.3.0", - "section": "Breaking changes", - "text": "Breaking changes\n\nThe simulation functions are now parameterised with an infectious period (infect_period argument) instead of a contact interval (contact_interval argument). This moves away from parameterising the simulation with the time delay between a person becoming infected and having contact with a susceptible individual, and instead uses an infectious period distribution within which contacts are uniformly distributed in time (#96).\nThe simulation functions can now set a maximum as well as a minimum outbreak size. 
The min_outbreak_size argument in the exported sim_*() functions has been renamed outbreak_size and takes a numeric vector of two elements, the minimum and maximum outbreak size. The maximum outbreak size is a soft limit due to the stochastic nature of the branching process model, so epidemiological data returned can contain more cases and/or contacts that the maximum in outbreak_size but in these case a warning is returned explaining to the user how many cases/contacts are being returned (#93).\nThe add_ct argument in sim_linelist() and sim_outbreak() has been removed. The functionality is now equivalent to add_ct = TRUE in the previous simulist version. The add_ct argument was removed to move the package to always returning <data.frame>s with the same number of columns, for consistency and predictability (#104).\nThe add_names argument in the simulation functions has been renamed to anonymise. The new argument controls whether names are given to each case (anonymise = FALSE, default behaviour) or whether fixed length hexadecimal codes are given to each case (anonymise = TRUE), this ensures the returned <data.frame> has the same number of columns (#106)."
+    "objectID": "posts/comm-software-devel/index.html",
+    "href": "posts/comm-software-devel/index.html",
+    "title": "Communicating development stages of open-source software",
+    "section": "",
+    "text": "Software is not immediately stable when being developed. It undergoes design changes, changes to the user interface (application programming interface, API), and features get added or removed over time. Software in an open-source setting, in which the code is publicly hosted in a repository (e.g., GitHub, GitLab, Bitbucket), allows anyone to track developments. It also allows the developer community to easily contribute to the software.\nThere are certain metrics which can be used to convey the stage of development to users or other developers. For example, the number of commits: a repository with few commits may indicate that a project is still in an incipient phase and will undergo several breaking changes. However, different software projects become stable at different rates and the number of commits may mean very different things for a repository containing an operating system compared to an R package with minimal functionality. It is therefore important that developers communicate with users and other developers at what stage the project is in, and how stable the code base is.\nSoftware development, and specifically R package development, has several methods to communicate stability. This blog post will discuss two such methods and give examples for each. The first of these is versioning code, which establishes points in development where the code is ready for use; the second is lifecycle badges, which can be placed at different levels within software (e.g., package, function, function argument) to convey how a user should interact with and use it."
  },
  {
-    "objectID": "posts/simulist_v0.3.0/index.html#bug-fixes",
-    "href": "posts/simulist_v0.3.0/index.html#bug-fixes",
-    "title": "simulist v0.3.0",
-    "section": "Bug fixes",
-    "text": "Bug fixes\n\n.sim_network_bp() now indexes the time vector correctly. Previously a vector indexing bug meant the epidemic would not progress through time (#95)."
+    "objectID": "posts/comm-software-devel/index.html#versioning",
+    "href": "posts/comm-software-devel/index.html#versioning",
+    "title": "Communicating development stages of open-source software",
+    "section": "Versioning",
+    "text": "Versioning\nVersioning code is not a new concept and has been used for decades1. It has led to version control systems such as git. However, in this post we are interested in versioning to communicate development.\n\nSemantic versioning\nOne such philosophy is semantic versioning (SemVer). This aims to describe the stage of software development by attaching semantics (i.e. meaning) to the format and numbering of versions. The version system works through three numbers, each separated by a dot. The numbers, from left to right, convey the major version, minor version and patch version. As an example, 0.5.2 is newer than 0.3.9.\nEmploying semantic versioning in one’s code development allows others to determine whether a package has undergone substantial development and testing, and informs whether it would make a suitable package to use in a script or as a dependency for another package. Semantic versioning also describes the changes made to a package. As explained on their website, incrementing the major version implies a breaking change, a minor increment is a backwards compatible change and lastly patches are mostly applied to bug fixes. This aids users in understanding whether they should continue using a package, whether their package needs updating due to a breaking change or whether they need to install the newest version because a bug was recently fixed.\nExamples of changes that correspond to major, minor or patch updates can be seen in the version release notes (NEWS.md file) of {dplyr} and {ggplot2}.\nIn R there are several packages that work with versioning, and specifically semantic versioning. The {semver} package provides functions for parsing, rendering and comparing versions. There is also the {semverutils} R package which provides similar functionality using R6. The {usethis} package provides handy utility functions for changing the versions of R packages (usethis::use_version() and usethis::use_dev_version()). R also comes with a package_version() function for creating and validating versions.\nOverall, semantic versioning provides what they describe as a “formal specification” to facilitate management of package development and the dependencies of that package. It is the most widely-used versioning system and therefore will be understood by a wide variety of users and developers.\nOne critique raised against semantic versioning is the difficulty of defining how changes correspond to a version increment. Semantic versioning states only breaking changes warrant major releases, but a vast re-write of a code base may also justify a major version change. Different breaking changes have different magnitudes, therefore a change to a single exported function or a change to every exported function will be communicated in a single, equal, version increment.\n\n\nAlternatives to semantic versioning\nThere are several other versioning frameworks aside from semantic versioning. One common option is calendar versioning (CalVer). The format of CalVer is usually year-month (YY-MM), or year-month-day (YY-MM-DD), depending on the regularity of releases, and allows appending tags (micros or modifiers, e.g. YY-MM.1).\nOther versioning schemes can appear similar to semantic versioning, but do not follow the guidelines around version semantics. 
In these cases, a bump in the major version may not relate to a breaking change. Additionally, other numbers can be attached to the traditional x.y.z format, such as build numbers. Build number versioning adds an extra number to specify the build (x.y.z.build_number). There are many other variants but covering all versioning systems is outside the scope of this post.\n\n\nVersioning an R package\nThere are some restrictions on valid version numbers for R packages. The official “Writing R Extensions” guide states:\n\nThis is a sequence of at least two (and usually three) non-negative integers separated by single ‘.’ or ‘-’ characters.\n\n\n\nWhy version?\nThe benefits of versioning apply beyond communicating with users and developers. Implementing versioning eases reproducibility by allowing systems to record which version of a language or package was used. In R this can be achieved in several ways, with some popular examples being the {renv} package and docker."
  },
  {
-    "objectID": "posts/simulist_v0.3.0/index.html#deprecated-and-defunct",
-    "href": "posts/simulist_v0.3.0/index.html#deprecated-and-defunct",
-    "title": "simulist v0.3.0",
-    "section": "Deprecated and defunct",
-    "text": "Deprecated and defunct\n\nNone"
+    "objectID": "posts/comm-software-devel/index.html#lifecycle-badges",
+    "href": "posts/comm-software-devel/index.html#lifecycle-badges",
+    "title": "Communicating development stages of open-source software",
+    "section": "Lifecycle badges",
+    "text": "Lifecycle badges\nBadges can be pasted onto visible parts of the code, for example a readme document in the root of the repository, to show the development phase and stability. The three badging systems we will discuss in this post are:\n\nRepoStatus\nTidyverse lifecycles\nReconverse lifecycles\n\n\nRepoStatus\nRepoStatus is a language-agnostic set of badges which describe the stages of code development and the possible transitions between those stages.\nAs shown in the figure below, there are multiple stages to communicate both unstable and stable software. There are also multiple paths between each stage, recognising the varied routes software development can take.\n\n\n\nRepoStatus badge system. Reused under CC BY-SA 4.0 from repostatus.org\n\n\n\n\nTidyverse\nThe tidyverse approach is broadly similar to RepoStatus. The {lifecycle} R package contains the description of their process. There are four stages:\n\nExperimental\nStable\nSuperseded (previously called retired)\nDeprecated\n\nMost code will go through the experimental phase, as it will likely change its API and the number and order of arguments might change. Once code is not going to drastically change (i.e. no breaking changes), at least from a user’s point of view, it can be labelled stable. In the tidyverse lifecycle schematic, all experimental code transitions to stable code.\nThe two stages that follow stable are superseded and deprecated. The former describes a situation in which a new package, a new function or a new argument, depending on the context, has been developed which the developer feels should be used instead of the now superseded code. Superseded code is still developed in the sense that changes to the language or package that may break the function will be fixed, as will bugs, but the function will not receive ongoing development. The latter, deprecation, is used in cases when the developer thinks that a package or function should no longer be used. 
This is primarily employed when code is depended on by other software and therefore deleting the code would cause breaks in reverse dependencies. Thus the deprecation warning allows developers of those dependencies time to make the relevant changes.\n\n\n\n{lifecycle} badge system. Reused under MIT license from lifecycles R package\n\n\nOne of the main differences between the tidyverse lifecycles and the others discussed in this post is their applicability at different levels in the code. The lifecycle badges can be applied at the package-level (e.g., stringr), the function-level (e.g. dplyr::group_trim()) or the argument level (e.g., dplyr::across()).\nUsing {lifecycle} in a package can be set up using usethis::use_lifecycle(). The {lifecycle} package not only provides badges, but also informative deprecation notices which communicate to users that a function is no longer supported since a version release of a package. This offers the user a chance to find an alternative function for future use.\nThe use of deprecation warnings from {lifecycle} leads onto another aspect of tidyverse development: protracted deprecation. There are no fixed rules on how long after a deprecation warning is made code should be removed. In the tidyverse, this process is given ample time in order to allow the many developers that utilise tidyverse software to make the necessary changes. Full descriptions of the {lifecycle} package can be found on the website, including the deprecated use of questioning and maturing stages.\n\n\nReconverse\nReconverse provides four stages of software development:\n\nconcept\nexperimental\nmaturing\nstable\n\nA difference between {lifecycle} and reconverse is the explicit connection between semantic versioning and development stage in reconverse. The transitions between experimental, maturing and stable are linked to version numbers of less than 0.1.0, less than 1.0.0 and greater than 1.0.0, respectively.\n\n\nDynamic badges\nAll badge frameworks discussed only offer static badges that require developers to manually update as the project moves between phases. This is subject to the maintainers remembering, which can lead to miscommunication about a package’s stage, which may have moved on from being experimental, or may not have been worked on in years but still has an active badge.\nDynamic badges, like those offered by https://shields.io/, give a good indication of how recently the project was changed by showing the time since the last commit, or the number of commits since the last release. These too are not perfect but may better track changes and take the burden of badge updates off the project maintainer."
  },
  {
-    "objectID": "posts/simulist_v0.3.0/index.html#acknowledgements",
-    "href": "posts/simulist_v0.3.0/index.html#acknowledgements",
-    "title": "simulist v0.3.0",
-    "section": "Acknowledgements",
-    "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @CarmenTamayo, @chartgerink, @jamesmbaazam, @ntncmch, @PaulC91, @pratikunterwegs, and @sbfnk."
+    "objectID": "posts/comm-software-devel/index.html#communicating-development-in-the-epiverse-trace",
+    "href": "posts/comm-software-devel/index.html#communicating-development-in-the-epiverse-trace",
+    "title": "Communicating development stages of open-source software",
+    "section": "Communicating development in the Epiverse-TRACE",
+    "text": "Communicating development in the Epiverse-TRACE\nWithin the Epiverse-TRACE initiative we use semantic versioning and badges to convey to the community interacting with our code which stage of development each project is in. We do not have fixed rules on which badges to use and a variety of badges can be found across the repositories in the organisation. For example, reconverse badges are used for {linelist}, a RepoStatus badge is used in {finalsize}, and tidyverse badges are used in {epiparameter}.\nWe take this approach because no lifecycle badging system is perfect; each has benefits and downsides. The badges from {lifecycle} are the most common and thus recognisable in R package development, however they may not port well to other languages or be familiar to developers coming to R from other frameworks. RepoStatus has the benefit of not being designed for a single language, and its number of badges gives greater acuity to the stage of development for a project. This may be especially useful if a package is newly developed and {lifecycle} would describe it as experimental, but RepoStatus provides granularity as to whether it is a concept package, work in progress (WIP) or started but abandoned.\nThere is some ambiguity in the semantics of the active stage in RepoStatus, which in the definition is “stable, usable state”, but may be misinterpreted as meaning unstable but actively developed.\nLastly, reconverse provides a system akin to {lifecycle} and may be useful for those working in the epidemiology developer space. However, one downside of the reconverse system is that there are no clear semantics for a package being deprecated or archived. As with almost all code, at some point development ceases and this stage should be communicated, even if just to say that the package is not being updated in line with developments in the underlying language, in this case R.\nThere are no plans within Epiverse-TRACE to develop a new badging system as the existing systems cover almost all use cases. In the event that the current development stage cannot be adequately communicated with a single badge from one of the frameworks discussed, a combination of badges can be used. For example, early on in a project adding both the experimental badge from {lifecycle} or reconverse and the WIP badge from RepoStatus may more accurately describe the project’s development pace. Alternatively, the stable badge, from either {lifecycle} or reconverse, can be coupled with either active or inactive from RepoStatus to let other developers know if software will be updated with new language features or dependency deprecations.\nOverall, the use of any of the three lifecycle frameworks described here is better than none."
  },
  {
-    "objectID": "posts/ai-learner-review/index.html",
-    "href": "posts/ai-learner-review/index.html",
-    "title": "Using LLM agents to review tutorials ‘in character’ as learners",
-    "section": "",
-    "text": "Part of the Epiverse-TRACE initiative involves development of training materials that span early, middle and late stage outbreak analysis and modelling tasks. 
To ensure that our tutorials are accessible to target audiences, we have developed a series of learner personas to inform the design of learning materials. These personas include the following:\n\nLucia, a Field Epidemiologist that use R for data cleaning, plotting and report for Outbreak response.\nJuan, a Statistician and R user in a National Health Agency with constant deployment to outbreak response.\nPatricia, a PhD student learning to use R and analyse Outbreak data for her collaborative project on GitHub.\nVania, a professor who needs ready-to-use training for her research and to pass on to students.\nDanielle, a Trainer that wants to remix content to create specific training materials for public health practitioners.\n\nAs the volume of training materials increases, we have explored automating the generation of initial reviews using large language models (LLMs) that take the form of ‘in character’ agents with instructions to provide constructive comments. This reflects a wider focus within the field of outbreak analytics on how LLMs agents can be used to increase the efficiency and scalability of common tasks (e.g. van Hoek et al, Lancet Microbe, 2024 ).\nTo generate the AI tutorial reviews, we use the OpenAI GPT-4 API, via the openai R package, as described in this repository. We also use the gh package to load the .Rmd materials from a given repository (e.g. epiverse-trace/tutorials-middle). Full illustrative code is available here, with the GPT-4 API prompts outlined below.\n# Define first prompt\nuser_prompt_1 <- \"You are the following person, and give all your answers in character:\"\n\n# Load Lucia persona\npersona_bio <- read_file(\"https://raw.githubusercontent.com/epiverse-trace/personas/master/lucia-outbreaks.qmd\")\n\n# Define second prompt\nuser_prompt_2 <- \"Now suppose you want to complete the following tutorial about outbreak analysis in R. The content is in R markdown but would be knitted to HTML in reality, with additional figures where relevant. Provide a critique of the tutorial from your perspective as a learner. What is unclear? What is useful? What is difficult? What could be refined? Support comments with brief quotes. In your feedback be succinct, positive, constructive and specific. State what content to keep and what to improve. Provide clear suggestions for next steps to remove, change or add content. Note that visualisations will be in the tutorial, but are not shown in the Rmd, so do not comment on these. Summarise your review and suggestions for specific improvements in short bullet point paragraphs. If some of the points are similar, amalgamate these into a single bullet point. 
\\n\\n\"\n\n# Call OpenAI API (credentials must already be defined - see full illustrative code link above)\noutput <- create_chat_completion(\n model = \"gpt-4\",\n messages = list(list(\"role\"=\"user\",\"content\" = paste0(user_prompt_1,persona_bio,\n user_prompt_2,tutorial_contents))\n )\n\n# Write review to .md using readr\nwrite_lines(output,\"output_review.md\"))" + "objectID": "posts/comm-software-devel/index.html#footnotes", + "href": "posts/comm-software-devel/index.html#footnotes", + "title": "Communicating development stages of open-source software", + "section": "Footnotes", + "text": "Footnotes\n\n\nhttps://en.wikipedia.org/wiki/Version_control↩︎" }, { - "objectID": "posts/ai-learner-review/index.html#turning-learner-personas-into-llm-agents", - "href": "posts/ai-learner-review/index.html#turning-learner-personas-into-llm-agents", - "title": "Using LLM agents to review tutorials ‘in character’ as learners", + "objectID": "posts/linelist_v1.0.0/index.html", + "href": "posts/linelist_v1.0.0/index.html", + "title": "linelist v1.0.0", "section": "", - "text": "Part of the Epiverse-TRACE initiative involves development of training materials that span early, middle and late stage outbreak analysis and modelling tasks. To ensure that our tutorials are accessible to target audiences, we have developed a series of learner personas to inform the design of learning materials. These personas include the following:\n\nLucia, a Field Epidemiologist that use R for data cleaning, plotting and report for Outbreak response.\nJuan, a Statistician and R user in a National Health Agency with constant deployment to outbreak response.\nPatricia, a PhD student learning to use R and analyse Outbreak data for her collaborative project on GitHub.\nVania, a professor who needs ready-to-use training for her research and to pass on to students.\nDanielle, a Trainer that wants to remix content to create specific training materials for public health practitioners.\n\nAs the volume of training materials increases, we have explored automating the generation of initial reviews using large language models (LLMs) that take the form of ‘in character’ agents with instructions to provide constructive comments. This reflects a wider focus within the field of outbreak analytics on how LLMs agents can be used to increase the efficiency and scalability of common tasks (e.g. van Hoek et al, Lancet Microbe, 2024 ).\nTo generate the AI tutorial reviews, we use the OpenAI GPT-4 API, via the openai R package, as described in this repository. We also use the gh package to load the .Rmd materials from a given repository (e.g. epiverse-trace/tutorials-middle). Full illustrative code is available here, with the GPT-4 API prompts outlined below.\n# Define first prompt\nuser_prompt_1 <- \"You are the following person, and give all your answers in character:\"\n\n# Load Lucia persona\npersona_bio <- read_file(\"https://raw.githubusercontent.com/epiverse-trace/personas/master/lucia-outbreaks.qmd\")\n\n# Define second prompt\nuser_prompt_2 <- \"Now suppose you want to complete the following tutorial about outbreak analysis in R. The content is in R markdown but would be knitted to HTML in reality, with additional figures where relevant. Provide a critique of the tutorial from your perspective as a learner. What is unclear? What is useful? What is difficult? What could be refined? Support comments with brief quotes. In your feedback be succinct, positive, constructive and specific. State what content to keep and what to improve. 
Provide clear suggestions for next steps to remove, change or add content. Note that visualisations will be in the tutorial, but are not shown in the Rmd, so do not comment on these. Summarise your review and suggestions for specific improvements in short bullet point paragraphs. If some of the points are similar, amalgamate these into a single bullet point. \\n\\n\"\n\n# Call OpenAI API (credentials must already be defined - see full illustrative code link above)\noutput <- create_chat_completion(\n model = \"gpt-4\",\n messages = list(list(\"role\"=\"user\",\"content\" = paste0(user_prompt_1,persona_bio,\n user_prompt_2,tutorial_contents))\n )\n\n# Write review to .md using readr\nwrite_lines(output,\"output_review.md\"))" + "text": "We are very excited to announce the release of a new linelist version v1.0.0. Here is a automatically generated summary of the changes in this version." }, - { - "objectID": "posts/ai-learner-review/index.html#reviewing-transmissibilty-training-materials-as-a-senior-academic", - "href": "posts/ai-learner-review/index.html#reviewing-transmissibilty-training-materials-as-a-senior-academic", - "title": "Using LLM agents to review tutorials ‘in character’ as learners", - "section": "Reviewing transmissibilty training materials as a senior academic", - "text": "Reviewing transmissibilty training materials as a senior academic\nTo illustrate the sort of reviews that can be generated with the above approach, the below is the output produced when we imported the Vania senior academic persona alongside an early pre-Alpha version of the training materials for our transmissibility estimation tutorial:\n\nOverall, the tutorial is well-structured and provides a comprehensive guide on how to use the EpiNow2 package to estimate transmission metrics from case data. However, there are several areas that could be improved to enhance clarity and understanding for learners like me.\n\nPrerequisites: The prerequisites section could be expanded to include specific resources for learners to review the necessary concepts. For instance, links to resources on Bayesian analysis and the effective reproduction number would be helpful.\nJargon and Terminology: The tutorial contains several technical terms and jargon that could be confusing for learners. For example, terms like “Bayesian inference”, “posterior probability”, “likelihood”, and “prior probability” are used without clear explanations. Providing definitions or links to resources for these terms would be beneficial.\nCode Explanation: While the tutorial does a good job of explaining what the code does, it could provide more context on why certain functions or packages are used. For instance, it would be helpful to explain why the EpiNow2 package is used and how it differs from other packages.\nVisualizations: Although visualizations are not shown in the Rmd, it would be beneficial to explain what learners should expect to see in the visualizations and how to interpret them.\nExercises: The tutorial lacks exercises for learners to practice the concepts and skills they have learned. 
Including exercises with solutions would greatly enhance the learning experience.\n\nSuggestions for improvements:\n\nExpand the prerequisites section to include specific resources for learners to review necessary concepts.\nProvide definitions or links to resources for technical terms and jargon.\nProvide more context on why certain functions or packages are used.\nExplain what learners should expect to see in the visualizations and how to interpret them.\nInclude exercises with solutions for learners to practice the concepts and skills they have learned." + { + "objectID": "posts/linelist_v1.0.0/index.html#new-features", + "href": "posts/linelist_v1.0.0/index.html#new-features", + "title": "linelist v1.0.0", + "section": "New features", + "text": "New features\n\nIncreased compatibility with dplyr is now documented and ensured through tests of all dplyr verbs on linelist objects as part of our testing & continuous integration system, as well as a new vignette: https://epiverse-trace.github.io/linelist/articles/compat-dplyr.html (@Bisaloo, #53)\nA new selection helper is provided for tidyverse users, based on the existing selectors provided by the tidyselect package: has_tag() (@Bisaloo, #61). By feeding it a character vector of tags to operate on, you can work with dplyr verbs on specific tagged columns without having to explicitly use the column names:\nx %>%\n dplyr::select(has_tag(c(\"id\", \"date_of_onset\")))" }, { - "objectID": "posts/ai-learner-review/index.html#reviewing-forecast-materials-as-a-field-epidemiologist", - "href": "posts/ai-learner-review/index.html#reviewing-forecast-materials-as-a-field-epidemiologist", - "title": "Using LLM agents to review tutorials ‘in character’ as learners", - "section": "Reviewing forecast materials as a field epidemiologist", - "text": "Reviewing forecast materials as a field epidemiologist\nTo further illustrate review outputs, the below is what was produced when we imported the Lucia field epidemiologist persona alongside an early pre-Alpha version of the training materials for our forecasting tutorial:\n\nPositive Feedback:\n\nThe tutorial is well-structured and follows a logical progression, which makes it easier to follow.\nThe use of real-world examples and datasets (COVID-19 and Ebola) makes the tutorial more relatable and practical.\nThe inclusion of questions, objectives, prerequisites, and key points at the beginning and end of the tutorial respectively, helps to guide the learning process.\nThe tutorial provides a good balance between theory and practice, with clear explanations and practical coding examples.\nThe use of callouts to highlight important information or provide additional context is very helpful.\n\nConstructive Feedback:\n\nThe tutorial assumes a certain level of familiarity with R and statistical concepts, which might be challenging for beginners. 
It would be helpful to provide more explanations or links to resources for beginners.\nThe code chunks are quite complex and could be broken down into smaller parts with more explanations.\nThe tutorial could benefit from more visualizations to illustrate the concepts and results.\nThe challenge section could provide more guidance or hints to help learners complete the task.\n\nSuggestions for Improvement:\n\nProvide more explanations or links to resources for beginners to learn R and statistical concepts.\nBreak down complex code chunks into smaller parts with more explanations.\nInclude more visualizations to illustrate the concepts and results.\nProvide more guidance or hints in the challenge section to help learners complete the task.\nInclude a glossary of terms or a list of R packages used in the tutorial for quick reference." + "objectID": "posts/linelist_v1.0.0/index.html#breaking-changes", + "href": "posts/linelist_v1.0.0/index.html#breaking-changes", + "title": "linelist v1.0.0", + "section": "Breaking changes", + "text": "Breaking changes\n\nIt is no longer possible to use lost_tags_action() within a pipeline. It must now be set as a separate step. This makes the internal code more robust and clarifies what is part of the pipeline versus a global option (@Bisaloo, #79).\nThe select_tags() function is now deprecated to ensure we provide just one clear way to address a given issue and that our “happy path” is clearly signposted (@Bisaloo, #61). If you were using this function, we now recommend using the more explicit two-steps process:\n# Deprecated\nx %>%\n select_tags(\"age\")\n\n# Instead use\nx %>%\n tags_df() %>%\n select(age)\n\n# Or\nx %>%\n select(has_tag(\"age\")) %>%\n tags_df()\nThe custom select.linelist() method has been deprecated as providing a custom [.linelist() is sufficient to ensure compatibility with dplyr::select() default methods, including triggering lost_tags_action() on tag removal (@Bisaloo, #61). A full deletion of this method is not possible at the moment because we want to provide a smooth transition for users that relied on the custom tags argument of the select.linelist() method. It is now recommend instead to use the new has_tag() selection helper:\nx %>%\n dplyr::select(has_tag(c(\"id\", \"date_of_onset\")))\n\n# Instead of\nx %>%\n select(tags = c(\"id\", \"date_of_onset\"))\nThe custom rename.linelist() method has been removed as providing a custom names<-().linelist method is sufficient to ensure compatibility with dplyr::rename(), including appropriate modification of the tags. (@Bisaloo, #60)" }, { - "objectID": "posts/ai-learner-review/index.html#overcoming-feedback-bottlenecks", - "href": "posts/ai-learner-review/index.html#overcoming-feedback-bottlenecks", - "title": "Using LLM agents to review tutorials ‘in character’ as learners", - "section": "Overcoming feedback bottlenecks", - "text": "Overcoming feedback bottlenecks\nA challenge with LLMs trained for general use is finding domain-specific tasks where they can add sufficient value beyond existing human input. Tasks like providing early sense checking and tailored feedback, particularly from differing perspectives, therefore has potential to overcome common bottlenecks in developing training materials (e.g. providing initial comments and flagging obvious issues while waiting for more detailed human feedback).\nAs Epiverse-TRACE training materials continue to develop, we plan to explore further uses beyond simple first-pass reviews. 
For example, LLMs are well suited to synthesising qualitative feedback, increasing the range of insights that can be collected and summarised from learners as we move into beta testing. We also hope to identify opportunities where LLMs can help supplement the learner experience, as demonstrated by emerging tools like RTutor for descriptive plotting functionality in R, which combines generation of code in response to user queries with translation into shiny outputs." + "objectID": "posts/linelist_v1.0.0/index.html#documentation", + "href": "posts/linelist_v1.0.0/index.html#documentation", + "title": "linelist v1.0.0", + "section": "Documentation", + "text": "Documentation\n\nadded a hex logo thanks to David Mascarina’s contribution (@dgmascarina)\nadded short lay description to README thanks to Emma Marty’s contribution" }, { - "objectID": "posts/data-licensing.html", - "href": "posts/data-licensing.html", - "title": "Dual licensing R packages with code and data", - "section": "", - "text": "Licenses are an important topic within open source. Without licenses, information or code can be publicly available but not legally available for reuse or redistribution. The open source software community’s most common licenses are the MIT license or the GNU GPLv3.\nWhen you read the MIT or GNU license, you can see they are rather specific:\nand\nThey aim to cover primarily software, not other forms of information such as, for example, data." + "objectID": "posts/linelist_v1.0.0/index.html#bug-fixes", + "href": "posts/linelist_v1.0.0/index.html#bug-fixes", + "title": "linelist v1.0.0", + "section": "Bug fixes", + "text": "Bug fixes\n\nlinelist is now explicitly marked as incompatible with data.table. In practice, make_linelist(x) now errors if x inherits from data.table (#55, @Bisaloo, based on discussions with @TimTaylor).\n[.linelist() now works to subset by column when including just one argument (#54, @Bisaloo). E.g., x[1]. As an indirect effect, this also improves compatibility with dplyr verbs that rely on this method (#51).\nsubsetting a linelist with extra tags (e.g., created via make_linelist(allow_extra = TRUE)) no longer causes an error (#65, @Bisaloo; reported by @TimTaylor in #63)" }, { - "objectID": "posts/data-licensing.html#footnotes", - "href": "posts/data-licensing.html#footnotes", - "title": "Dual licensing R packages with code and data", - "section": "Footnotes", - "text": "Footnotes\n\n\nFor a full list of license accepted by CRAN see: https://svn.r-project.org/R/trunk/share/licenses/license.db and they also accept stating the license as “Unlimited” for unrestricted distribution.↩︎\nA discussion of data and R packages can be found here: https://github.com/ropensci/unconf17/issues/61. 
This thread is used as an example of some thoughts on packaging data in R but we acknowledge it is from 2017 so the opinions of the individuals in this thread may have changed.↩︎\nTo see the legal definition of the database right in Europe, and what constitutes it, see the European Union Directive 96/9/EC↩︎\nSee this blog post by Julia Silge on including external data sets into an R package and rectifying incompatibilities with license↩︎" + "objectID": "posts/linelist_v1.0.0/index.html#internal-changes", + "href": "posts/linelist_v1.0.0/index.html#internal-changes", + "title": "linelist v1.0.0", + "section": "Internal changes", + "text": "Internal changes\n\ntestthat tests now run in parallel (#76, @Bisaloo)\ntestthat tests now warn on partial matching (#76, @Bisaloo)" }, { - "objectID": "posts/superspreading_v0.2.0/index.html", - "href": "posts/superspreading_v0.2.0/index.html", - "title": "superspreading v0.2.0", + "objectID": "posts/simulist_v0.2.0/index.html", + "href": "posts/simulist_v0.2.0/index.html", + "title": "simulist v0.2.0", "section": "", - "text": "We are very excited to announce the release of a new superspreading version v0.2.0. Here is an automatically generated summary of the changes in this version.\nSecond minor release of superspreading. This release enhances functions added in v0.1.0 and adds two new exported functions, and two new vignettes.\nlibrary(superspreading)" + "text": "We are very excited to announce the release of a new simulist version v0.2.0. Here is an automatically generated summary of the changes in this version.\nSecond release of simulist, updates the core simulation model and, as a result, the arguments for sim_*() functions for simulating line list data and/or contact table data exported from simulist are updated. The internal package architecture is also refactored." }, { - "objectID": "posts/superspreading_v0.2.0/index.html#new-features", - "href": "posts/superspreading_v0.2.0/index.html#new-features", - "title": "superspreading v0.2.0", + "objectID": "posts/simulist_v0.2.0/index.html#new-features", + "href": "posts/simulist_v0.2.0/index.html#new-features", + "title": "simulist v0.2.0", "section": "New features", - "text": "New features\n\nA new function (calc_network_R()) to estimate the reproduction number for heterogeneous networks and a vignette outlining use cases for the function from existing epidemiological literature is added (#71).\nprobability_epidemic() and probability_extinct() now have individual-level and population-level control in a joint framework (#70).\nproportion_*() functions can now return proportion columns of the output <data.frame> as numeric when the new argument format_prop is set to FALSE (#72).\nA new design principles vignette to document package development is added (#68).\nAdded a helper function (ic_tbl()) to improve model comparison tables (#65).\nprobability_*() functions now accept dynamic dots ({rlang} is added as a dependency) (#82)." + "text": "New features\n\nExternal\n\ncreate_config() now returns a new element in the list: $network. By default create_config() returns network = \"adjusted\", which assumes the simulation is a random network and samples contacts with an excess degree distribution (see Details in ?create_config()). The network effect can be changed to \"unadjusted\" to switch off the network effect. 
$network is checked internally (in .sim_network_bp()) and will error if not valid (#60).\nPackage architecture diagram is added to design-principles.Rmd (#66).\nlintr GitHub action workflow (lint-changed-files.yaml) is added to the suite of continuous integration workflows (#68).\nTransmission chain visualisation is added to vis-linelist.Rmd (#70).\n\n\n\nInternal\n\n.sim_network_bp() is added as an internal function and replaces bpmodels::chain_sim() as the core simulation model producing contacted and infected individuals. {bpmodels} is removed as a dependency as a result (#60).\n.sample_names() is added as an internal function to utilise randomNames::randomNames() to produce more unique names than randomNames(..., sample.with.replacement = FALSE).\nRefactor of internal simulation architecture replaced .sim_bp_linelist(), .sim_clinical_linelist() and .sim_contacts_tbl() with .sim_internal() (#66).\nThe sim_utils.R file was renamed to utils.R (#66) and the empty create_linelist.R file was removed (#72).\n.add_date_contact() argument outbreak_start_date is now NULL by default instead of missing (#82).\nRegression tests of sim_*() functions now use snapshot testing for more detailed data checking (#65).\nInternal testing data (testdata) files have been updated, as has the testdata/README.md with instructions (#64)." }, { - "objectID": "posts/superspreading_v0.2.0/index.html#breaking-changes", - "href": "posts/superspreading_v0.2.0/index.html#breaking-changes", - "title": "superspreading v0.2.0", + "objectID": "posts/simulist_v0.2.0/index.html#breaking-changes", + "href": "posts/simulist_v0.2.0/index.html#breaking-changes", + "title": "simulist v0.2.0", "section": "Breaking changes", - "text": "Breaking changes\n\nind_control and pop_control arguments replace control and control_type arguments in probability_contain(); and the argument default for num_init_infect is removed (#70).\nChanged epidist argument to offspring_dist to follow Epiverse style (affects several functions) (#64).\nArgument in proportion_transmission() has been renamed from sim to simulate (#81)." - }, - { - "objectID": "posts/superspreading_v0.2.0/index.html#minor-changes", - "href": "posts/superspreading_v0.2.0/index.html#minor-changes", - "title": "superspreading v0.2.0", - "section": "Minor changes", - "text": "Minor changes\n\nNew package hex logo (#73).\nUpdated continuous integration and package infrastructure (#67).\nImproved function documentation (#63).\nOptimisation now uses optim() by default (#82).\nTesting suite now uses snapshot testing for regression tests (#84)." + "text": "Breaking changes\n\nR and serial_interval arguments have been removed from sim_linelist(), sim_contacts() and sim_outbreak() functions and instead contact_distribution, contact_interval and prob_infect are used to parameterise the simulation. Documentation, both functions and vignettes, have been updated with these changes (#60).\nThe contact_distribution argument in sim_*() functions requires a density function if supplied as an anonymous function. Information is added to simulist.Rmd to explain this.\nThe line list output from sim_linelist() now uses column header sex instead of gender. 
The contacts table output from sim_contacts() and sim_outbreak() now uses column headers age and sex instead of cnt_age and cnt_gender (#60, #79).\ncontact_distribution is redefined and redocumented as the distribution of contacts per individual, rather than the number of contacts that do not get infected as it was in v0.1.0.\nrow.names for <data.frame>s output by sim_linelist(), sim_contacts() and sim_outbreak() are now sequential from 1:nrows (#63)." }, { - "objectID": "posts/superspreading_v0.2.0/index.html#bug-fixes", - "href": "posts/superspreading_v0.2.0/index.html#bug-fixes", - "title": "superspreading v0.2.0", + "objectID": "posts/simulist_v0.2.0/index.html#bug-fixes", + "href": "posts/simulist_v0.2.0/index.html#bug-fixes", + "title": "simulist v0.2.0", "section": "Bug fixes", - "text": "Bug fixes\n\nNone to {superspreading} functions.\nUpdate {epiparameter} use in vignette and tests (#62)." + "text": "Bug fixes\n\nsim_contacts() now correctly runs with an age-structured population. In the previous version (v0.1.0), sim_contacts() did not call .check_age_df() and as a result the function errored, this is fixed as of PR #81." }, { - "objectID": "posts/superspreading_v0.2.0/index.html#deprecated-and-defunct", - "href": "posts/superspreading_v0.2.0/index.html#deprecated-and-defunct", - "title": "superspreading v0.2.0", + "objectID": "posts/simulist_v0.2.0/index.html#deprecated-and-defunct", + "href": "posts/simulist_v0.2.0/index.html#deprecated-and-defunct", + "title": "simulist v0.2.0", "section": "Deprecated and defunct", "text": "Deprecated and defunct\n\nNone" }, { - "objectID": "posts/lint-rcpp/index.html", - "href": "posts/lint-rcpp/index.html", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "", - "text": "The R package development ecosystem includes packages such as {lintr} and {styler} that can help to check code style, and to format R code.\nHowever, these packages cannot lint or style the C++ code of {Rcpp} packages. This could leave the C++ code of an Rcpp package less clean than the R code, increasing the technical debt already associated with using two languages.\nIn Epiverse-TRACE, we encounter this issue with {finalsize}, and we anticipate the same issue with further epidemic modelling packages that we seek to develop or adapt, such as {fluEvidenceSynthesis}.\nOur use-case is not unique, of course, and other projects could have their own solutions. One such, from which we have borrowed some ideas, is the Apache Arrow project, whose R package also uses a C++ backend (via {cpp11} rather than {Rcpp})." + "objectID": "posts/simulist_v0.2.0/index.html#acknowledgements", + "href": "posts/simulist_v0.2.0/index.html#acknowledgements", + "title": "simulist v0.2.0", + "section": "Acknowledgements", + "text": "Acknowledgements\nMany thanks to the contributors to this release, either from issues, code contributions, reviews or discussions (listed in alphabetical order):\n@adamkucharski, @avallecam, @Bisaloo, @chartgerink, @jamesmbaazam, @pratikunterwegs, and @sbfnk." 
}, { - "objectID": "posts/lint-rcpp/index.html#use-case", - "href": "posts/lint-rcpp/index.html#use-case", - "title": "Improving the C++ Code Quality of an Rcpp Package", + "objectID": "posts/benchmarking_design_decisions/index.html", + "href": "posts/benchmarking_design_decisions/index.html", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", "section": "", - "text": "The R package development ecosystem includes packages such as {lintr} and {styler} that can help to check code style, and to format R code.\nHowever, these packages cannot lint or style the C++ code of {Rcpp} packages. This could leave the C++ code of an Rcpp package less clean than the R code, increasing the technical debt already associated with using two languages.\nIn Epiverse-TRACE, we encounter this issue with {finalsize}, and we anticipate the same issue with further epidemic modelling packages that we seek to develop or adapt, such as {fluEvidenceSynthesis}.\nOur use-case is not unique, of course, and other projects could have their own solutions. One such, from which we have borrowed some ideas, is the Apache Arrow project, whose R package also uses a C++ backend (via {cpp11} rather than {Rcpp})." - }, - { - "objectID": "posts/lint-rcpp/index.html#choice-of-c-linters", - "href": "posts/lint-rcpp/index.html#choice-of-c-linters", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Choice of C++ linters", - "text": "Choice of C++ linters\nC++ linters such as clang-tidy stumble when dealing with C++ code in src/, as the clang toolchain attempts to compile it. This does not work for Rcpp packages, as the Rcpp.h header cannot be found — this linking is handled by {Rcpp}.\nFortunately, other C++ linters and code checking tools are available and can be used safely with Rcpp packages.\nWe have chosen to use cpplint and cppcheck for {finalsize}.\n\nCpplint\ncpplint is a tool that checks whether C/C++ files follow Google’s C++ style guide. cpplint is easy to install across platforms, and does not error when it cannot find Rcpp.h.\nImportantly, cpplint can be instructed to not lint the autogenerated RcppExports.cpp file, which follows a different style.\nTo lint all other .cpp files, we simply run cpplint from the terminal.\ncpplint --exclude=\"src/RcppExports.cpp\" src/*.cpp\n\n\nCppcheck\ncppcheck is a static code analysis tool, that aims to “have very few false positives”. This is especially useful for the non-standard organisation of Rcpp projects compared to C++ projects.\ncppcheck can also be run locally and instructed to ignore the autogenerated RcppExports.cpp file, while throwing up issues with style.\ncppcheck -i src/RcppExports.cpp --enable=style --error-exitcode=1 src\nHere, the --enable=style option lets cppcheck flag issues with style, acting as a second linter. This enables the performance and portability flags as well. (We have not found any difference when using --enable=warning instead.)\nEnabling all checks (--enable=all) would flag two specific issues for {Rcpp} packages: (1) the Rcpp*.h headers not being found (of the class missingIncludeSystem), and (2) the solver functions not being used by any other C++ function (unusedFunction).\nThese extra options should be avoided in {Rcpp} packages, as the linking is handled for us, and the functions are indeed used later — just not by other C++ functions.\nThe --error-exitcode=1 argument returns the integer 1 when an error is found, which is by convention the output for an error." 
+ "text": "library(bench)\nlibrary(cli)\nlibrary(dplyr)\nlibrary(ggplot2)\nR package developers often have to take on dependencies for various reasons. Dependencies are external packages that provide extra functionalities or features to another package (Wickham and Bryan 2023). Quite often, the goal of taking on a dependency is to utilize its rich features and reduce duplicated coding effort. For example, the ggplot2 package is often adopted, instead of R’s inbuilt plotting functions, to make visualizations using its declarative system and implementation of the Grammar of Graphics, and slick automation across the board.\nSometimes, introducing a dependency might slow down existing processes. Hence, it is important to consider the speed of a dependency’s functions before adopting it. Code optimisation is often reserved for heavy processes like data processing steps. However, here, we will focus on lesser optimised processes, and in particular, printing of texts in R.\nFor this demonstration, we will look at a situation where a package developer is considering options for condition signalling beyond the functionalities provided in base R. Condition signalling refers to the process of communicating to a user that an issue has arisen during code execution (Wickham 2019).\nBase R ships with functionalities for condition signalling. There is base::stop() for signalling errors. Errors refer to cases where something fatal happens during code execution so that it halts, for example, when a function is run without specifying the required arguments.\nrnorm()\n\nError in rnorm(): argument \"n\" is missing, with no default\nBase R provides base::warning() for throwing warnings. Warnings are used to handle less fatal issues with code execution that do not need to halt the process. For example, when you compare two vectors of different lengths, R will throw a warning but will still return a result.\nc(1:2) > c(1:3)\n\nWarning in c(1:2) > c(1:3): longer object length is not a multiple of shorter\nobject length\n\n\n[1] FALSE FALSE FALSE\nLastly, there is base::message() for throwing messages. Messages are used to provide useful information about processes. For example, packages throw messages about NAMESPACE conflicts when loaded. This is to inform the user so they know what next steps to take.\nlibrary(data.table)\n\n\nAttaching package: 'data.table'\n\n\nThe following objects are masked from 'package:dplyr':\n\n between, first, last\nThese base R functions are, however, not flexible enough for advanced formatting: colour, progress bar, contextual pluralisation, and so forth. The package developer might want to adopt external dependencies for these flexibilities. The cli package is well designed for this. cli offers many of the advanced formatting features. Here is a quick example (adopted from the help file) using cli::cli_abort(), the equivalent of base::stop().\nn <- \"c(1, 2, 3)\"\ncli_abort(\n c(\n \"{.var n} must be a numeric vector\",\n \"x\" = \"You've supplied a {.cls {class(n)}} vector.\"\n )\n)\n\nError:\n! 
`n` must be a numeric vector\n✖ You've supplied a <character> vector.\nIn this example, we used inline text formatting to print n as a variable and interpolated code to print the class of n in the error message, all in just 6 lines of code.\nIf this is enough to excite your interest, check out cli’s website and help files, which provide in-depth guides on its myriad features.\nNow, considering all these amazing features that cli offers, what should a package developer consider before adopting it? In this article, we will demonstrate how to use (micro)benchmarking1 to guide the decision through the lens of speed. Benchmarking can be used to check whether adopting cli as a dependency would slow down existing printing processes in undesirable ways.\nHere, we will benchmark cli::cli_warn() against base R’s base::warning() to see if the former has any speed issues. The results of this exercise will provide us with insights on whether it’s worth taking on cli for its features as a trade-off for slower printing speed." }, { - "objectID": "posts/lint-rcpp/index.html#adding-c-linting-to-ci-workflows", - "href": "posts/lint-rcpp/index.html#adding-c-linting-to-ci-workflows", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Adding C++ linting to CI workflows", - "text": "Adding C++ linting to CI workflows\nBoth cpplint and cppcheck can be easily added to continuous integration workflows. In Epiverse-TRACE, we use Github Actions. The C++ lint workflow we have implemented looks like this:\non:\n push:\n paths: \"src/**\"\n pull_request:\n branches:\n - \"*\"\n\nname: Cpp-lint-check\n\njobs:\n cpplint:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n - uses: actions/setup-python@v2\n - run: pip install cpplint\n - run: cpplint --quiet --exclude=\"src/RcppExports.cpp\" src/*.cpp\n\n cppcheck:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n - run: sudo apt-get install cppcheck\n - run: cppcheck -i src/RcppExports.cpp --quiet --enable=warning --error-exitcode=1 .\nThe workflow is triggered when there are changes to files in src/, and on all pull requests." + "objectID": "posts/benchmarking_design_decisions/index.html#experiment", + "href": "posts/benchmarking_design_decisions/index.html#experiment", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", + "section": "Experiment", + "text": "Experiment\nWe will consider scenarios with different numbers of warnings thrown, to tease out the relationship between the speed of a function and the number of warnings it throws, with and without cli::cli_warn(). 
We will also check how the benchmark scales with the number of warnings in the function.\nBased on this description, we will need the following packages: cli for its warning function, and bench for measuring the run times, dplyr for manipulating the results, and ggplot2 for plotting.\nLet’s define a function that takes an argument n for the number of warnings to throw and pkg for the type of package to use.\n\nwarn_Ntimes <- function(n, pkg) {\n warning_msg <- \"x must be an integer\"\n switch(pkg,\n base = for (i in seq_len(n)) {\n warning(warning_msg)\n },\n cli = for (i in seq_len(n)) {\n cli_warn(warning_msg)\n }\n )\n}\n\nLet’s test our function to see if it works as expected.\n\nwarn_Ntimes(3, \"base\")\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nWarning in warn_Ntimes(3, \"base\"): x must be an integer\n\nwarn_Ntimes(3, \"cli\")\n\nWarning: x must be an integer\n\n\nWarning: x must be an integer\nx must be an integer\n\n\nNow, we’ll consider scenarios where a function throws 1, 5, 10, 15, 20, and 100 warnings using base R and cli.\n\n# Number of warnings to throw\nwarnings <- c(1, 5, 10, 15, 20, 100)\n\nLet’s run benchmarks over the different scenarios and store results in a data.frame.\n\nbnmark_res <- press(\n warnings = c(1, 5, 10, 15, 20, 100),\n mark(\n cli = warn_Ntimes(warnings, \"cli\"),\n base = warn_Ntimes(warnings, \"base\")\n )\n )" }, { - "objectID": "posts/lint-rcpp/index.html#formatting-c-code", - "href": "posts/lint-rcpp/index.html#formatting-c-code", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Formatting C++ code", - "text": "Formatting C++ code\nC++ code can be automatically formatted to avoid linter errors. An especially useful tool is clang-format. Our code is styled to follow the Google C++ style guide using:\n# replace .cpp with .h to format headers\nclang-format -i -style=google src/*.cpp\nHowever, this also formats the autogenerated RcppExports.cpp file. It can be extra work to repeatedly undo this change and keep the original formatting, but clang-format does not provide an easy inline way to ignore this file.\nInstead, clang-format can be passed all files except RcppExports.cpp to style using some simple shell commands. In smaller projects, it might be worth\nfind src -name \"*.cpp\" ! -name \"RcppExports.cpp\" -exec clang-format -style=google -i {} \\;" + "objectID": "posts/benchmarking_design_decisions/index.html#results", + "href": "posts/benchmarking_design_decisions/index.html#results", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", + "section": "Results", + "text": "Results\nIt’s time to explore the results of the data generated. 
Let’s first make a boxplot of the run times for the different scenarios.\nDrum roll please…\n\nautoplot(\n bnmark_res,\n type = \"boxplot\"\n ) +\n labs(\n x = \"Package\",\n caption = \"Functions throwing various numbers of warnings as indicated in the facet label\"\n )\n\nLoading required namespace: tidyr\n\n\n\n\n\n\n\n\n\nNow, let’s see how the benchmarks scale with the number of warnings thrown.\n\nbnmark_res$pkg <- attr(bnmark_res$expression, \"description\")\n\nggplot(bnmark_res) +\n geom_line(\n aes(\n x = warnings,\n y = median,\n color = pkg\n ),\n linewidth = 2\n ) +\n labs(\n x = \"Number of warnings\",\n y = \"Median execution time\",\n color = \"Package\"\n )\n\n\n\n\n\n\n\n\nAs we can see, cli is consistently slower than base R for the same number of warnings thrown. The median times also follow a similar non-linear trajectory. Benchmarking experiments with other cli functions have revealed similar non-linear relationships between speed and the number of times those functions are invoked 2. However, those exercises were not compared with equivalent functions in base R. The relative comparison here is useful for decision-making.\nThe developers of cli have also conducted benchmarks of the ansi_*() family of functions in cli in comparison to base R and the fansi package. They find that cli is consistently slower than base R, which corroborates the results of our experiment here. These benchmarks are available in the cli documentation 3.\nSo, should we be worried about the speed of cli? Well, it depends on the context. The “R Packages” book by Hadley Wickham and Jenny Bryan suggests taking a holistic, balanced, and quantitative approach to such a decision (Wickham and Bryan 2023). We’ll leave the reader to make their own decision based on their use case.\nMost developers might argue that this is optimisation overkill4. However, it is important to consider speed differences in context. In the case of simple printing, the speed difference is negligible in absolute terms, even if it can feel disruptive and somewhat painful. However, in the grand scheme of things, this might be nothing compared with much slower processes that need more attention. In those cases, the developer might want to consider other optimisation strategies such as profiling 5. The essence of this experiment is to demonstrate the utility of benchmarking in making quick design decisions." }, { - "objectID": "posts/lint-rcpp/index.html#turning-off-linting-and-formatting", - "href": "posts/lint-rcpp/index.html#turning-off-linting-and-formatting", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Turning off linting and formatting", - "text": "Turning off linting and formatting\nThere are cases in which we might want to turn linting and formatting off. This might be when the linter does not agree with valid C++ code required in the project, or when the linters and stylers do not agree with each other. These tools are developed separately by large software projects with their own internal requirements, and solutions to issues encountered in their work: clang-format by LLVM (although specifying -style=google), and cpplint from Google’s work.\n\nLinter-enforced paradigms\nSometimes, the linter or styler developer enforces both a style and the use of certain programming paradigms. 
An example from cpplint is when it warns against passing function arguments by reference, and prefers for these to be passed as pointers, or as constant references (const int &value).\nint some_function(int &value) { \n /* operations modifying value */\n return value;\n}\nPassing the argument as a const reference would not serve the needs of this function, and passing by value is a valid strategy when we don’t want to get into the details of using pointers. (Note that this is typically an issue when large objects such as custom classes or structs are passed to a function multiple times.)\nSimilarly, cpplint will throw a warning about accessing variables using std::move, which is something we encounter in the Newton solver in {finalsize}. While not technically wrong for such a simple use case, the linter is correct to cautiously throw a warning nonetheless.\n\n\nLinter-styler disagreement\nOne example of linter-styler disagreement is the use of BOOST_FOREACH from the Boost libraries as an alternative to for loops. clang-format will insist on adding two spaces before the opening bracket: BOOST_FOREACH (). cpplint will insist on removing one space.\ncpplint and clang-format also disagree on the order of header inclusions, especially when both local and system headers are included.\n\n\nDisabling checks on code chunks\nEither of these cases could require disabling linting or formatting on some part of the code. It is possible to turn off linting using cpplint at particular lines using the comment // NOLINT. Multiple lines can be protected from linting as well.\n// NOLINTBEGIN\n<some C++ code here>\n// NOLINTEND\nAlternatively, clang-format can be instructed to ignore chunks of code using comment messages too.\n// clang-format off\n<some C++ code here>\n// clang-format on" + "objectID": "posts/benchmarking_design_decisions/index.html#conclusion", + "href": "posts/benchmarking_design_decisions/index.html#conclusion", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", + "section": "Conclusion", + "text": "Conclusion\nIn designing R package infrastructure with dependencies, it might sometimes be necessary to check if they don’t slow down existing processes. Here, we have demonstrated how benchmarking is one way to achieve this for a process involving condition signalling. We show how a simple decision to use cli::cli_warn() to handle warnings could come at the cost of a tiny loss in speed, which is worth considering in its context.\nThe demonstration here can be extended to other dependency adoption decisions for input checking, loops, object manipulations, and so forth. We recommend benchmarking as a way to help developers make quick design decisions. However, we also recommend that developers consider the context of the optimisation in interpreting the results.\nI would like to thank Pratik Gupte, Joshua Lambert, and Hugo Gruson for their invaluable reviews and suggestions that helped improve this post." }, { - "objectID": "posts/lint-rcpp/index.html#linter-options-for-future-packages", - "href": "posts/lint-rcpp/index.html#linter-options-for-future-packages", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Linter options for future packages", - "text": "Linter options for future packages\n{finalsize} is a relatively simple {Rcpp} package, with no C/C++ headers, and no C++ tests. 
However, future Epiverse-TRACE packages could be more similar to {fluEvidenceSynthesis}, and will have header files, and could also have C++ unit tests via the catch framework.\ncpplint will demand that all local headers be prefixed with their directory (src/), but this would cause the code to break as {Rcpp} looks for a subdirectory called src/src/. This can be turned off by passing the filter option --filter=\"-build/include_subdir\" to cpplint. Alternatively, we could place headers in a subdirectory such as inst/include.\nBoth cpplint and cppcheck can be instructed to ignore C++ test files using the catch testing framework provided by {testthat}. This prevents errors due to the specialised syntax provided by {testthat} in testthat.h, such as context.\n# for cpplint, add an extra exclude statement\ncpplint <...> --exclude=\"src/test*.cpp\" src/*.cpp\n\n# for cppcheck, suppress checks on test files\ncppcheck <...> --suppress=*:src/test_*.cpp src" + "objectID": "posts/benchmarking_design_decisions/index.html#other-r-packages-for-benchmarking", + "href": "posts/benchmarking_design_decisions/index.html#other-r-packages-for-benchmarking", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", + "section": "Other R packages for benchmarking", + "text": "Other R packages for benchmarking\n\nmicrobenchmark: an R package for comparing the execution time of R expressions.\nrbenchmark: an R package for benchmarking R code.\ntictoc: an R package to time R functions.\ntouchstone: an R package for benchmarking of pull requests with statistical confidence." }, { - "objectID": "posts/lint-rcpp/index.html#conclusion", - "href": "posts/lint-rcpp/index.html#conclusion", - "title": "Improving the C++ Code Quality of an Rcpp Package", - "section": "Conclusion", - "text": "Conclusion\nIt is actually somewhat surprising that there does not seem to be a canonical linter for C++ code in {Rcpp} packages. The methods laid out here are an initial implementation developed for use with the {finalsize} package, and the considerations here are a starting point. We shall be continuously evaluating how we ensure the quality of our C++ code as we encounter more use cases while developing future Epiverse-TRACE packages."
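As a rough, hand-written cross-check (not part of the original post), the same cli-versus-base comparison could be repeated with {microbenchmark}, one of the alternatives listed under "Other R packages for benchmarking" above; the sketch reuses the warn_Ntimes() helper defined in the Experiment section:

library(microbenchmark)

# compare throwing 10 warnings via cli and via base R, 100 runs of each expression
mb_res <- microbenchmark(
  cli  = warn_Ntimes(10, "cli"),
  base = warn_Ntimes(10, "base"),
  times = 100L
)

# min/median/max run times per expression, in automatically chosen units
summary(mb_res)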
+ "objectID": "posts/benchmarking_design_decisions/index.html#footnotes", + "href": "posts/benchmarking_design_decisions/index.html#footnotes", + "title": "Using benchmarking to guide the adoption of dependencies in R packages", + "section": "Footnotes", + "text": "Footnotes\n\n\nBenchmark (Wikipedia): https://en.wikipedia.org/wiki/Benchmark_(computing)↩︎\nFormating errors can be slow/variable: https://github.com/r-lib/cli/issues/617↩︎\ncli benchmarks: https://cli.r-lib.org/articles/ansi-benchmark.html↩︎\nDonald Knuth’s quoted as having said, “The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” (See https://en.wikiquote.org/wiki/Donald_Knuth)↩︎\nProfiling (Wikipedia): https://csgillespie.github.io/efficientR/performance.html↩︎" }, { - "objectID": "posts/extend-dataframes/index.html", - "href": "posts/extend-dataframes/index.html", - "title": "Extending Data Frames", + "objectID": "posts/copilot-workspace/index.html", + "href": "posts/copilot-workspace/index.html", + "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", "section": "", - "text": "R is a commonly used language for data science and statistical computing. Foundational to this is having data structures that allow manipulation of data with minimal effort and cognitive load. One of the most commonly required data structures is tabular data. This can be represented in R in a few ways, for example a matrix or a data frame. The data frame (class data.frame) is a flexible tabular data structure, as it can hold different data types (e.g. numbers, character strings, etc.) across different columns. This is in contrast to matrices – which are arrays with dimensions – and thus can only hold a single data type.\n\n# data frame can hold heterogeneous data types across different columns\ndata.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(\"a\", \"b\", \"c\"))\n\n a b c\n1 1 4 a\n2 2 5 b\n3 3 6 c\n\n# each column must be of the same type\ndf <- data.frame(a = c(1, 2, 3), b = c(\"4\", 5, 6))\n# be careful of the silent type conversion\ndf$a\n\n[1] 1 2 3\n\ndf$b\n\n[1] \"4\" \"5\" \"6\"\n\nmat <- matrix(1:9, nrow = 3, ncol = 3)\nmat\n\n [,1] [,2] [,3]\n[1,] 1 4 7\n[2,] 2 5 8\n[3,] 3 6 9\n\nmat[1, 1] <- \"1\"\n# be careful of the silent type conversion\nmat\n\n [,1] [,2] [,3]\n[1,] \"1\" \"4\" \"7\" \n[2,] \"2\" \"5\" \"8\" \n[3,] \"3\" \"6\" \"9\" \n\n\nData frames can even be nested, cells can be data frames or lists.\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(c = c(\"y\", \"z\"))\ndf\n\n a b\n1 y, z x\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(data.frame(c = \"y\", d = \"z\"))\ndf\n\n a b\n1 y, z x\n\n\nIt is therefore clear why data frames are so prevalent. However, they are not without limitations. They have a relatively basic printing method which can fload the R console when the number of columns or rows is large. They have useful methods (e.g., summary() and str()), but these might not be appropriate for certain types of tabular data. In these cases it is useful to utilise R’s inheritance mechanisms (specifically S3 inheritance) to write extensions for R’s data.frame class. 
In this case the data frame is the superclass and the new subclass extends it and inherits its methods (see the Advanced R book for more details on S3 inheritance).\nOne of the most common extension of the data frame is the tibble from the {tibble} R package. Outlined in {tibble}’s vignette, tibbles offer improvements in printing, subsetting and recycling rules. Another commonly used data frame extension is the data.table class from the {data.table} R package. In addition to the improved printing, this class is designed to improve the performance (i.e. speed and efficiency of operations and storage) of working with tabular data in R and provide a terse syntax for manipulation.\nIn the process of developing R software (most likely an R package), a new tabular data class that builds atop data frames can become beneficial. This blog post has two main sections:\n\na brief overview of the steps required to setup a class that extends data frames\nguide to the technical aspects of class invariants (required data members of a class) and design and implementation decisions, and tidyverse compatibility\n\n\n\nIt is useful to write a class constructor function that can be called to create an object of your new class. The functions defined below are a redacted version (for readability) of functions available in the {ExtendDataFrames} R package, which contains example functions and files discussed in this post. When assigning the class name ensure that it is a vector containing \"data.frame\" as the last element to correctly inherit properties and methods from the data.frame class.\nbirthdays <- function(x) {\n # the vector of classes is required for it to inherit from `data.frame`\n structure(x, class = c(\"birthdays\", \"data.frame\"))\n}\nThat’s all that’s needed to create a subclass of a data frame. However, although we’ve created the class we haven’t given it any functionality and thus it will be identical to a data frame due to inheritance.\nWe can now write as many methods as we want. Here we will show two methods, one of which does not require writing a generic (print.birthdays) and the second that does (birthdays_per_month). The print() generic function is provided by R, which is why we do not need to add one ourselves. See Adv R and this Epiverse blog post to find out more about S3 generics.\nprint.birthdays <- function(x, ...) {\n cat(\n sprintf(\n \"A `birthdays` object with %s rows and %s cols\",\n dim(x)[1], dim(x)[2]\n )\n )\n invisible(x)\n}\n\nbirthdays_per_month <- function(x, ...) {\n UseMethod(\"birthdays_per_month\")\n}\n\nbirthdays_per_month.birthdays <- function(x, ...) {\n out <- table(lubridate::month(x$birthday))\n months <- c(\n \"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\",\n \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"\n )\n names(out) <- months[as.numeric(names(out))]\n return(out)\n}\n\n\n\n\n\n\nTip\n\n\n\nUseful resources for the “Writing custom data class” section: extending tibbles and their functionality\n\n\n\n\n\nWe will now move on to the second section of the post, in which we discuss the design choices when creating and using S3 classes in R. Class invariants are members of your class that define it. In other words, without these elements your class does not fulfil its basic definition. It is therefore sensible to make sure that your class contains these elements at all times (or at least after operations have been applied to your class). In cases when the class object contains all the invariants normal service can be continued. 
However, in the case that an invariant is missing or modified to a non-conformist type (e.g. a date converted to a numeric) a decision has to be made. Either the code can error, hopefully giving the user an informative message as to why their modification broke the object; alternatively, the subclass can be revoked and the superclass can be returned. In almost all cases the superclass (i.e. the base class being inherited from) is more general and won’t have the same class invariant restrictions.\nFor our example class, <birthdays>, the invariants are a column called name which must contain characters, and a column called birthday which must contain dates. The order of the rows and columns is not considered an invariant property, and having extra columns with other names and data types is also allowed. The number of rows is also not an invariant as we can have as many birthdays as we like in the data object.\nHere we present both cases as well as considerations and technical details of both options. We’ll demonstrate both of these cases with the subset function in R (subsetting uses a single square bracket for tabular data, [). First the fail-on-subsetting. Before we write the subsetting function it is useful to have a function that checks that an object of our class is valid, a so-called validator function.\n\nvalidate_birthdays <- function(x) {\n stopifnot(\n \"input must contain 'name' and 'birthday' columns\" =\n all(c(\"name\", \"birthday\") %in% colnames(x)),\n \"names must be a character\" =\n is.character(x$name),\n \"birthday must be a date\" =\n lubridate::is.Date(x$birthday)\n )\n invisible(x)\n}\n\nThis will return an error if the class is not valid (defined in terms of the class’ invariants).\nNow we can show how to error if one of the invariants are removed during subsetting. See ?NextMethod() for information on method dispatch.\n\n`[.birthdays` <- function(x) {\n validate_birthdays(NextMethod())\n}\n\nbirthdays[, -1]\n# Error in validate_birthdays(NextMethod()) :\n# input must contain 'name' and 'birthday' columns\n\nThe second design option is the reconstruct-on-subsetting. This checks whether the class is valid, and if not downgrade the class to the superclass, in our case a data frame. This is done by not only validating the object during subsetting but to check whether it is a valid class object, and then either ensuring all of the attributes of the subclass – in our case <birthdays> – are maintained, or attributes are stripped and only the attributes of the base superclass – in our case data.frame – are kept.\n\n\n\n\n\n\nNote\n\n\n\nImportant note: this section of the post relies heavily on https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat.\n\n\nThe four functions that are required to be added to ensure our class is correctly handled when invaliding it are:\n\nbirthdays_reconstruct()\nbirthdays_can_reconstruct()\ndf_reconstruct()\ndplyr_reconstruct.birthdays()\n\nWe’ll tackle the first three first, and then move onto to the last one as this requires some extra steps.\nbirthdays_reconstruct() is a function that contains an if-else statement to determine whether the returned object is a <birthdays> or data.frame object.\n\nbirthdays_reconstruct <- function(x, to) {\n if (birthdays_can_reconstruct(x)) {\n df_reconstruct(x, to)\n } else {\n x <- as.data.frame(x)\n message(\"Removing crucial column in `<birthdays>` returning `<data.frame>`\")\n x\n }\n}\n\nThe if-else evaluation is controlled by birthdays_can_reconstruct(). 
This function determines whether after subsetting the object is a valid <birthdays> class. It checks whether the validator fails, in which case it returns FALSE, otherwise the function will return TRUE.\n\nbirthdays_can_reconstruct <- function(x) {\n # check whether input is valid\n valid <- tryCatch(\n { validate_birthdays(x) },\n error = function(cnd) FALSE\n )\n\n # return boolean\n !isFALSE(valid)\n}\n\nThe next function required is df_reconstruct(). This is called when the object is judged to be a valid <birthdays> object and simply copies the attributes over from the <birthdays> class to the object being subset.\n\ndf_reconstruct <- function(x, to) {\n attrs <- attributes(to)\n attrs$names <- names(x)\n attrs$row.names <- .row_names_info(x, type = 0L)\n attributes(x) <- attrs\n x\n}\n\nThe three functions defined for reconstruction can be added to a package with the subsetting function in order to subset <birthdays> objects and returning either <birthdays> objects if still valid, or data frames when invalidated. This design has the benefit that when conducting data exploration a user is not faced with an error, but can continue with a data frame, while being informed by the message printed to console in birthdays_reconstruct().\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n\n\n\n\nlibrary(dplyr)\n\nIn order to be able to operate on our <birthdays> class using functions from the package {dplyr}, as would be common for data frames, we need to make our function compatible. This is where the function dplyr_reconstruct.birthdays() comes in. dplyr_reconstruct() is a generic function exported by {dplyr}. It is called in {dplyr} verbs to make sure that the objects are restored to the input class when not invalidated.\n\ndplyr_reconstruct.birthdays <- function(data, template) {\n birthdays_reconstruct(data, template)\n}\n\nInformation about the generic can be found through the {dplyr} help documentation.\n\n?dplyr::dplyr_extending\n?dplyr::dplyr_reconstruct\n\nAs explained in the help documentation, {dplyr} also uses two base R functions to perform data manipulation. names<- (i.e the names setter function) and [ the one-dimensional subsetting function. We therefore define these methods for our custom class in order for dplyr_reconstruct() to work as intended.\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n`names<-.birthdays` <- function(x, value) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\nThis wraps up the need for adding function to perform data manipulation using the reconstruction design outlined above.\nHowever, there is some final housekeeping to do. In cases when {dplyr} is not a package dependency (either imported or suggested), then the S3 generic dplyr_reconstruct() is required to be loaded. In R versions before 3.6.0 – this also works for R versions later than 3.6.0 – the generic function needs to be registered. This is done by writing an .onLoad() function, typically in a file called zzz.R. This is included in the {ExtendDataFrames} package for illustrative purposes.\n\n\n\nzzz.R\n\n.onLoad <- function(libname, pkgname) {\n s3_register(\"dplyr::dplyr_reconstruct\", \"birthdays\")\n invisible()\n}\n\nThe s3_register() function used in .onLoad() also needs to be added to the package and this function is kindly supplied by both {vctrs} and {rlang} unlicensed and thus can be copied into another package. 
See the R packages book for information about .onLoad() and attaching and loading in general.\nSince R version 3.6.0 this S3 generic registration happens automatically with S3Method() in the package namespace using the {roxygen2} documentation #' @exportS3Method dplyr::dplyr_reconstruct.\nThere is one last option which prevents the hard dependency on a relatively recent R version. Since {roxygen2} version 6.1.0, there is the @rawNamespace tag which allows insertion of text into the NAMESPACE file. Using this tag the following code will check the local R version and register the S3 method if equal to or above 3.6.0.\n\n#' @rawNamespace if (getRversion() >= \"3.6.0\") {\n#' S3method(pkg::fun, class)\n#' }\n\nEach of the three options for registering S3 methods has different benefits and downsides, so the choice depends on the specific use-case. Over time it may be best to use the most up-to-date methods as packages are usually only maintained for a handful of recent R releases1.\nThe topics discussed in this post have been implemented in the {epiparameter} R package within Epiverse-TRACE.\nCompatibility with {vctrs} is also possible using the same mechanism (functions) described in this post, and if interested see https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat for details.\nFor other use-cases and discussions of the designs and implementations discussed in this post see:\n\n{dials} R package\n{rsample} R package\n{googledrive} R package\nPull request on {tibble} R package\n\nThis blog post is a compendium of information from sources that are linked and cited throughout. Please refer to those sites for more information and as the primary source for citation in further work." + "text": "GitHub recently previewed ‘Copilot Workspace’, which aims to use generative AI to assist software developers. Rather than just giving in-line suggestions, as GitHub copilot does, workspace allows users to map out and execute entire projects. We got early preview access to the tool, so decided to see how well it performed for our needs in Epiverse-TRACE.\nIn the Epiverse-TRACE initiative we aim to develop an ecosystem of epidemic analytics tools as Digital Public Goods in the form of R packages. We aim to follow best-practices given that our tools will be used to inform public health decisions. For efficiency, we use a range of development tools, including integrated development environments (IDE), such as RStudio and Visual Studio Code, other R packages to help check and document our code, for example {lintr} and {roxygen2}, and version control and code sharing tools, git and GitHub.\nGiven the rapid development and increasing availability to artificial intelligence (AI) models, specifically large language models, offer the potential to be another development tool to add to the arsenal of researchers and software engineers to enhance productivity and improve the reliability of the code we write and others depend upon (Hoek et al. 2024). Large language models like ChatGPT from OpenAI, Gemini from Google, Claude from Anthropic, as well as many others and new ones frequently appearing, have presented opportunities to quickly generate material – text, code and images – from text prompts.\nA small group of Epiverse-TRACE professors and research software engineers decided to explore the suitability of one such, GitHub Copilot integrated with GitHub Workspaces, for some typical use cases encountered in our day-to-day work. 
We do note that many other such tools are available, and we chose Copilot in Workspaces as our code is hosted on GitHub, making for easy evaluation of the specific contribution of the AI tool. However, some of our broad conclusions are based on using other tools for similar tasks.\nThe aim of this blog post is to summarise our experiences with Copilot and Workspaces, and share our thoughts more generally about how the current feature set of AI-powered development tools could help with our daily development tasks, and with research software development at large. We evaluated Copilot workspace across three tasks of varying difficulty and requirements:" }, { - "objectID": "posts/extend-dataframes/index.html#extending-data-frames-in-r", - "href": "posts/extend-dataframes/index.html#extending-data-frames-in-r", - "title": "Extending Data Frames", - "section": "", - "text": "R is a commonly used language for data science and statistical computing. Foundational to this is having data structures that allow manipulation of data with minimal effort and cognitive load. One of the most commonly required data structures is tabular data. This can be represented in R in a few ways, for example a matrix or a data frame. The data frame (class data.frame) is a flexible tabular data structure, as it can hold different data types (e.g. numbers, character strings, etc.) across different columns. This is in contrast to matrices – which are arrays with dimensions – and thus can only hold a single data type.\n\n# data frame can hold heterogeneous data types across different columns\ndata.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(\"a\", \"b\", \"c\"))\n\n a b c\n1 1 4 a\n2 2 5 b\n3 3 6 c\n\n# each column must be of the same type\ndf <- data.frame(a = c(1, 2, 3), b = c(\"4\", 5, 6))\n# be careful of the silent type conversion\ndf$a\n\n[1] 1 2 3\n\ndf$b\n\n[1] \"4\" \"5\" \"6\"\n\nmat <- matrix(1:9, nrow = 3, ncol = 3)\nmat\n\n [,1] [,2] [,3]\n[1,] 1 4 7\n[2,] 2 5 8\n[3,] 3 6 9\n\nmat[1, 1] <- \"1\"\n# be careful of the silent type conversion\nmat\n\n [,1] [,2] [,3]\n[1,] \"1\" \"4\" \"7\" \n[2,] \"2\" \"5\" \"8\" \n[3,] \"3\" \"6\" \"9\" \n\n\nData frames can even be nested, cells can be data frames or lists.\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(c = c(\"y\", \"z\"))\ndf\n\n a b\n1 y, z x\n\ndf <- data.frame(a = \"w\", b = \"x\")\ndf[1, 1][[1]] <- list(data.frame(c = \"y\", d = \"z\"))\ndf\n\n a b\n1 y, z x\n\n\nIt is therefore clear why data frames are so prevalent. However, they are not without limitations. They have a relatively basic printing method which can fload the R console when the number of columns or rows is large. They have useful methods (e.g., summary() and str()), but these might not be appropriate for certain types of tabular data. In these cases it is useful to utilise R’s inheritance mechanisms (specifically S3 inheritance) to write extensions for R’s data.frame class. In this case the data frame is the superclass and the new subclass extends it and inherits its methods (see the Advanced R book for more details on S3 inheritance).\nOne of the most common extension of the data frame is the tibble from the {tibble} R package. Outlined in {tibble}’s vignette, tibbles offer improvements in printing, subsetting and recycling rules. Another commonly used data frame extension is the data.table class from the {data.table} R package. In addition to the improved printing, this class is designed to improve the performance (i.e. 
speed and efficiency of operations and storage) of working with tabular data in R and provide a terse syntax for manipulation.\nIn the process of developing R software (most likely an R package), a new tabular data class that builds atop data frames can become beneficial. This blog post has two main sections:\n\na brief overview of the steps required to setup a class that extends data frames\nguide to the technical aspects of class invariants (required data members of a class) and design and implementation decisions, and tidyverse compatibility\n\n\n\nIt is useful to write a class constructor function that can be called to create an object of your new class. The functions defined below are a redacted version (for readability) of functions available in the {ExtendDataFrames} R package, which contains example functions and files discussed in this post. When assigning the class name ensure that it is a vector containing \"data.frame\" as the last element to correctly inherit properties and methods from the data.frame class.\nbirthdays <- function(x) {\n # the vector of classes is required for it to inherit from `data.frame`\n structure(x, class = c(\"birthdays\", \"data.frame\"))\n}\nThat’s all that’s needed to create a subclass of a data frame. However, although we’ve created the class we haven’t given it any functionality and thus it will be identical to a data frame due to inheritance.\nWe can now write as many methods as we want. Here we will show two methods, one of which does not require writing a generic (print.birthdays) and the second that does (birthdays_per_month). The print() generic function is provided by R, which is why we do not need to add one ourselves. See Adv R and this Epiverse blog post to find out more about S3 generics.\nprint.birthdays <- function(x, ...) {\n cat(\n sprintf(\n \"A `birthdays` object with %s rows and %s cols\",\n dim(x)[1], dim(x)[2]\n )\n )\n invisible(x)\n}\n\nbirthdays_per_month <- function(x, ...) {\n UseMethod(\"birthdays_per_month\")\n}\n\nbirthdays_per_month.birthdays <- function(x, ...) {\n out <- table(lubridate::month(x$birthday))\n months <- c(\n \"Jan\", \"Feb\", \"Mar\", \"Apr\", \"May\", \"Jun\",\n \"Jul\", \"Aug\", \"Sep\", \"Oct\", \"Nov\", \"Dec\"\n )\n names(out) <- months[as.numeric(names(out))]\n return(out)\n}\n\n\n\n\n\n\nTip\n\n\n\nUseful resources for the “Writing custom data class” section: extending tibbles and their functionality\n\n\n\n\n\nWe will now move on to the second section of the post, in which we discuss the design choices when creating and using S3 classes in R. Class invariants are members of your class that define it. In other words, without these elements your class does not fulfil its basic definition. It is therefore sensible to make sure that your class contains these elements at all times (or at least after operations have been applied to your class). In cases when the class object contains all the invariants normal service can be continued. However, in the case that an invariant is missing or modified to a non-conformist type (e.g. a date converted to a numeric) a decision has to be made. Either the code can error, hopefully giving the user an informative message as to why their modification broke the object; alternatively, the subclass can be revoked and the superclass can be returned. In almost all cases the superclass (i.e. 
the base class being inherited from) is more general and won’t have the same class invariant restrictions.\nFor our example class, <birthdays>, the invariants are a column called name which must contain characters, and a column called birthday which must contain dates. The order of the rows and columns is not considered an invariant property, and having extra columns with other names and data types is also allowed. The number of rows is also not an invariant as we can have as many birthdays as we like in the data object.\nHere we present both cases as well as considerations and technical details of both options. We’ll demonstrate both of these cases with the subset function in R (subsetting uses a single square bracket for tabular data, [). First the fail-on-subsetting. Before we write the subsetting function it is useful to have a function that checks that an object of our class is valid, a so-called validator function.\n\nvalidate_birthdays <- function(x) {\n stopifnot(\n \"input must contain 'name' and 'birthday' columns\" =\n all(c(\"name\", \"birthday\") %in% colnames(x)),\n \"names must be a character\" =\n is.character(x$name),\n \"birthday must be a date\" =\n lubridate::is.Date(x$birthday)\n )\n invisible(x)\n}\n\nThis will return an error if the class is not valid (defined in terms of the class’ invariants).\nNow we can show how to error if one of the invariants are removed during subsetting. See ?NextMethod() for information on method dispatch.\n\n`[.birthdays` <- function(x) {\n validate_birthdays(NextMethod())\n}\n\nbirthdays[, -1]\n# Error in validate_birthdays(NextMethod()) :\n# input must contain 'name' and 'birthday' columns\n\nThe second design option is the reconstruct-on-subsetting. This checks whether the class is valid, and if not downgrade the class to the superclass, in our case a data frame. This is done by not only validating the object during subsetting but to check whether it is a valid class object, and then either ensuring all of the attributes of the subclass – in our case <birthdays> – are maintained, or attributes are stripped and only the attributes of the base superclass – in our case data.frame – are kept.\n\n\n\n\n\n\nNote\n\n\n\nImportant note: this section of the post relies heavily on https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat.\n\n\nThe four functions that are required to be added to ensure our class is correctly handled when invaliding it are:\n\nbirthdays_reconstruct()\nbirthdays_can_reconstruct()\ndf_reconstruct()\ndplyr_reconstruct.birthdays()\n\nWe’ll tackle the first three first, and then move onto to the last one as this requires some extra steps.\nbirthdays_reconstruct() is a function that contains an if-else statement to determine whether the returned object is a <birthdays> or data.frame object.\n\nbirthdays_reconstruct <- function(x, to) {\n if (birthdays_can_reconstruct(x)) {\n df_reconstruct(x, to)\n } else {\n x <- as.data.frame(x)\n message(\"Removing crucial column in `<birthdays>` returning `<data.frame>`\")\n x\n }\n}\n\nThe if-else evaluation is controlled by birthdays_can_reconstruct(). This function determines whether after subsetting the object is a valid <birthdays> class. 
It checks whether the validator fails, in which case it returns FALSE, otherwise the function will return TRUE.\n\nbirthdays_can_reconstruct <- function(x) {\n # check whether input is valid\n valid <- tryCatch(\n { validate_birthdays(x) },\n error = function(cnd) FALSE\n )\n\n # return boolean\n !isFALSE(valid)\n}\n\nThe next function required is df_reconstruct(). This is called when the object is judged to be a valid <birthdays> object and simply copies the attributes over from the <birthdays> class to the object being subset.\n\ndf_reconstruct <- function(x, to) {\n attrs <- attributes(to)\n attrs$names <- names(x)\n attrs$row.names <- .row_names_info(x, type = 0L)\n attributes(x) <- attrs\n x\n}\n\nThe three functions defined for reconstruction can be added to a package with the subsetting function in order to subset <birthdays> objects and returning either <birthdays> objects if still valid, or data frames when invalidated. This design has the benefit that when conducting data exploration a user is not faced with an error, but can continue with a data frame, while being informed by the message printed to console in birthdays_reconstruct().\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n\n\n\n\nlibrary(dplyr)\n\nIn order to be able to operate on our <birthdays> class using functions from the package {dplyr}, as would be common for data frames, we need to make our function compatible. This is where the function dplyr_reconstruct.birthdays() comes in. dplyr_reconstruct() is a generic function exported by {dplyr}. It is called in {dplyr} verbs to make sure that the objects are restored to the input class when not invalidated.\n\ndplyr_reconstruct.birthdays <- function(data, template) {\n birthdays_reconstruct(data, template)\n}\n\nInformation about the generic can be found through the {dplyr} help documentation.\n\n?dplyr::dplyr_extending\n?dplyr::dplyr_reconstruct\n\nAs explained in the help documentation, {dplyr} also uses two base R functions to perform data manipulation. names<- (i.e the names setter function) and [ the one-dimensional subsetting function. We therefore define these methods for our custom class in order for dplyr_reconstruct() to work as intended.\n\n`[.birthdays` <- function(x, ...) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\n`names<-.birthdays` <- function(x, value) {\n out <- NextMethod()\n birthdays_reconstruct(out, x)\n}\n\nThis wraps up the need for adding function to perform data manipulation using the reconstruction design outlined above.\nHowever, there is some final housekeeping to do. In cases when {dplyr} is not a package dependency (either imported or suggested), then the S3 generic dplyr_reconstruct() is required to be loaded. In R versions before 3.6.0 – this also works for R versions later than 3.6.0 – the generic function needs to be registered. This is done by writing an .onLoad() function, typically in a file called zzz.R. This is included in the {ExtendDataFrames} package for illustrative purposes.\n\n\n\nzzz.R\n\n.onLoad <- function(libname, pkgname) {\n s3_register(\"dplyr::dplyr_reconstruct\", \"birthdays\")\n invisible()\n}\n\nThe s3_register() function used in .onLoad() also needs to be added to the package and this function is kindly supplied by both {vctrs} and {rlang} unlicensed and thus can be copied into another package. 
See the R packages book for information about .onLoad() and attaching and loading in general.\nSince R version 3.6.0 this S3 generic registration happens automatically with S3Method() in the package namespace using the {roxygen2} documentation #' @exportS3Method dplyr::dplyr_reconstruct.\nThere is one last option which prevents the hard dependency on a relatively recent R version. Since {roxygen2} version 6.1.0, there is the @rawNamespace tag which allows insertion of text into the NAMESPACE file. Using this tag the following code will check the local R version and register the S3 method if equal to or above 3.6.0.\n\n#' @rawNamespace if (getRversion() >= \"3.6.0\") {\n#' S3method(pkg::fun, class)\n#' }\n\nEach of the three options for registering S3 methods has different benefits and downsides, so the choice depends on the specific use-case. Over time it may be best to use the most up-to-date methods as packages are usually only maintained for a handful of recent R releases1.\nThe topics discussed in this post have been implemented in the {epiparameter} R package within Epiverse-TRACE.\nCompatibility with {vctrs} is also possible using the same mechanism (functions) described in this post, and if interested see https://github.com/DavisVaughan/2020-06-01_dplyr-vctrs-compat for details.\nFor other use-cases and discussions of the designs and implementations discussed in this post see:\n\n{dials} R package\n{rsample} R package\n{googledrive} R package\nPull request on {tibble} R package\n\nThis blog post is a compendium of information from sources that are linked and cited throughout. Please refer to those sites for more information and as the primary source for citation in further work." + "objectID": "posts/copilot-workspace/index.html#experiment-1-export-an-r-function-by-updating-the-documentation", + "href": "posts/copilot-workspace/index.html#experiment-1-export-an-r-function-by-updating-the-documentation", + "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", + "section": "Experiment 1: Export an R function by updating the documentation", + "text": "Experiment 1: Export an R function by updating the documentation\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: basic\n\n\nThe first experiment was on the {EpiNow2} R package. An existing Issue #681 was chosen. It was a simple task to evaluate Copilot workspace’s ability to interpret requests that require small changes to the codebase of a package (i.e. can be achieved in a single commit by changing less than 5 lines of code). The issue stated that an internal function, epinow2_cmdstan_model() should be exported, with the issue description explaining this would help users specify custom options for running the models.\n\n\n\nScreenshot of EpiNow2 package issue 681 which was used as an example application of GitHub Copilot Workspace.\n\n\n\n\nOutcome\nThe specification provided by Copilot workspace diagnosed the issue requirements, mentioning that the NAMESPACE file, which stores information on functions offered by a package and functions imported from other packages, would have to be updated for the R package to export this currently unexported function. The proposed solution also addressed the desired changes, stating that the function was to be exported and the keyword internal would be removed. 
The automatically generated plan proposed to edit the file documentation and NAMESPACE, but we edited this to more specifically replace the @keyword internal {roxygen} tag with @export.\n\n\nOur thoughts\nThe implementation offered by GitHub copilot workspace was correct and replaced @keywords internal with @export in the documentation of the correct function, epinow2_cmdstan_model().\nHowever, it also changed the formatting of one of the function arguments (@param model) by moving some text to the next line and added a comma to the last but one item listed in the sentence. (It seems AI has taken a side on the Oxford comma debate).\nOverall, this test case of GitHub copilot workspace was a partial success. It correctly recognised the links between inline R function documentation, Roxygen tags, and the R-specific documentation files in man/. However, it (naturally) does not have a mechanistic understanding of package development and how changing tags in documentation is used to auto-update doc files using development tools. It correctly modified the part of the file needed to export an internal function. This showcases where this technology can be beneficial: for those new to a language or paradigm who have not yet picked up all of the development details that are often second nature to experienced developers. In this example, the time taken to make this change in GitHub copilot workspace was likely ~10x longer than our RSEs would take. However, if someone was used to developing in, say, Python, and was new to the conventions of exporting functions in R using {roxygen} documentation, this test case shows how copilot workspace can reliably assist. On the other hand, because these changes make no reference to standard development tools, they would not help new developers learn how or where to use them.\nThe changes made will still require the user to manually regenerate the documentation (using either devtools::document() or roxygen2::roxygenize()) to update the package NAMESPACE for the function to be exported. Our recommendation would be for AI tools to integrate with and call development tools used by developers (analogous to pre-commit hooks running tools before committing) to make sure this is taken care of all in a single generative step."
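As a hedged illustration of the mechanics discussed in Experiment 1 (a hypothetical function, not the actual EpiNow2 source), the change amounts to swapping a single roxygen tag and then regenerating the documentation:

# Before: the roxygen block marks the function as internal-only,
# so it is documented but not listed in NAMESPACE.

#' Build the compiled model object used internally by the package
#'
#' @return A compiled model object.
#' @keywords internal
build_model <- function() {
  # ... model construction unchanged ...
}

# After: `@keywords internal` is replaced with `@export`.

#' Build the compiled model object used by the package
#'
#' @return A compiled model object.
#' @export
build_model <- function() {
  # ... model construction unchanged ...
}

# Re-running devtools::document() (or roxygen2::roxygenize()) then adds
# export(build_model) to the NAMESPACE file.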
}, { "objectID": "posts/copilot-workspace/index.html#experiment-2-add-a-simple-epidemic-model-to-an-r-package-that-contains-more-complex-models", "href": "posts/copilot-workspace/index.html#experiment-2-add-a-simple-epidemic-model-to-an-r-package-that-contains-more-complex-models", "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", "section": "Experiment 2: Add a simple epidemic model to an R package that contains more complex models", "text": "Experiment 2: Add a simple epidemic model to an R package that contains more complex models\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: moderate to high\n\n\n{epidemics} is an Epiverse-TRACE R package that exports a range of compartmental epidemiological models of varying complexities and applications.\nIn the second test case, we created an issue in the {epidemics} R package requesting a basic SIR model, with the issue description specifying that the model should have no age structure and should use existing functionality in the package to specify a disease intervention. Additionally, the issue requested that the model be solved with the {deSolve} R package, which is a differential equation solver in R. A basic SIR model would have been new to the package, but would sit alongside several other compartmental models and be simpler than the models already implemented.\n\n\n\nScreenshot of epidemics package issue 238 which was used as an example application of GitHub Copilot Workspace.\n\n\n\nOutcome\nGitHub copilot workspace outlined the existing functionality in the package and proposed changes, including adding a new file called R/model_sir.R, which follows the model_*.R naming convention used in {epidemics}. As requested, copilot also proposed that the new SIR model would support the use of interventions that have already been implemented in the package.\n\n\nOur thoughts\nThe code generated was an R function that constructed a basic SIR model, and this was well documented using a {roxygen2} documentation block. However, several aspects of the code generated did not match the proposal generated by copilot workspace, and the code contained inadvisable coding practices in R (i.e. code smells). Firstly, the SIR model itself. The model implemented follows the standard set of differential equations that are solved using {deSolve}, as requested in our prompt. However, the model does not have any options to input interventions, which the proposed solution by GitHub copilot workspace suggested it would. The second downside is the use of require(deSolve) in the function body of the generated code. This is bad practice in R package development as functions should namespace any functions that are being called from other packages (e.g. packagename::func_name()); a minimal namespaced sketch is included after Experiment 3 below.\nThis task required copilot workspace to understand the contents and conventions of the {epidemics} R package and generate a simple compartmental model which would integrate with the infrastructure in the package for defining interventions in models. Although it correctly summarised the package contents, its attempt to answer the specific request from our prompt had many inaccuracies. 
The generated code is likely influenced by the many examples of SIR models written in R on the internet and on GitHub. These might be R scripts or gists, and thus coding practices used there are not always appropriate for writing code to be distributed in an R package, resulting in decisions like having require(deSolve) in the function.\nAI-generated code like that seen in this example showcases where users need to be diligent and not assume that the text descriptions from copilot match the code. In the end, this was a test example and was unlikely to be merged into {epidemics} even if the code was correct. However, the code as generated is of almost no use; it would take an RSE as much time to refactor this function as to write one from scratch following the design principles and demands of the {epidemics} package." }, { "objectID": "posts/copilot-workspace/index.html#experiment-3-conducting-a-code-review-on-an-r-package", "href": "posts/copilot-workspace/index.html#experiment-3-conducting-a-code-review-on-an-r-package", "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", "section": "Experiment 3: Conducting a code review on an R package", "text": "Experiment 3: Conducting a code review on an R package\n\n\n\n\n\n\nNote\n\n\n\nDifficulty level: moderate to high\n\n\nIn the last test case, we requested GitHub copilot workspace to conduct a code review of the entire code base of a package. At Epiverse-TRACE, it is standard practice to conduct a full package review. As any code base develops, including R packages, more commits are added and often the number of lines of code increases. Reviewing all of these changes before releasing a new version of the package is a time-consuming undertaking for package reviewers. If AI tools could review the code base, similar to how static code analysers work, and suggest improvements to, for example, code style, memory efficiency, documentation, unit testing, UX/UI, it would greatly improve efficiency before releasing the code.\n\nOutcome\nHere, Copilot provided no analysis of the code base. It regurgitated the changes described in the package review pull request, likely from a mix of the pull request description and the changelog (NEWS.md file), and did not analyse any of the code. Therefore, it provided no more use than if the reviewer had just read the NEWS."
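To make the namespacing point from Experiment 2 concrete, here is a minimal hand-written sketch (ours, not Copilot's output, and not the {epidemics} interface): the solver is called as deSolve::ode() rather than attaching {deSolve} with require() inside the function body.

model_sir_minimal <- function(beta = 0.3, gamma = 0.1,
                              S0 = 0.99, I0 = 0.01,
                              times = seq(0, 100, by = 1)) {
  # SIR rates of change, written in the form expected by deSolve::ode()
  sir_equations <- function(t, state, parms) {
    with(as.list(c(state, parms)), {
      dS <- -beta * S * I
      dI <- beta * S * I - gamma * I
      dR <- gamma * I
      list(c(dS, dI, dR))
    })
  }
  # namespaced call instead of require(deSolve) in the function body
  deSolve::ode(
    y = c(S = S0, I = I0, R = 1 - S0 - I0),
    times = times,
    func = sir_equations,
    parms = c(beta = beta, gamma = gamma)
  )
}

head(model_sir_minimal())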
}, { - "objectID": "posts/100days-workshop/index.html#scenario-1-novel-respiratory-disease-in-the-gambia", - "href": "posts/100days-workshop/index.html#scenario-1-novel-respiratory-disease-in-the-gambia", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "Scenario 1: Novel respiratory disease in The Gambia", - "text": "Scenario 1: Novel respiratory disease in The Gambia\n\n\n\nScenario 1 details\n\n\n\nAnalytic pipeline for scenario 1 (analysed by group 2)\n\nData cleaning\n\nlinelist to standardise date format\ncleanr from previous Hackathon\n\nDelay distributions\n\nfitdisrplus to fit parameteric distributions to scenario data\nepiparameter to extract delay distributions from respiratory pathogens\nEpiNow2 to fit reporting delays\nEpiEstim / coarseDataTools to estimate generation time/serial interval of disease\nepicontacts\nmixdiff to estimate delay distributions and correct erroneous dates at the same time (still under development)\n\nPopulation demographics\n\nWould like to have had access to an R package similar to ColOpenData\n\nRisk factors of infection\n\nUsed R4epis as a guide on how to create two-way tables and perform Chi-squared tests\n\nSeverity of disease\n\ndatadelay for CFR calculation\nImplementation of method developed by AC Ghani, 2005 to estimate CFR\n\nContact matching\n\ndiyar to match and link records\nfuzzyjoin to join contact and case data despite misspellings or missing cell contents\n\nEpi curve and maps\n\nUsed incidence and incidence2 for incidence calculation and visualisation\nraster to extract spatial information from library of shapefiles\n\nReproduction number\n\nAPEestim\nbayEStim\nearlyR\nepicontacts\nepidemia\nepiFilter\nEpiNow2\nEpiEstim\nR0\noutbreaker2\nUsed this comparison table to choose the most appropriate package.\n\nSuperspreading, by using these resources:\n\nfitdistrplus\nepicontacts\n\nEpidemic projections\n\nincidence R estimation using a loglinear model\nprojections using Rt estimates, SI distributions and overdispersion estimates\n\nTransmission chains and strain characterisation\n\nIQtree and nextclade to build a maximum likelihood tree and mannually inspect it\nAdvanced modelling through phylodynamic methods, using tools like BEAST\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData cleaning\nNot knowing what packages are available for this purpose\n\n\nDelay distributions\nDealing with right truncation\nAccounting for multiple infectors\n\n\nPopulation demographics\nLacking tools that provide information about population by age, gender, etc.\n\n\nRisk factors of infection\nDistinguishing between risk factors vs detecting differences in reporting frequencies among groups\n\n\nSeverity of disease\nKnowing the prevalence of disease (denominator)\nRight truncated data\nVarying severity of different strains\n\n\nContact matching\nMissing data\nMisspellings\n\n\nEpicurve and maps\nNA dates entries not included\nReporting levels varying over time\n\n\nOffspring distribution\nRight truncation\nTime varying reporting efforts\nAssumption of a single homogeneous epidemic\nImportation of cases\n\n\nForecasting\nUnderlying assumption of a given R distribution, e.g., single trend, homogeneous mixing, no saturation" + "objectID": "posts/copilot-workspace/index.html#conclusion", + "href": "posts/copilot-workspace/index.html#conclusion", + "title": "Some thoughts after a trial run of GitHub’s Copilot Workspace", + "section": "Conclusion", + "text": "Conclusion\nBased on our trials we 
observe that GitHub Copilot Workspace may not provide a significant benefit to research software engineers in domain-specific R package development, where problems are complex and bespoke solutions are common. The evidence above suggests there’s still a long way to go before AI-powered tools can produce, without supervision, code projects equivalent to those of a research software engineer. One observation is that generative AI is better trained for some purposes than others, biased by the amount of training data/information available. This makes it struggle in niche areas with a relatively small public code base to train models on (e.g. R functions for epidemic analysis are much rarer than generic JavaScript functions for web design). A second observation is that AI models don’t seem to understand how different parts of a codebase link together, so they provide solutions that are inconsistent with the requirements of the codebase (design principles, code style, etc.). These models may, however, be more useful for more common tasks, such as building dashboard templates.\nThe test cases we experimented with suggest that this tool will not replace, and at times will not even enhance, development when the task requires deeper integration than generating a standalone script. These tools are evolving rapidly and we are sure improvements will continue. In the short term, tools like GitHub Copilot Workspace need to be used responsibly, with an awareness of their limitations and not a blind trust that the code they write is always correct. We would caution against relying on generative AI tools alone to develop software with potentially large-scale impacts: scrutiny by trained developers is still key. Research software engineers will play a key part in this, as they will ensure that code and software written with and by AI tools meet required standards, are trustworthy, and are not potentially harmful. Where we see generative AI tools being more immediately useful in research software development is in tasks that require sifting through or summarising vast amounts of information; for example, querying software documentation or helping to get started with a new programming framework or language.\nAll authors thank Hugo Gruson and Chris Hartgerink for their valuable feedback and contributions to this post."
}, { - "objectID": "posts/100days-workshop/index.html#scenario-2-outbreak-of-an-unidentified-disease-in-rural-colombia", - "href": "posts/100days-workshop/index.html#scenario-2-outbreak-of-an-unidentified-disease-in-rural-colombia", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "Scenario 2: Outbreak of an unidentified disease in rural Colombia", - "text": "Scenario 2: Outbreak of an unidentified disease in rural Colombia\n\n\n\nScenario 2 details\n\n\n\nAnalytic pipeline for scenario 2 (analysed by group 3)\n\nData cleaning: manually, using R (no packages specified), to\n\nFix data entry issues in columns onset_date and gender\nCheck for missing data\nCheck sequence of dates: symptom onset → hospitalisation → death\n\nData anonymisation to share with partners\n\nfastlink for probabilistic matching between cases ↔︎ contacts, based on names, dates, and ages\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\n\nReproductive number calculation, by using two approaches:\n\nManually, by calculating the number of cases generated by each source case, data management through dplyr and data.table\nUsing serial interval of disease, through EpiEstim or EpiNow2\n\nSeverity of disease\n\nManual calculation of CFR and hospitalisation ratio\n\nProjection of hospital bed requirements\n\nEpiNow2 to calculate average hospitalisation duration and forecasting\n\nZoonotic transmission of disease\n\nManual inspection of cases’ occupation\nUse of IQtree and ggtree to plot phylogenetic data\n\nSuperspreading\n\nepicontacts\n\nCalculation of attack rate\n\nUnable to calculate, given the lack of seroprevalence data\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData anonymisation\nDealing with typos and missing data when generating random unique identifiers\n\n\nReproduction number\nRight truncation\nUnderestimation of cases due to reporting delays\n\n\nProjection of hospital bed requirements\nIncomplete data (missing discharge date)\nUndocumented functionality in R packages used\n\n\nZoonotic transmission\nPoor documentation\nUnavailability of packages in R\nDifferentiation between zoonotic transmission and risk factors- need for population data\n\n\nAttack rate\nNot enough information provided" + "objectID": "posts/cleanepi_v1.0.0/index.html", + "href": "posts/cleanepi_v1.0.0/index.html", + "title": "cleanepi v1.0.0", + "section": "", + "text": "We are very excited to announce the release of a new cleanepi version v1.0.0. Here is an automatically generated summary of the changes in this version." 
}, { - "objectID": "posts/100days-workshop/index.html#scenario-3-reston-ebolavirus-in-the-philippines", - "href": "posts/100days-workshop/index.html#scenario-3-reston-ebolavirus-in-the-philippines", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "Scenario 3: Reston Ebolavirus in the Philippines", - "text": "Scenario 3: Reston Ebolavirus in the Philippines\n\n\n\nScenario 3 details\n\n\n\nAnalytic pipeline for scenario 3 (analysed by group 4)\n\nData cleaning\n\nImporting data with rio, readxl, readr, or openxlsx\nRename variables with janitor\nInitial data checks with pointblank, assertr, compareDF, or skimr\nVertical data checks with matchmaker, lubridate, or parsedate\nHorizontal data checks with hmatch, assertr, or queryR\nDetect duplicates with janitor and tidyverse\nChecking for consistency with dplyr, or powerjoin\nTranslation with matchmaker\n\nDelay distributions\n\nfitdistrplus to fit parameteric distributions to epidemic data\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\nggplot2 to visualise data\n\nOutbreak description\n\nsitrep to generate reports\n\nVisualisation of geographic data\n\nsf for static maps\nleaflet for interactive maps\n\nGeneration of tables\n\ngtsummary for static tables\njanitor for interactive tables\n\nSeverity of disease\n\nEpiNow2 and survival to calculate CFR\n\nAttack rate\n\ngadm function to get population data\nepitabulate to describe data\nsf and ggplot2 to plot data\n\nForecasting\n\nEpiEstim\nEpiNow2\nbpmodels\n\nSpillover events\n\nBy cross-referencing contact data with occupations\n\nEffectiveness of contact tracing\n\nBy calculating the proportion of case follow-ups and comparing the delay of disease exposure to the follow-up delay\n\nTransmission trees\n\nepicontacts\nggplot2\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nDetection of outliers\nNo known tools to use\n\n\nSeverity of disease\nCensoring\n\n\nSpillover events\nMissing data" + "objectID": "posts/cleanepi_v1.0.0/index.html#first-major-release", + "href": "posts/cleanepi_v1.0.0/index.html#first-major-release", + "title": "cleanepi v1.0.0", + "section": "first major release", + "text": "first major release\nInitial CRAN submission of {cleanepi}, an R package designed for cleaning, curating, and standardizing epidemiological data.\nThis release includes the following key functionalities:\n\nRemoving irregularities: It removes duplicated and empty rows and columns, as well as columns with constant values.\nHandling missing values: It replaces missing values with the standard NA format, ensuring consistency and ease of analysis.\nEnsuring data integrity: It ensures the uniqueness of uniquely identified columns, thus maintaining data integrity and preventing duplicates.\nDate conversion: It offers functionality to convert character columns to Date format under specific conditions, enhancing data uniformity and facilitating temporal analysis. It also converts numbers written out as words into numeric values.\nStandardizing entries: It can standardize column entries into specified formats, promoting consistency across the dataset.\nTime span calculation: It calculates the time span between two elements of type Date, providing valuable demographic insights for epidemiological analysis.\nReporting cleaning outcome: It displays a report of the changes made to the input data during the cleaning process, allowing the user to review and amend the cleaning operations."
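To illustrate what a few of these operations amount to in practice, here is a small base R sketch on a made-up line list. It is not the {cleanepi} interface itself (see the package documentation for the actual functions); it only shows the kind of cleaning steps the package automates and reports on, with invented column names and missing-value codes.

# Toy line list, used only for illustration
linelist <- data.frame(
  id         = c("P01", "P02", "P02", "P03"),
  onset_date = c("2023-01-04", "-99", "-99", "2023-01-07"),
  country    = c("COL", "COL", "COL", "COL"),  # constant-value column
  stringsAsFactors = FALSE
)

linelist <- linelist[!duplicated(linelist), ]        # remove duplicated rows
keep <- vapply(linelist, function(x) length(unique(x)) > 1, logical(1))
linelist <- linelist[, keep, drop = FALSE]           # drop constant-value columns
linelist[linelist == "-99"] <- NA                    # standardise missing values to NA
linelist$onset_date <- as.Date(linelist$onset_date)  # convert character column to Date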
}, { - "objectID": "posts/100days-workshop/index.html#scenario-4-emerging-avian-influenza-in-cambodia", - "href": "posts/100days-workshop/index.html#scenario-4-emerging-avian-influenza-in-cambodia", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "Scenario 4: Emerging avian influenza in Cambodia", - "text": "Scenario 4: Emerging avian influenza in Cambodia\n\n\n\nScenario 4 details\n\n\n\nAnalytic pipeline for scenario 4 (analysed by group 5)\n\nData cleaning\n\ntidyverse\nreadxl to import data\ndplyr to remove names\nlubridate to standardise date formats\nManually scanning through excel to check for errors\n\nReproduction number\n\nEpiEstim\n\nSeverity of disease\n\nManually using R to detect missing cases\nepiR to check for data censoring\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nData cleaning\nNo available R packages specific for epidemic data\n\n\nReproduction number\nDifficulty finding parameter estimations in the literature\n\n\nSerial interval\nLack of a tool to check for parameter estimates\n\n\nSeverity\nMissing cases\nNeed for an R package for systematic censoring analysis" + "objectID": "posts/vaccineff_v1.0.0/index.html", + "href": "posts/vaccineff_v1.0.0/index.html", + "title": "vaccineff v1.0.0", + "section": "", + "text": "We are very excited to announce the release of a new vaccineff version v1.0.0. The package offers tools for estimating vaccine effectiveness (VE) and related metrics, with features such as:\nTwo main functions compose {vaccineff 1.0.0}:\nBoth classes support plot() and summary() methods, providing visualizations and detailed reports." }, { - "objectID": "posts/100days-workshop/index.html#scenario-5-outbreak-of-respiratory-disease-in-canada", - "href": "posts/100days-workshop/index.html#scenario-5-outbreak-of-respiratory-disease-in-canada", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "Scenario 5: Outbreak of respiratory disease in Canada", - "text": "Scenario 5: Outbreak of respiratory disease in Canada\n\n\n\nScenario 5 details\n\n\n\nAnalytic pipeline for scenario 5 (analysed by group 1)\n\nDefine project structure\n\nDefining the script’s structure with cookiecutter, reportfactory, and orderly\nEnsuring reproducibility of the analysis with iRODS and Git\nWorking in a group with GitHub\n\nData cleaning\n\nImporting data with readr or rio\nChecking for errors with linelist, janitor, parsedate, matchmaker, or lubridate\njanitor to eliminate duplicates\nnaniar to check for missing data\nepitrix to anonymise data\n\nDelay distributions\n\nepitrix\nfitdistrplus to fit parameteric distributions to scenario data\n\nCase demographics\n\napyramid to stratify data by age, gender, and health status\n\nNowcasting\n\nincidence2 to visualise incidence from linelist data\nepiparameter to extract infectious disease parameter data\nEpiEstim or EpiNow2 for Rt calculation\n\nSeverity of disease\n\nCalculation of hospitalisation and mortality rates- no R package specified\n\nZoonotic transmission\n\nforecast\n\nGeneration of reports\n\nincidence for static reports\nQuarto and R markdown for dashboards\n\n\n\n\n\n\n\n\n\nData analysis step\nChallenges\n\n\n\n\nProject structure\nWorking simultaneously on the same script and managing parallel tasks\nAnticipating future incoming data in early pipeline design\n\n\nData cleaning\nLarge amount of code lines used on (reasonably) predictable cleaning (e.g. 
data sense checks)\nOmitting too many data entries when simply removing NA rows\nNon standardised data formats\nImplementing rapid quality check reports before analysis\n\n\nDelay distributions\nIdentifying the best method to calculate, or compare functionality of tools\nNeed to fit multiple parametric distributions and return best, and store as usable objects\n\n\nSeverity of disease\nCensoring and truncation\nUnderestimation of mild cases\nNeed database of age/gender pyramids for comparisons\n\n\nForecasts\nNeed option for fitting with range of plausible pathogen serial intervals and comparing results\nChanging reporting delays over time\nMatching inputs/outputs between packages\n\n\nZoonotic transmisison\nNeed for specific packages with clear documentation\nHow to compare simple trend-based forecasts" + "objectID": "posts/vaccineff_v1.0.0/index.html#breaking-changes", + "href": "posts/vaccineff_v1.0.0/index.html#breaking-changes", + "title": "vaccineff v1.0.0", + "section": "Breaking Changes", + "text": "Breaking Changes\n{vaccineff 1.0.0} refactors the package’s internal structure for better maintainability.\n\nestimate_vaccineff() replaces effectiveness().\n\nIt returns an object of class vaccineff.\nThe at parameter must always be provided for accurate results.\n\nplot.vaccineff_data() replaces plot_coverage().\ncohortdata has been simplified and reduced to improve examples and reduce computation time." }, { - "objectID": "posts/100days-workshop/index.html#what-next", - "href": "posts/100days-workshop/index.html#what-next", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "What next?", - "text": "What next?\nScenarios developed by the 100 days workshop participants illustrate that there are many commonalities across proposed analytics pipelines, which could support interoperability across different epidemiological questions. However, there are also several remaining gaps and challenges, which creates an opportunity to build on existing work to tackle common outbreak scenarios, using the issues here as a starting point. This will also require consideration of wider interactions with existing software ecosystems and users of outbreak analytics insights. We are therefore planning to follow up this vignette with a more detailed perspective article discussing potential for broader progress in developing a ‘first 100 lines of code’." + "objectID": "posts/vaccineff_v1.0.0/index.html#quick-start", + "href": "posts/vaccineff_v1.0.0/index.html#quick-start", + "title": "vaccineff v1.0.0", + "section": "Quick start", + "text": "Quick start\nTo start with {vaccineff} 1.0.0, run the following example. 
This example estimates VE using the package’s example dataset, adjusted for confounders (age and sex) via the iterative matching strategy.\nThe summary() method for the vaccineff_data object provides details on cohort balance and composition before and after applying the matching strategy, as well as general cohort features.\nWhen called on the vaccineff object (ve), the summary() method returns the estimation summary, including results from the Schoenfeld test for proportional hazards.\n\nlibrary(vaccineff)\n\n# Create `vaccineff_data`\ndata(cohortdata)\nvaccineff_data <- make_vaccineff_data(\n data_set = cohortdata,\n outcome_date_col = \"death_date\",\n censoring_date_col = \"death_other_causes\",\n vacc_date_col = \"vaccine_date_2\",\n vaccinated_status = \"v\",\n unvaccinated_status = \"u\",\n immunization_delay = 15,\n end_cohort = as.Date(\"2021-12-31\"),\n match = TRUE,\n exact = \"sex\",\n nearest = c(age = 1)\n)\nsummary(vaccineff_data)\n\nCohort start: 2021-03-26\nCohort end: 2021-12-31\nThe start date of the cohort was defined as the mininimum immunization date. \n65 registers were removed with outcomes before the start date.\n\nNearest neighbors matching iteratively performed.\nNumber of iterations: 4\nBalance all:\n u v smd\nage 63.917069 62.997438 -0.08593156\nsex_F 0.520277 0.573474 0.10701746\nsex_M 0.479723 0.426526 -0.10701746\n\nBalance matched:\n u v smd\nage 63.9014396 63.7280362 -0.01587567\nsex_F 0.5206718 0.5206718 0.00000000\nsex_M 0.4793282 0.4793282 0.00000000\n\nSummary vaccination:\n u v\nAll 10973 19905\nMatched 10836 10836\nUnmatched 137 9069\n\n// tags: outcome_date_col:death_date, censoring_date_col:death_other_causes, vacc_date_col:vaccine_date_2, immunization_date_col:immunization_date, vacc_status_col:vaccine_status \n\n# Estimate VE\nve <- estimate_vaccineff(vaccineff_data, at = 180)\nsummary(ve)\n\nVaccine Effectiveness at 180 days computed as VE = 1 - HR:\n VE lower.95 upper.95\n 0.7254 0.5437 0.8347\n\nSchoenfeld test for Proportional Hazards assumption:\np-value = 0.1507" }, { - "objectID": "posts/100days-workshop/index.html#list-of-contributors", - "href": "posts/100days-workshop/index.html#list-of-contributors", - "title": "What Should the First 100 Lines of Code Written During an Epidemic Look Like?", - "section": "List of contributors", - "text": "List of contributors\n\nGroup 1: Rich Fitzjohn, Mauricio Santos Vega, Andrea Torneri, Abdoelnaser Degoot, Rolina van Gaalen, Zulma Cucunuba, Joseph Tsui, Claudine Lim, Adam Kucharski.\nGroup 2: Juan Daniel Umaña, Joel Hellewell, Anne Cori, Fanck Kalala, Amrish Baidjoe, Sara Hollis, Chaoran Chen, Pratik Gupte, Andree Valle.\nGroup 3: Mutono Nyamai, Finlay Campbell, Arminder Deol, Simone Carter, Anita Shah, Neale Batra, Issa Karambal, Danil Mihailov, Sebastian Funk.\nGroup 4: Anton Camacho, Louise Dyson, Jeremy Bingham, Simon Cauchemez, Alex Spina, Esther Van Kleef, Anna Carnegie, James Azam.\nGroup 5: Olivia Keiser, Geraldine Gomez, John Lees, Don Klinkenberg, Matthew Biggerstaff, David Santiago Quevedo, Joshua Lambert, Carmen Tamayo." 
+ "objectID": "posts/vaccineff_v1.0.0/index.html#acknowledgements", + "href": "posts/vaccineff_v1.0.0/index.html#acknowledgements", + "title": "vaccineff v1.0.0", + "section": "Acknowledgements", + "text": "Acknowledgements\n@Bisaloo @chartgerink @davidsantiagoquevedo @ErikaCantor @fsabrilb @GeraldineGomez @jpavlich @ManuelVelarde2212 @pratikunterwegs @zmcucunuba" }, { "objectID": "learn.html", diff --git a/sitemap.xml b/sitemap.xml index 11044582..769cf8b7 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,322 +2,326 @@ https://epiverse-trace.github.io/people.html - 2024-12-04T12:41:04.561Z + 2024-12-04T16:55:58.896Z https://epiverse-trace.github.io/slides/global-health-conference-2023/index.html - 2024-12-04T12:41:04.733Z + 2024-12-04T16:55:59.072Z https://epiverse-trace.github.io/slides/harmonize-july-2023/index.html - 2024-12-04T12:41:04.789Z + 2024-12-04T16:55:59.132Z https://epiverse-trace.github.io/slides/idm-november-2024/index.html - 2024-12-04T12:41:04.809Z + 2024-12-04T16:55:59.148Z https://epiverse-trace.github.io/slides/epireview-hackathon-july-2024/index.html - 2024-12-04T12:41:04.701Z + 2024-12-04T16:55:59.036Z https://epiverse-trace.github.io/slides/who-mpox-analytics-September-2024/index.html - 2024-12-04T12:41:04.901Z + 2024-12-04T16:55:59.244Z https://epiverse-trace.github.io/slides/2024-10-GPSDD/index.html - 2024-12-04T12:41:04.657Z + 2024-12-04T16:55:58.992Z https://epiverse-trace.github.io/slides/showcase-april-2023/index.html - 2024-12-04T12:41:04.849Z + 2024-12-04T16:55:59.188Z https://epiverse-trace.github.io/slides/harmonious-ecosystem/index.html - 2024-12-04T12:41:04.749Z + 2024-12-04T16:55:59.088Z https://epiverse-trace.github.io/slides/fosdem-2024/index.html - 2024-12-04T12:41:04.713Z + 2024-12-04T16:55:59.052Z https://epiverse-trace.github.io/presentations.html - 2024-12-04T12:41:04.593Z + 2024-12-04T16:55:58.928Z https://epiverse-trace.github.io/posts/s3-generic/index.html - 2024-12-04T12:41:04.589Z + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/cleanepi_v1.0.0/index.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/100days-workshop/index.html + 2024-12-04T16:55:58.916Z - https://epiverse-trace.github.io/posts/copilot-workspace/index.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/extend-dataframes/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/benchmarking_design_decisions/index.html - 2024-12-04T12:41:04.581Z + https://epiverse-trace.github.io/posts/lint-rcpp/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/simulist_v0.2.0/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/superspreading_v0.2.0/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/linelist_v1.0.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/data-licensing.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/comm-software-devel/index.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/ai-learner-review/index.html + 2024-12-04T16:55:58.916Z - https://epiverse-trace.github.io/posts/simulist_v0.1.0/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/simulist_v0.3.0/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/epiparameter_v0.3.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/cfr_v0.1.0/index.html + 2024-12-04T16:55:58.916Z - 
https://epiverse-trace.github.io/posts/bpmodels_v0.3.1/index.html - 2024-12-04T12:41:04.581Z + https://epiverse-trace.github.io/posts/parent-class/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/share-cpp/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/chjh-intro-blog/index.html + 2024-12-04T16:55:58.920Z - https://epiverse-trace.github.io/posts/renv-complications/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/epichains_v0.1.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/epiparameter_v0.2.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/linelist_v1.1.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/epidemics_v0.1.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/system-dependencies/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/epidemics_v0.4.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/mpox-preparedness/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/for-vs-apply/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/epiparameter_v0.1.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/statistical-correctness/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/progressive-enhancement/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/getting-started.html - 2024-12-04T12:41:04.557Z + https://epiverse-trace.github.io/posts/readepi_v0.1.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/resources.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/index.html + 2024-12-04T16:55:58.896Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaEct.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREP.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRD1.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREa.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84Ab8nb.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZ1Mt.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREC.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRRF.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREX.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRE5.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84Aaqjf.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRED.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREB.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaMJI.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZPXq.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREo.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaEih.html - 2024-12-04T12:41:04.613Z + 
https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRD5.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84Ab8n3.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRHG.html + 2024-12-04T16:55:58.948Z https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaEjI.html - 2024-12-04T12:41:04.613Z + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRHG.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84Ab8n3.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRD5.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaEih.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREo.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZPXq.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaMJI.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREB.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRED.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84Aaqjf.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRE5.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREX.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRRF.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREC.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZ1Mt.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84Ab8nb.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREa.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZRD1.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/resources/D_kwDOIcolP84AZREP.html - 2024-12-04T12:41:04.613Z + https://epiverse-trace.github.io/resources/D_kwDOIcolP84AaEct.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/index.html - 2024-12-04T12:41:04.561Z + https://epiverse-trace.github.io/resources.html + 2024-12-04T16:55:58.948Z - https://epiverse-trace.github.io/posts/readepi_v0.1.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/getting-started.html + 2024-12-04T16:55:58.896Z - https://epiverse-trace.github.io/posts/progressive-enhancement/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/statistical-correctness/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/epiparameter_v0.1.0/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/for-vs-apply/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/mpox-preparedness/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/epidemics_v0.4.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/system-dependencies/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/epidemics_v0.1.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/linelist_v1.1.0/index.html - 2024-12-04T12:41:04.589Z + 
https://epiverse-trace.github.io/posts/epiparameter_v0.2.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/epichains_v0.1.0/index.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/renv-complications/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/chjh-intro-blog/index.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/share-cpp/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/parent-class/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/bpmodels_v0.3.1/index.html + 2024-12-04T16:55:58.916Z - https://epiverse-trace.github.io/posts/cfr_v0.1.0/index.html - 2024-12-04T12:41:04.581Z + https://epiverse-trace.github.io/posts/epiparameter_v0.3.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/simulist_v0.3.0/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/simulist_v0.1.0/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/ai-learner-review/index.html - 2024-12-04T12:41:04.581Z + https://epiverse-trace.github.io/posts/comm-software-devel/index.html + 2024-12-04T16:55:58.920Z - https://epiverse-trace.github.io/posts/data-licensing.html - 2024-12-04T12:41:04.585Z + https://epiverse-trace.github.io/posts/linelist_v1.0.0/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/superspreading_v0.2.0/index.html - 2024-12-04T12:41:04.593Z + https://epiverse-trace.github.io/posts/simulist_v0.2.0/index.html + 2024-12-04T16:55:58.928Z - https://epiverse-trace.github.io/posts/lint-rcpp/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/benchmarking_design_decisions/index.html + 2024-12-04T16:55:58.916Z - https://epiverse-trace.github.io/posts/extend-dataframes/index.html - 2024-12-04T12:41:04.589Z + https://epiverse-trace.github.io/posts/copilot-workspace/index.html + 2024-12-04T16:55:58.924Z - https://epiverse-trace.github.io/posts/100days-workshop/index.html - 2024-12-04T12:41:04.581Z + https://epiverse-trace.github.io/posts/cleanepi_v1.0.0/index.html + 2024-12-04T16:55:58.920Z + + + https://epiverse-trace.github.io/posts/vaccineff_v1.0.0/index.html + 2024-12-04T16:55:58.928Z https://epiverse-trace.github.io/learn.html - 2024-12-04T12:41:04.561Z + 2024-12-04T16:55:58.896Z https://epiverse-trace.github.io/blog.html - 2024-12-04T12:41:04.557Z + 2024-12-04T16:55:58.892Z https://epiverse-trace.github.io/slides/showcase-january-2023/index.html - 2024-12-04T12:41:04.873Z + 2024-12-04T16:55:59.212Z https://epiverse-trace.github.io/slides/rslondon-southeast-july-2023/index.html - 2024-12-04T12:41:04.837Z + 2024-12-04T16:55:59.180Z https://epiverse-trace.github.io/slides/2024-4S-EASST/index.html - 2024-12-04T12:41:04.673Z + 2024-12-04T16:55:59.008Z https://epiverse-trace.github.io/slides/2023-LatinR/index.html - 2024-12-04T12:41:04.641Z + 2024-12-04T16:55:58.980Z https://epiverse-trace.github.io/slides/juniper-may-2023/index.html - 2024-12-04T12:41:04.833Z + 2024-12-04T16:55:59.172Z https://epiverse-trace.github.io/slides/interoperability-epiverse/index.html - 2024-12-04T12:41:04.809Z + 2024-12-04T16:55:59.148Z https://epiverse-trace.github.io/slides/2024-useR/index.html - 2024-12-04T12:41:04.681Z + 2024-12-04T16:55:59.016Z https://epiverse-trace.github.io/slides/showcase-november-2023/index.html - 2024-12-04T12:41:04.873Z + 2024-12-04T16:55:59.216Z https://epiverse-trace.github.io/get-involved.html - 
2024-12-04T12:41:04.557Z + 2024-12-04T16:55:58.896Z