From 546604c43a1000277e2a39c3f7520dab1794c136 Mon Sep 17 00:00:00 2001
From: Chris Hartgerink
Date: Thu, 12 Sep 2024 10:11:08 +0200
Subject: [PATCH] Update docs in a last look

---
 R/datatagr-package.R            |  2 +-
 R/drop_datatagr.R               |  2 +-
 R/has_label.R                   |  3 +-
 R/restore_labels.R              |  2 --
 R/zzz.R                         |  3 +-
 README.Rmd                      | 52 ++++++++++++++++++++++++++-------
 man/datatagr-package.Rd         |  2 +-
 vignettes/design-principles.Rmd | 32 ++++++++++++++++----
 8 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/R/datatagr-package.R b/R/datatagr-package.R
index f97a463..66169aa 100644
--- a/R/datatagr-package.R
+++ b/R/datatagr-package.R
@@ -2,7 +2,7 @@
 #'
 #' The *datatagr* package provides tools to help label and validate data. The
 #' 'datatagr' class adds variable level attributes to 'data.frame' columns.
-#' Once tagged, these variables can be seamlessly used in downstream analyses,
+#' Once labelled, these variables can be seamlessly used in downstream analyses,
 #' making data pipelines more robust and reliable.
 #'
 #' @aliases datatagr
diff --git a/R/drop_datatagr.R b/R/drop_datatagr.R
index 8a8a89c..96e22bb 100644
--- a/R/drop_datatagr.R
+++ b/R/drop_datatagr.R
@@ -10,7 +10,7 @@
 #'
 #' @noRd
 #'
-#' @return The function returns the same object without the `datatagr` class.
+#' @return The function returns the object without the `datatagr` class.
 #'
 drop_datatagr <- function(x, remove_labels = TRUE) {
   classes <- class(x)
diff --git a/R/has_label.R b/R/has_label.R
index a87e89a..7b35ba8 100644
--- a/R/has_label.R
+++ b/R/has_label.R
@@ -6,7 +6,8 @@
 #' requested labels
 #'
 #' @note Using this in a pipeline results in a 'datatagr' object, but does not
-#' maintain the variable labels at this time.
+#' maintain the variable labels at this time. It is primarily useful to make
+#' your pipelines human readable.
 #'
 #' @export
 #'
diff --git a/R/restore_labels.R b/R/restore_labels.R
index 51a9beb..2c98e61 100644
--- a/R/restore_labels.R
+++ b/R/restore_labels.R
@@ -17,8 +17,6 @@
 #'
 #' @noRd
 #'
-#' @seealso [prune_labels()] for removing labels which have lost their variables
-#'
 #' @return The function returns a `datatagr` object with updated labels.
 #'
diff --git a/R/zzz.R b/R/zzz.R
index 53b1368..a8d67cb 100644
--- a/R/zzz.R
+++ b/R/zzz.R
@@ -1,4 +1,5 @@
 .onLoad <- function(libname, pkgname) {
   lost_labels_action(Sys.getenv("DATATAGR_LOST_ACTION", "warning"),
-                     quiet = TRUE)
+    quiet = TRUE
+  )
 }
diff --git a/README.Rmd b/README.Rmd
index f7e0f85..e1134e7 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -17,21 +17,18 @@ knitr::opts_chunk$set(
 )
 ```
-# *datatagr*: Generic Data Tagging and Validating Logo for datatagr
+# *datatagr*: Generic Data Labelling and Validating Logo for datatagr
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit/)
 [![R-CMD-check](https://github.com/epiverse-trace/datatagr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiverse-trace/datatagr/actions/workflows/R-CMD-check.yaml)
 [![Codecov test coverage](https://codecov.io/gh/epiverse-trace/datatagr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/epiverse-trace/datatagr?branch=main)
-[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept)
+[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#experimental)
-**datatagr** provides functions to tag, validate, and safeguard data of any kind. datatagr is an abstraction from **linelist**, which applies these principles for epidemiological data. The original proposal for this package can be found on [the Discussion board](https://github.com/orgs/epiverse-trace/discussions/221).
-
-> ![INFO]
-> For our project management and roadmap, please [see the relevant GitHub Project](https://github.com/orgs/epiverse-trace/projects/41).
+**datatagr** provides functions to label and validate data of any kind. datatagr is an abstraction from [**linelist**](https://github.com/epiverse-trace/linelist), which applies these principles to epidemiological linelist data. The original proposal for this package can be found on [the Discussion board](https://github.com/orgs/epiverse-trace/discussions/221).
 ## Installation
@@ -43,27 +40,62 @@ You can install the development version of datatagr from
 pak::pak("epiverse-trace/datatagr")
 ```
-## Example
+## Getting started
+
+```r
+library(datatagr)
-These examples illustrate some of the current functionalities
+# Create a datatagr object
+x <- make_datatagr(cars, speed = 'Miles per hour', dist = 'Distance in feet')
+
+# Validate that the variables are of the expected types
+validate_datatagr(x,
+  speed = 'numeric', # speed should be numeric
+  # type() is a helper function of related classes
+  dist = type('numeric') # dist should be numeric or integer
+)
+```
 ## Development
 ### Lifecycle
-This package is currently a *concept*, as defined by the [RECON software
+This package is currently *experimental*, as defined by the [RECON software
 lifecycle](https://www.reconverse.org/lifecycle.html). This means that essential
 features and mechanisms are still being developed, and the package is not ready
 for use outside of the development team.
 ### Contributions
-Contributions are welcome via [pull requests](https://github.com/epiverse-trace/datatagr/pulls).
+Contributions are welcome via [pull requests](https://github.com/epiverse-trace/datatagr/pulls). Anything bigger than a typo fix or a small documentation update should be discussed in an issue first. If you want to report a bug or suggest an enhancement, please open an issue. 😊
+
+
+ Common issues
+
+To help us evaluate your contribution without running into common issues, please run the following commands before submitting a pull request:
+
+```r
+styler::style_pkg()
+spelling::update_wordlist(pkg = ".", vignettes = TRUE)
+devtools::document()
+
+lintr::lint_package()
+
+devtools::test()
+devtools::check()
+```
+
+This will reduce the time it takes for us to review your contribution. Thank you! 😊
+
+
+
 ### Related projects
 This project is related to other existing projects in R or other languages, but also differs from them in the following aspects:
+- [linelist](https://github.com/epiverse-trace/linelist): A package for managing and validating linelist data - the original inspiration for datatagr.
+
 ### Code of Conduct
 Please note that the datatagr project is released with a
diff --git a/man/datatagr-package.Rd b/man/datatagr-package.Rd
index e15c41e..fe4a183 100644
--- a/man/datatagr-package.Rd
+++ b/man/datatagr-package.Rd
@@ -8,7 +8,7 @@
 \description{
 The \emph{datatagr} package provides tools to help label and validate data. The
 'datatagr' class adds variable level attributes to 'data.frame' columns.
-Once tagged, these variables can be seamlessly used in downstream analyses,
+Once labelled, these variables can be seamlessly used in downstream analyses,
 making data pipelines more robust and reliable.
 }
 \note{
diff --git a/vignettes/design-principles.Rmd b/vignettes/design-principles.Rmd
index 9298453..9ba3684 100644
--- a/vignettes/design-principles.Rmd
+++ b/vignettes/design-principles.Rmd
@@ -24,24 +24,44 @@ None of the sections are required, feel free to remove any sections not relevant
 ## Scope
-< Outline the aims of the package, potentially mention some of the key exported functions, and maybe how it links with other R packages. It is also possible to mention certain aspects that fall outside of the package's scope. >
+**datatagr** provides generic labelling and validation tools. In contrast to the original versions of **linelist** (`<=v1.1.4`), datatagr functions at the variable level instead of the object level.
+
+The validation tooling is specific to type checking variables and providing feedback on potential data loss or coercion. It does not aim to do complex validations at this time.
 ## Naming conventions
-< Description of the scheme and/or conventions used for naming functions and arguments. This can be the use of a prefix on all exported functions, a name mould ("all function are named verb_object"), or any other naming convention that is used throughout the package. >
+We separate functions as much as is reasonable into their own files under `R/`. If there are tests available for a file under `R/`, it follows the convention of `test-.R` under `tests/testthat/`. Not all source code has corresponding tests.
+
+We try to make function names as descriptive as possible, while keeping them short. This is to make the package easy to use and understand.
 ## Input/Output/Interoperability
-< Describe the data structures (i.e. vectors, `` or classes) that are given as input to the key functions and what data structures the functions return. The design decisions around these I/O choices could also mention how it enhances interoperability with other R packages or pipelines (e.g. with `%>%`). >
+Any data frame object can be passed into **datatagr**. Output from datatagr remains a data frame object, with `datatagr` added to its class attribute. This means it remains interoperable with all the regular data frame operations.
+
+**datatagr** is interoperable with pipes (that is, `|>` or `%>%`), which allows for easy chaining of functions. Note that there are no guarantees that label attributes are preserved when piping or wrangling in another way. For example, **dplyr** can drop the variable level attributes of columns modified with `dplyr::mutate()`.
 ## Design decisions
-< A list of bullet points each explaining a design decision and its reasoning. >
+* **Generic**: The package is designed to be a generic tool for labelling and validating data. This is to ensure that the package can be used in a wide range of contexts and is not limited to a specific use case. Any specific use cases should be implemented in separate packages.
+* **Local**: We keep functions as local as possible. This means operations should be as precise as is feasible, to be non-destructive and ensure changes on one variable do not unexpectedly affect another. This helps ensure the package is predictable and easy to use and maintain.
+* **Minimize number of functions**: We aim to keep the number of functions in the package to a minimum. This helps usability and maintainability.
+* **Base R**: We aim to use base R functions where possible. This is to ensure that the package is lightweight and does not have many dependencies. This is, for example, why we do not use **labelled** as the labelling package.
+
+If you feel we have not upheld one of these design decisions, please let us know 😊
+
+### Quirks
+
+Every package has quirks. We outline the ones we are aware of here:
+
+* Currently, emptying labels sets them to `""` (empty character strings). Ideally, they would be set to `NULL` instead.
 ## Dependencies
-< A list of dependencies used by the package with some explanation as to why they are required. Not all dependencies need to be explained and it is best to explain the key dependencies. It can be used to give context to why certain dependencies are used (e.g. "This package is expected to be used in tidyverse pipelines and as such, we consider these tidyverse packages good dependencies that will already be installed on a user's computer."). This section can also mention dependencies that are planned to be removed or added in future development. Suggested dependencies do not need to be explained unless they are unusual and may surprise developers with their inclusion. >
+* **checkmate** - provide assertions for function arguments
+* **lifecycle** - help manage function lifecycle
+* **rlang** - parse `...` arguments into a list
+* **tidyselect** - ensure we can use pipes in `has_label()`
 ## Development journey
-< If the package has undergone any large refactoring this section can be used to explain the changes. >
+The **datatagr** package is a major refactor of **linelist** `v1.1.4`. The refactor was necessary to make the package more generic and to make the codebase more maintainable. The refactor was completed in a series of steps documented in [#37](https://github.com/epiverse-trace/datatagr/pull/37).