Update docs in a last look
chartgerink committed Sep 12, 2024
1 parent e8f570c commit a552dbd
Showing 8 changed files with 75 additions and 23 deletions.
2 changes: 1 addition & 1 deletion R/datatagr-package.R
@@ -2,7 +2,7 @@
#'
#' The *datatagr* package provides tools to help label and validate data. The
#' 'datatagr' class adds variable level attributes to 'data.frame' columns.
#' Once tagged, these variables can be seamlessly used in downstream analyses,
#' Once labelled, these variables can be seamlessly used in downstream analyses,
#' making data pipelines more robust and reliable.
#'
#' @aliases datatagr
2 changes: 1 addition & 1 deletion R/drop_datatagr.R
@@ -10,7 +10,7 @@
#'
#' @noRd
#'
#' @return The function returns the same object without the `datatagr` class.
#' @return The function returns the object without the `datatagr` class.
#'
drop_datatagr <- function(x, remove_labels = TRUE) {
classes <- class(x)
3 changes: 2 additions & 1 deletion R/has_label.R
@@ -6,7 +6,8 @@
#' requested labels
#'
#' @note Using this in a pipeline results in a 'datatagr' object, but does not
#' maintain the variable labels at this time.
maintain the variable labels at this time. It is primarily useful for making
#' your pipelines human-readable.
#'
#' @export
#'
2 changes: 0 additions & 2 deletions R/restore_labels.R
@@ -17,8 +17,6 @@
#'
#' @noRd
#'
#' @seealso [prune_labels()] for removing labels which have lost their variables
#'
#' @return The function returns a `datatagr` object with updated labels.
#'

3 changes: 2 additions & 1 deletion R/zzz.R
@@ -1,4 +1,5 @@
.onLoad <- function(libname, pkgname) {
lost_labels_action(Sys.getenv("DATATAGR_LOST_ACTION", "warning"),
quiet = TRUE)
quiet = TRUE
)
}
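The `.onLoad()` hook above passes `Sys.getenv("DATATAGR_LOST_ACTION", "warning")` to `lost_labels_action()`, so the default is `"warning"` unless the environment variable is set before the package loads. A minimal base R sketch of that lookup follows; the value `"error"` is only an assumed example of an alternative setting, not something this diff confirms.

```r
# Sys.getenv() returns its `unset` argument when the variable is not defined
Sys.unsetenv("DATATAGR_LOST_ACTION")
default_action <- Sys.getenv("DATATAGR_LOST_ACTION", unset = "warning")

# Setting the variable (for example in ~/.Renviron) overrides the default;
# "error" is an assumed example value for illustration only
Sys.setenv(DATATAGR_LOST_ACTION = "error")
custom_action <- Sys.getenv("DATATAGR_LOST_ACTION", unset = "warning")

# Clean up so the session is left as it was
Sys.unsetenv("DATATAGR_LOST_ACTION")

print(c(default = default_action, custom = custom_action))
```

As far as this hook is concerned, the variable is read only at load time, so changing it after `library(datatagr)` would not take effect until the package is reloaded.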
52 changes: 42 additions & 10 deletions README.Rmd
@@ -17,21 +17,18 @@ knitr::opts_chunk$set(
)
```

# *datatagr*: Generic Data Tagging and Validating <img src="man/figures/logo.svg" align="right" width="120" alt="Logo for datatagr" />
# *datatagr*: Generic Data Labelling and Validating <img src="man/figures/logo.svg" align="right" width="120" alt="Logo for datatagr" />

<!-- badges: start -->

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit/)
[![R-CMD-check](https://github.com/epiverse-trace/datatagr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiverse-trace/datatagr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/epiverse-trace/datatagr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/epiverse-trace/datatagr?branch=main)
[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept)
[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#experimental)

<!-- badges: end -->

**datatagr** provides functions to tag, validate, and safeguard data of any kind. datatagr is an abstraction from **linelist**, which applies these principles for epidemiological data. The original proposal for this package can be found on [the Discussion board](https://github.com/orgs/epiverse-trace/discussions/221).

> ![INFO]
> For our project management and roadmap, please [see the relevant GitHub Project](https://github.com/orgs/epiverse-trace/projects/41).
**datatagr** provides functions to label and validate data of any kind. datatagr is an abstraction from [**linelist**](https://github.com/epiverse-trace/linelist), which applies these principles to epidemiological linelist data. The original proposal for this package can be found on [the Discussion board](https://github.com/orgs/epiverse-trace/discussions/221).

## Installation

@@ -43,27 +40,62 @@ You can install the development version of datatagr from
pak::pak("epiverse-trace/datatagr")
```

## Example
## Getting started

```r
library(datatagr)

These examples illustrate some of the current functionalities
# Create a datatagr object
x <- make_datatagr(cars, speed = 'Miles per hour', dist = 'Distance in feet')

# Validate that the variables are of the expected types
validate_datatagr(x,
  speed = 'numeric', # speed should be numeric
  # type() is a helper that returns a set of related classes
  dist = type('numeric') # dist should be numeric or integer
)
```

## Development

### Lifecycle

This package is currently a *concept*, as defined by the [RECON software
This package is currently *experimental*, as defined by the [RECON software
lifecycle](https://www.reconverse.org/lifecycle.html). This means that essential
features and mechanisms are still being developed, and the package is not ready
for use outside of the development team.

### Contributions

Contributions are welcome via [pull requests](https://github.com/epiverse-trace/datatagr/pulls).
Contributions are welcome via [pull requests](https://github.com/epiverse-trace/datatagr/pulls). Anything bigger than a typo fix or a small documentation update should be discussed in an issue first. If you want to report a bug or suggest an enhancement, please open an issue. 😊

<details>
<summary>Common issues</summary>

To help us evaluate your contribution and avoid common issues, please run the following commands before submitting a pull request:

```r
styler::style_pkg()
spelling::update_wordlist(pkg = ".", vignettes = TRUE)
devtools::document()

lintr::lint_package()

devtools::test()
devtools::check()
```

This will reduce the time it takes for us to review your contribution. Thank you! 😊

</details>


### Related projects

This project is related to other existing projects in R or other languages, but also differs from them in the following aspects:

- [linelist](https://github.com/epiverse-trace/linelist): A package for managing and validating linelist data - the original inspiration for datatagr.

### Code of Conduct

Please note that the datatagr project is released with a
2 changes: 1 addition & 1 deletion man/datatagr-package.Rd


32 changes: 26 additions & 6 deletions vignettes/design-principles.Rmd
@@ -24,24 +24,44 @@ None of the sections are required, feel free to remove any sections not relevant

## Scope

< Outline the aims of the package, potentially mention some of the key exported functions, and maybe how it links with other R packages. It is also possible to mention certain aspects that fall outside of the package's scope. >
**datatagr** provides generic labelling and validation tools. In contrast to the original versions of **linelist** (`<=v1.1.4`), datatagr works at the variable level rather than the object level.

The validation tooling is specific to type checking variables and to providing feedback on potential data loss or coercion. It does not currently aim to support complex validations.
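To make "type checking variables" concrete, here is a minimal base R sketch that compares each named variable against one or more allowed classes. `check_types_sketch()` is a hypothetical helper written for illustration; it is not a datatagr function and does not reproduce `validate_datatagr()`.

```r
# Hypothetical helper (not part of datatagr): TRUE for each named variable
# that inherits from at least one of its allowed classes
check_types_sketch <- function(df, ...) {
  expected <- list(...)
  stopifnot(is.data.frame(df), all(names(expected) %in% names(df)))
  vapply(
    names(expected),
    function(var) inherits(df[[var]], expected[[var]]),
    logical(1)
  )
}

# Both columns of `cars` are doubles, whose class is "numeric"
res <- check_types_sketch(cars, speed = "numeric", dist = c("numeric", "integer"))
print(res)

# An integer column fails a strict "numeric" check but passes a relaxed one
int_df <- data.frame(n = 1:3)
strict <- check_types_sketch(int_df, n = "numeric")
relaxed <- check_types_sketch(int_df, n = c("numeric", "integer"))
```

The strict/relaxed contrast shows why a helper returning several related classes can be convenient when checking types.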

## Naming conventions

< Description of the scheme and/or conventions used for naming functions and arguments. This can be the use of a prefix on all exported functions, a name mould ("all function are named verb_object"), or any other naming convention that is used throughout the package. >
We separate functions into their own files under `R/` as much as is reasonable. Where a file under `R/` has tests, they live in `tests/testthat/test-<filename>.R`. Not all source code has corresponding tests.

We try to make function names as descriptive as possible, while keeping them short. This is to make the package easy to use and understand.

## Input/Output/Interoperability

< Describe the data structures (i.e. vectors, `<data.frames>` or classes) that are given as input to the key functions and what data structures the functions return. The design decisions around these I/O choices could also mention how it enhances interoperability with other R packages or pipelines (e.g. with `%>%`). >
Any data frame object can be passed into **datatagr**. Output from datatagr remains a data frame object, with `datatagr` added to its class attribute. This means it remains interoperable with regular data frame operations.
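A minimal base R sketch of what an additional class means in practice, assuming the class is prepended to the existing `data.frame` class (the exact class vector datatagr sets is not shown in this commit):

```r
x <- cars

# Prepend a class; the object is still a data frame underneath
class(x) <- c("datatagr", class(x))
print(class(x))

# Generic data frame operations keep working through inheritance
n_rows <- nrow(x)
mean_speed <- mean(x$speed)

# Remove the class again, in the spirit of the internal drop_datatagr(),
# which is documented as returning the object without the `datatagr` class
y <- x
class(y) <- setdiff(class(y), "datatagr")
```

Because `data.frame` stays in the class vector, any function that dispatches on data frames still finds a method.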

**datatagr** is interoperable with pipes (that is, `|>` or `%>%`), which allows for easy chaining of functions. Note that label attributes are not guaranteed to be preserved when piping or otherwise wrangling the data. For example, **dplyr** drops variable level attributes when using `dplyr::mutate()`.
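Plain base R shows similar fragility. The sketch below assumes, for illustration only, that a label is stored in an attribute named `label` on a column; it then checks which common operations keep that attribute.

```r
df <- cars
attr(df$speed, "label") <- "Miles per hour"

# Arithmetic copies attributes from the full-length operand, so the label survives
kept <- attr(df$speed * 1.609, "label")

# Row subsetting rebuilds each column with `[`, which drops custom attributes
fast <- df[df$dist > 20, ]
lost_subset <- attr(fast$speed, "label")

# as.numeric() strips all attributes, even from a vector that is already double
lost_coerce <- attr(as.numeric(df$speed), "label")

print(list(kept = kept, subset = lost_subset, coerce = lost_coerce))
```

Whether a label survives therefore depends on how each operation rebuilds the column, not on whether a pipe is used.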

## Design decisions

< A list of bullet points each explaining a design decision and its reasoning. >
* **Generic**: The package is designed to be a generic tool for labelling and validating data. This is to ensure that the package can be used in a wide range of contexts and is not limited to a specific use case. Any specific use cases should be implemented in separate packages.
* **Local**: We keep functions as local as possible. This means operations should be as targeted as is feasible, so that they are non-destructive and changes to one variable do not unexpectedly affect another. This helps keep the package predictable and easy to use and maintain.
* **Minimize number of functions**: We aim to keep the number of functions in the package to a minimum. This helps usability and maintainability.
* **Base R**: We aim to use base R functions where possible, to keep the package lightweight and its dependencies few. This is, for example, why we do not use **labelled** for labelling.
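The **Local** principle can be illustrated with a hypothetical base R helper that changes the label of one variable and leaves every other column untouched. `set_label_sketch()` is not a datatagr function, and storing labels in a `label` attribute is an assumption made for this sketch.

```r
# Hypothetical helper (not part of datatagr): relabel a single variable only
set_label_sketch <- function(df, var, label) {
  stopifnot(is.data.frame(df), is.character(var), length(var) == 1L,
            var %in% names(df))
  attr(df[[var]], "label") <- label
  df
}

df <- cars
df <- set_label_sketch(df, "speed", "Miles per hour")
df <- set_label_sketch(df, "dist", "Distance in feet")

# Relabel one variable; the other label and the input object are unaffected
df2 <- set_label_sketch(df, "speed", "Speed (mph)")
labels <- vapply(df2, function(col) attr(col, "label"), character(1))
print(labels)
```

R's copy-on-modify semantics mean `df` keeps its original labels after `df2` is created, which matches the non-destructive intent.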

If you feel like we did not uphold one of these design decisions, please let us know 😊

### Quirks

Every package has quirks. We outline the ones we are aware of here:

* Currently, emptying a label sets it to `""` (an empty character string). Ideally, it would be set to `NULL` instead.
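The difference between the current and the preferred behaviour can be seen with plain base R attributes. This sketch assumes, for illustration only, that a label lives in an attribute named `label`.

```r
x <- cars$speed
attr(x, "label") <- "Miles per hour"

# Emptying to "" keeps the attribute, so the variable still looks labelled
attr(x, "label") <- ""
still_present <- "label" %in% names(attributes(x))

# Assigning NULL removes the attribute entirely
attr(x, "label") <- NULL
now_absent <- !("label" %in% names(attributes(x)))

print(c(still_present = still_present, now_absent = now_absent))
```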

## Dependencies

< A list of dependencies used by the package with some explanation as to why they are required. Not all dependencies need to be explained and it is best to explain the key dependencies. It can be used to give context to why certain dependencies are used (e.g. "This package is expected to be used in tidyverse pipelines and as such, we consider these tidyverse packages good dependencies that will already be installed on a user's computer."). This section can also mention dependencies that are planned to be removed or added in future development. Suggested dependencies do not need to be explained unless they are unusual and may surprise developers with their inclusion. >
* **checkmate** - provide assertions for function arguments
* **lifecycle** - help manage function lifecycle
* **rlang** - parse `...` arguments into a list
* **tidyselect** - ensure we can use pipes in `has_label()`

## Development journey

< If the package has undergone any large refactoring this section can be used to explain the changes. >
The **datatagr** package is a major refactor of **linelist** `v1.1.4`. The refactor was necessary to make the package more generic and to make the codebase more maintainable. The refactor was completed in a series of steps documented in [#37](https://github.com/epiverse-trace/datatagr/pull/37).
