Joss paper #166

Merged
merged 18 commits into from
Dec 5, 2023
Changes from 9 commits
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -15,6 +15,7 @@
^cran-comments\.md$
README.html$

paper/
pkgdown/
utility/
lastMiKTeXException/
Binary file added paper/images/REDCapTidieR JOSS.png
rsh52 marked this conversation as resolved.
79 changes: 79 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,79 @@
@article{Harris2019,
title = {The REDCap consortium: Building an international community of software platform partners},
journal = {Journal of Biomedical Informatics},
volume = {95},
pages = {103208},
year = {2019},
issn = {1532-0464},
doi = {10.1016/j.jbi.2019.103208},
url = {https://www.sciencedirect.com/science/article/pii/S1532046419301261},
author = {Paul A. Harris and Robert Taylor and Brenda L. Minor and Veida Elliott and Michelle Fernandez and Lindsay O'Neal and Laura McLeod and Giovanni Delacqua and Francesco Delacqua and Jacqueline Kirby and Stephany N. Duda},
keywords = {Medical informatics, Electronic data capture, Clinical research, Translational research},
abstract = {The Research Electronic Data Capture (REDCap) data management platform was developed in 2004 to address an institutional need at Vanderbilt University, then shared with a limited number of adopting sites beginning in 2006. Given bi-directional benefit in early sharing experiments, we created a broader consortium sharing and support model for any academic, non-profit, or government partner wishing to adopt the software. Our sharing framework and consortium-based support model have evolved over time along with the size of the consortium (currently more than 3200 REDCap partners across 128 countries). While the “REDCap Consortium” model represents only one example of how to build and disseminate a software platform, lessons learned from our approach may assist other research institutions seeking to build and disseminate innovative technologies.}
}

@article{Harris2009,
title = {Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support},
journal = {Journal of Biomedical Informatics},
volume = {42},
number = {2},
pages = {377--381},
year = {2009},
issn = {1532-0464},
doi = {10.1016/j.jbi.2008.08.010},
url = {https://www.sciencedirect.com/science/article/pii/S1532046408001226},
author = {Paul A. Harris and Robert Taylor and Robert Thielke and Jonathon Payne and Nathaniel Gonzalez and Jose G. Conde},
keywords = {Medical informatics, Electronic data capture, Clinical research, Translational research},
abstract = {Research electronic data capture (REDCap) is a novel workflow methodology and software solution designed for rapid development and deployment of electronic data capture tools to support clinical and translational research. We present: (1) a brief description of the REDCap metadata-driven software toolset; (2) detail concerning the capture and use of study-related metadata from scientific research teams; (3) measures of impact for REDCap; (4) details concerning a consortium network of domestic and international institutions collaborating on the project; and (5) strengths and limitations of the REDCap system. REDCap is currently supporting 286 translational research projects in a growing collaborative network including 27 active partner institutions.}
}

@article{Wickham2014,
title={Tidy Data},
volume={59},
url={https://www.jstatsoft.org/index.php/jss/article/view/v059i10},
doi={10.18637/jss.v059.i10},
abstract={A huge amount of effort is spent cleaning data to get it ready for analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, component of data cleaning: data tidying. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. This framework makes it easy to tidy messy datasets because only a small set of tools are needed to deal with a wide range of un-tidy datasets. This structure also makes it easier to develop tidy tools for data analysis, tools that both input and output tidy datasets. The advantages of a consistent data structure and matching tools are demonstrated with a case study free from mundane data manipulation chores.},
number={10},
journal={Journal of Statistical Software},
author={Wickham, Hadley},
year={2014},
pages={1--23}
}

@Manual{r_citation,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2020},
url = {https://www.R-project.org/},
}

@Manual{redcapr_cit,
title = {REDCapR: Interaction Between R and REDCap},
author = {Beasley, Will},
organization = {Biomedical and Behavioral Methodology Core (University of Oklahoma Health Sciences Center)},
address = {Oklahoma City, Oklahoma},
year = {2022},
url = {https://cran.r-project.org/web/packages/REDCapR/index.html},
}

@Manual{redcapapi_cit,
title = {redcapAPI: Interface to 'REDCap'},
author = {Garbett, Shawn},
organization = {Vanderbilt Biostatistics},
address = {Nashville, Tennessee},
year = {2023},
url = {https://cran.r-project.org/web/packages/redcapAPI/index.html},
}

@Article{tidyverse_cit,
title = {Welcome to the {tidyverse}},
author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
year = {2019},
journal = {Journal of Open Source Software},
volume = {4},
number = {43},
pages = {1686},
doi = {10.21105/joss.01686},
}
103 changes: 103 additions & 0 deletions paper/paper.md
@@ -0,0 +1,103 @@
---
title: 'REDCapTidieR: Extracting complex REDCap databases into tidy tables'
tags:
  - R
  - REDCap
  - data management
authors:
  - name: Richard Hanna
    orcid: 0009-0005-6496-8154
    equal-contrib: true
    affiliation: "1"
  - name: Ezra Porter
    orcid: 0000-0002-4690-8343
    equal-contrib: true
    affiliation: "1"
  - name: Stephany Romero
    equal-contrib: true
    affiliation: "1"
  - name: Paul Wildenhain
    equal-contrib: true
    affiliation: "6"
  - name: William Beasley
    orcid: 0000-0002-5613-5006
    equal-contrib: true
    affiliation: "7"
  - name: Stephan Kadauke
    orcid: 0000-0003-2996-8034
    equal-contrib: true
    affiliation: "1, 2, 3, 4, 5"
affiliations:
  - name: Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
    index: 1
  - name: Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
    index: 2
  - name: Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
    index: 3
  - name: Division of Transfusion Medicine, Children's Hospital of Philadelphia, Pennsylvania
    index: 4
  - name: Division of Pathology Informatics, Children's Hospital of Philadelphia, Pennsylvania
    index: 5
  - name: Division of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
    index: 6
  - name: Department of Pediatrics, The University of Oklahoma Health Sciences Center, College of Medicine, Oklahoma City, Oklahoma, USA
    index: 7
date: XX November 2023
bibliography: paper.bib
---

# Summary

Capturing and storing electronic data is integral to research, yet it often becomes a burden for the researchers themselves. [REDCap](https://www.project-redcap.org/) [@Harris2009; @Harris2019] alleviates this burden by offering a secure web application for building databases and surveys. Its front-end interface supports data of any type, including data subject to compliance standards for protected information.

For many researchers who use REDCap, the R language [@r_citation] is a powerful tool for extracting and analyzing their data. The [`REDCapR`](https://cran.r-project.org/web/packages/REDCapR/index.html) [@redcapr_cit] and [`redcapAPI`](https://cran.r-project.org/web/packages/redcapAPI/index.html) [@redcapapi_cit] packages use REDCap's REST API to let R users extract data directly into their programming environment. The default extraction structure for a given REDCap database, referred to as the "block matrix," is a single, unwieldy, "untidy" data table. The concept of "[tidy data](https://www.jstatsoft.org/article/view/v059i10)" [@Wickham2014] describes a standard way of structuring data in which each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The block matrix breaks these principles by obscuring the primary keys that identify individual records, leaving analysts with the arduous task of reformatting it before use.
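
To make the block matrix concrete, here is a minimal sketch in base R using a hand-built data frame. The `redcap_repeat_instrument` and `redcap_repeat_instance` columns follow REDCap's export convention for repeating instruments; the `demographics` and `medications` instruments and their fields (`age`, `med_name`) are hypothetical and exist only for illustration.

```r
# Toy "block matrix" mimicking a REDCap export with one non-repeating
# instrument (demographics: age) and one repeating instrument
# (medications: med_name). Instrument and field names are hypothetical.
block_matrix <- data.frame(
  record_id                = c(1, 1, 1, 2),
  redcap_repeat_instrument = c(NA, "medications", "medications", NA),
  redcap_repeat_instance   = c(NA, 1, 2, NA),
  age                      = c(34, NA, NA, 51),
  med_name                 = c(NA, "aspirin", "ibuprofen", NA),
  stringsAsFactors = FALSE
)

# record_id alone does not identify a row: record 1 appears more than once
anyDuplicated(block_matrix$record_id) > 0

# Count the structurally empty cells in each column; a row only fills
# the fields of the instrument it belongs to
colSums(is.na(block_matrix))
```

In this toy export, `record_id` repeats across rows and each row leaves the other instrument's fields empty, which is the kind of untidiness described above.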

To address these challenges, we developed `REDCapTidieR`, an open source R package that transforms the standard REDCap output into a format that adheres to tidy data principles. `REDCapTidieR` can save organizations and research staff considerable time by letting them query their data quickly, without intricate data parsing.

# Statement of Need

As of 2023, the REDCap Consortium reports nearly 3 million users in more than 150 countries. REDCap databases vary significantly in complexity, ranging from simple tables with easily identifiable records to more challenging cases where pinpointing a unique identifier is harder. This complexity often arises in databases that use "repeating instruments" and "repeating events." For an in-depth exploration of this concept, refer to the [`REDCapTidieR` documentation](https://chop-cgtinformatics.github.io/REDCapTidieR/articles/diving_deeper.html#longitudinal-redcap-projects). Fundamentally, repeating events and instruments support longitudinal studies, where subjects may have distinct timelines with varying levels of record granularity. Flattening such a database into the block matrix is where analysts run into trouble.

While [`REDCap-tools`](https://redcap-tools.github.io/projects/) documents several existing REDCap tools for R, `REDCapTidieR` occupies a unique space by providing analysts with an opinionated framework that quickly prepares their data for analysis. Some of these tools, such as the [`tidyREDCap`](https://raymondbalise.github.io/tidyREDCap/) and [`REDCapDM`](https://ubidi.github.io/REDCapDM/index.html) packages, also offer functions for data processing, but `REDCapTidieR` is the only one that restructures the entire block matrix into a format that is easily interpretable within the user's programming environment.

| Package | Data Export Support | Data Import Support | Data Manipulation | Data Tidying |
|-------------|--------------------|--------------------|------------------|--------------|
| redcapAPI | x | x | | |
| REDCapR | x | x | | |
| tidyREDCap | x | | x | |
| REDCapDM | x | | x | |
| REDCapTidieR| x | | x | x |

# Design

Transformation of the block matrix into a friendlier structure is carried out by `REDCapTidieR` through a series of complex operations that result in the "supertibble." The supertibble, named after the [`tibble` package](https://tibble.tidyverse.org/), is presented as a table where each row corresponds to a REDCap instrument and each column corresponds to either that instrument's post-processed data (a "data tibble"), metadata, or useful information about that instrument itself.
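
The nested layout can be sketched with a plain data frame holding a list-column. This only illustrates the shape and assumes hypothetical instruments and fields. The column names `redcap_form_name` and `redcap_data` are modeled on the package's output and should be checked against its documentation; the real supertibble is a tibble with additional metadata columns.

```r
# Two hypothetical data tables, one per instrument (toy values)
demographics <- data.frame(record_id = c(1, 2), age = c(34, 51))
medications <- data.frame(
  record_id              = c(1, 1),
  redcap_repeat_instance = c(1, 2),
  med_name               = c("aspirin", "ibuprofen"),
  stringsAsFactors = FALSE
)

# One row per instrument; each instrument's data sits in a list-column.
# Column names are assumptions made for this sketch.
supertbl <- data.frame(
  redcap_form_name = c("demographics", "medications"),
  stringsAsFactors = FALSE
)
supertbl$redcap_data <- list(demographics, medications)

# Pull out a single instrument's table by its name
meds <- supertbl$redcap_data[[which(supertbl$redcap_form_name == "medications")]]
```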

Unlike the block matrix, which combines all columns for record identification into one table, `REDCapTidieR` separates instruments so that only the variables necessary for identification of a record within the instrument are included in each data tibble. Below we provide a sample model that compares the standard output from a REDCap database with non-repeating and repeating instruments to one post-processed through `REDCapTidieR`.

![Conceptual Model](/paper/images/REDCapTidieR%20JOSS.png)
Figure 1: Comparative model showing REDCap API export formats between the default behavior and `REDCapTidieR`.

In this example, the supertibble displays three REDCap database instruments, with one repeating and two non-repeating. Below, one of each of these instrument types is expanded to show how `REDCapTidieR` separates these instruments into their own tabular list elements structured with only the identifiers necessary to pinpoint a specific record. This format makes tables easily joinable by analysts for whatever operations they may need later in their work.
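
The separation and the later join can be sketched in base R as follows. This is not the package's implementation, only a hand-rolled illustration on a toy export whose instruments (`demographics`, `medications`) and fields (`age`, `med_name`) are hypothetical.

```r
# Toy block matrix: one non-repeating and one repeating instrument
block_matrix <- data.frame(
  record_id                = c(1, 1, 1, 2),
  redcap_repeat_instrument = c(NA, "medications", "medications", NA),
  redcap_repeat_instance   = c(NA, 1, 2, NA),
  age                      = c(34, NA, NA, 51),
  med_name                 = c(NA, "aspirin", "ibuprofen", NA),
  stringsAsFactors = FALSE
)

# Rows that belong to the repeating instrument
is_repeating <- !is.na(block_matrix$redcap_repeat_instrument)

# Non-repeating instrument: one row per record, identified by record_id
demographics <- block_matrix[!is_repeating, c("record_id", "age")]

# Repeating instrument: one row per record and instance, so the
# identifier is record_id plus redcap_repeat_instance
medications <- block_matrix[
  is_repeating,
  c("record_id", "redcap_repeat_instance", "med_name")
]

# Each table carries only the keys it needs, so a join is a single call
meds_with_age <- merge(medications, demographics, by = "record_id")
```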

# Installation

`REDCapTidieR` is available on [GitHub](https://github.com/CHOP-CGTInformatics/REDCapTidieR) and [CRAN](https://cran.r-project.org/web/packages/REDCapTidieR/index.html) and has been tested for functionality on all major operating systems.
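
Assuming a standard R setup, either source can be installed as sketched below; the GitHub route additionally assumes the `devtools` package is available.

```r
# Release version from CRAN
install.packages("REDCapTidieR")

# Development version from GitHub (requires the devtools package)
devtools::install_github("CHOP-CGTInformatics/REDCapTidieR")
```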

# Acknowledgements

`REDCapTidieR` is made possible in large part thanks to the `REDCapR` and `tidyverse` [@tidyverse_cit] packages.

The authors would also like to give special thanks to Will Beasley, Paul Wildenhain, and Jan Marvin for their feedback and support in development.

# Conflict of interest

This package was developed by the [Children’s Hospital of
Philadelphia](https://www.chop.edu) Cell and Gene Therapy Informatics
Team to support the needs of the [Cellular Therapy and Transplant
Section](https://www.chop.edu/centers-programs/cellular-therapy-and-transplant-section).
The development was funded using the following sources:

- *Stephan Kadauke Start-up funds.* Stephan Kadauke, PI, CHOP, 2018-2024

- *CHOP-based GMP cell manufacturing (MFG) for CAR T clinical trials*.
Stephan Grupp, PI; Stephan Kadauke, co-PI, CHOP, 2021-2023