---
title: "Extracting Clinical Study Data from REDCap"
author: "Peter D.R. Higgins"
output:
  html_document:
    df_print: paged
editor_options:
  markdown:
    wrap: sentence
---
# Extracting Clinical Study Data from REDCap
In your investigator-initiated clinical research, you will often collect data in
a HIPAA-compatible database like REDCap, which keeps Protected Health
Information (PHI) secure on the web. However, the usual downloads from REDCap
have to be done manually and are a bit clunky. If you take one-time measurements
(baseline, outcome) as well as repeated measurements over time (e.g., labs, BMI,
depression score at each visit), the data will be downloaded as one giant rectangle
with a lot of missing data and a lot of repeated data.
In this chapter, we will show you how to extract data from REDCap using the
`{REDCapTidieR}` package, and then how to wrangle the data into a more usable
format. This will enable you to monitor your ongoing clinical research in near
real-time, with daily or weekly updates of summary webpages that you can
review and share with collaborators. Inadequate recruitment and retention
often cause clinical studies to fail, but are often recognized too late, when
limited funds are running out. Closely monitoring your trial will help you identify
problems with data collection, early participant dropout, or other issues while
they can still be corrected.
## Goals for this Chapter
In this chapter, we will focus on extracting and cleaning data from the REDCap
database of an ongoing clinical study. We will use the `{REDCapTidieR}` package
and a training database created with fake data for this purpose.
In applying this in practice, we will assume that you have:
1. Obtained IRB permission to run a clinical study
2. Created a REDCap database for your study
We will cover the following topics:
1. Obtaining a REDCap API token
2. Using the `{REDCapTidieR}` package to extract data from REDCap
3. Labeling the data fields
4. Cleaning the data with the `{janitor}` package
## Packages Needed for this Chapter:
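- `{tidyverse}`
- `{REDCapTidieR}`
- `{janitor}`
- `{gtsummary}` and `{medicaldata}` (for the Table 1 exercises at the end of the chapter)
## Pathway for this Chapter
```{r setup, include = FALSE}
library(tidyverse)
library(webexercises) # provides hide() / unhide() for the solution buttons
library(REDCapTidieR)
library(janitor)
library(gtsummary)    # tbl_summary() and the trial dataset
library(medicaldata)  # strep_tb dataset
```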
## Obtaining a REDCap API token
Let's start by looking at your REDCap database.
API stands for Application Programming Interface. Many databases have a web API
to allow people to use programs to extract data from the database. This is
a secure, permission-based way to extract data from a database.
First we want to establish that you have API token access to your REDCap database.
The API token is a long string of characters that you can generate in your REDCap
project. The person who built the database could have given you API access, or
may not have. You can tell whether you have API access by looking at the home
page of your REDCap database. Look for an API button in the left sidebar. If you
don't see it, you don't have API access. If you do see it, click on it and see
if you have a token. If you don't, you can generate one by clicking on the
"Generate Token" button.
- images here
Look for the API button in the left sidebar.
If you do not have API permission, ask the database creator to grant it.
Once you have API permission, contact your local REDCap administrator to request
an API token if you cannot generate one yourself. This is the long string of
characters that you will use to access your REDCap database.
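Because the token grants access to study data, it should not be typed directly into a script. One common approach is to store it in an environment variable and read it at run time. The sketch below assumes a variable named `REDCAP_API_TOKEN` (for example, set in your `~/.Renviron` file); both that name and the helper `get_redcap_token()` are hypothetical and chosen only for illustration.

```r
# Minimal sketch: read a REDCap API token from an environment variable.
# Assumption: a line like REDCAP_API_TOKEN=<your token> was added to ~/.Renviron
# and R was restarted. The variable name is hypothetical.
get_redcap_token <- function(var = "REDCAP_API_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  # Fail early with a clear message if the token is missing
  if (!nzchar(token)) {
    stop("No API token found in environment variable ", var, call. = FALSE)
  }
  token
}
```

Calling `token <- get_redcap_token()` then returns the token without it ever appearing in your code or version control.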
## Finding the web URL of your REDCap production database
## Setting up a secure firewalled server as a location for your project and data
## Using the `{REDCapTidieR}` package to extract data from REDCap
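A minimal sketch of the extraction step, assuming the `read_redcap()` and `extract_tibble()` functions from `{REDCapTidieR}`. The wrapper `fetch_redcap()`, the URL `https://redcap.example.edu/api/`, the environment variable `REDCAP_API_TOKEN`, and the instrument name `demographics` are all placeholders for illustration; substitute the values for your own project.

```r
# Hypothetical wrapper: pull every instrument of a REDCap project into a
# "supertibble" (one row per instrument, with the data in a list column).
fetch_redcap <- function(redcap_uri, token) {
  REDCapTidieR::read_redcap(redcap_uri = redcap_uri, token = token)
}

# Example call (not run here, since it needs a live server and a valid token).
# The URL and instrument name below are placeholders, not real values.
# supertbl <- fetch_redcap(
#   redcap_uri = "https://redcap.example.edu/api/",
#   token      = Sys.getenv("REDCAP_API_TOKEN")
# )
# demographics <- REDCapTidieR::extract_tibble(supertbl, "demographics")
```

Keeping the call inside a small function makes it easy to rerun on a schedule for the daily or weekly monitoring updates described earlier.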
## Labeling the data fields
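`{REDCapTidieR}` offers `make_labelled()` to attach REDCap field labels to the extracted data. To show the underlying idea without a live database, the sketch below uses base R and toy values: a variable label is simply a `"label"` attribute stored on a column. The data values and label text here are invented for illustration.

```r
# Toy data frame standing in for an extracted lab instrument (fake values)
labs <- data.frame(
  hgb = c(13.1, 11.8),
  ast = c(22, 41)
)

# Attach a human-readable label to each column as a "label" attribute
attr(labs$hgb, "label") <- "Hemoglobin (g/dL)"
attr(labs$ast, "label") <- "AST (U/L)"

# Collect the labels into a named character vector to review them
field_labels <- vapply(labs, function(x) attr(x, "label"), character(1))
field_labels
```

Table-making packages such as `{gtsummary}` can pick up these labels in place of the raw field names.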
## Cleaning the data with the `{janitor}` package
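As a small illustration of the kind of cleaning `{janitor}` supports, the sketch below uses `clean_names()` and `remove_empty()` on a toy data frame. The column names and values are invented; real REDCap exports will differ.

```r
library(janitor)

# Toy data with messy column names, one all-missing row, one all-missing column
raw <- data.frame(
  `Record ID`  = c(1, 2, NA),
  `Visit Date` = c("2024-01-05", "2024-01-12", NA),
  `Notes`      = c(NA, NA, NA),
  check.names = FALSE
)

cleaned <- raw |>
  clean_names() |>                          # convert names to snake_case
  remove_empty(which = c("rows", "cols"))   # drop all-NA rows and columns

names(cleaned)
```

After these two steps, the column names are consistent snake_case and the empty row and column are gone.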
### Your Turn
In the last code block, we saved the table as the object `tbl`. Take this `tbl`
object and
- fix up the labels for `hgb`, `alk.phos`, and `ast` to be more publication-ready.
- add a caption to the table that says "Baseline Characteristics of Participants in the Mock Study by Treatment Arm"
- add a footnote to the table that says "Note: Missing values are indicated as 'Missing' in the table."
- improve the value labels for 'Hawaii/Pacific', and 'Native-Am/Alaska' to be "Native Hawaiian or other Pacific Islander" and "American Indian or Alaska Native" respectively.
- change the number of digits in the table to 2.
- change the stats to show the mean and standard deviation for continuous variables, and the count and percent for categorical variables.
- convert the table to a gt table and style it with the `gt` package. Add some color.
The code block below starts with `tbl`, unmodified.
Add a pipe and start fixing it up, with the goals above. Refer to the `{gtsummary}` and `{gt}` documentation for help. Click on the links below as needed.
- gtsummary: [tbl_summary()](https://www.danieldsjoberg.com/gtsummary/reference/tbl_summary.html)
- gt: [gt()](https://gt.rstudio.com/reference/gt.html)
```{r your-turn}
tbl
```
## Try this with a new dataset
Now we will use the `trial` dataset from the `{gtsummary}` package to create a Table 1.
The `trial` dataset is a simulated dataset of a clinical trial with 200 observations and 8 variables.
Run the code block below. Which are baseline variables for Table 1, and which are outcome variables?
Use your R skills to:
- select the baseline variables
- create a Table 1, divided by treatment (`trt`), and
- set the `missing_text` argument to "Missing".
```{r trial-dim}
trial
```
`r hide(button_text = "Show code Solution")`
```{r}
trial |>
  select(age, marker, stage, grade, trt) |>
  tbl_summary(by = trt, missing_text = "Missing")
```
`r unhide()`
### Your Turn
`r hide(button_text = "Show code Solution")`
```{r}
strep_tb |>
  select(gender, starts_with("baseline"), arm) |>
  tbl_summary(
    by = arm,
    missing_text = "Missing",
    label = list(
      gender ~ "Gender",
      baseline_condition ~ "Baseline Condition",
      baseline_temp ~ "Baseline Temperature",
      baseline_esr ~ "Baseline ESR",
      baseline_cavitation ~ "Baseline Cavitation"
    )
  ) |>
  add_n() |>
  add_overall() |>
  bold_labels() |>
  modify_header(update = list(
    label ~ "**Participant<br>Characteristic**",
    stat_0 ~ "**Overall**<br>N = 107",
    stat_1 ~ "**Control**<br>N = 52",
    stat_2 ~ "**Streptomycin**<br>N = 55"
  )) |>
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment Arm**")
```
`r unhide()`
## Summary
You now know how to
1. Identify your REDCap URL and obtain an API token for your database
2. Use the `{REDCapTidieR}` package to extract data from REDCap
3. Label the data fields
4. Clean the data with the `{janitor}` package