Skip to content

Latest commit

 

History

History
128 lines (100 loc) · 3.47 KB

README.md

File metadata and controls

128 lines (100 loc) · 3.47 KB

kmdata

This R[1] package contains a database of 304 reconstructed, patient-level clinical trial datasets on multiple survival endpoints, including overall and progression-free survival.

The data have been extracted from Kaplan-Meier (KM) curves reported in 153 oncology Phase III clinical trial publications.

Studies have been identified through a PubMed search of clinical trials in breast, lung cancer, prostate, and colorectal cancer published between 2014 and 2016.

For each trial that met the our search criteria, we extracted and curated study-level information.

All reported KM survival curves were digitized with the software DigitizeIt.

The digitized KM survival curves were used to estimate (possibly censored) patient-level event times using the Guyot-algorithm[2].

References

[1] R-Core-Team. A Language and Environment for Statistical Computing. R Found Stat Comput. 2018;2: https://www.r-project.org.

[2] Guyot P, Ades AE, Ouwens MJNM, Welton NJ. Enhanced secondary analysis of survival data: Reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12. doi:10.1186/1471-2288-12-9


Installation

# install.packages('devtools')
devtools::install_github('raredd/kmdata', build_vignettes = TRUE)

Getting started

View a list of the data available in the kmdata package:

library('kmdata')
data(package = 'kmdata')

Or generate individual patient data (IPD) from digitized survival curve data using the method described by Guyot:

library('survival')
sf <- survfit(Surv(time, status) ~ 1, cancer)
aa <- summary(sf, times = 0:8 * 100)

pt <- ipd(sf$time, sf$surv, aa$time, aa$n.risk)
pf <- survfit(Surv(time, event) ~ 1, pt)

Compare:

sf
# Call: survfit(formula = Surv(time, status) ~ 1, data = cancer)
# 
#       n  events  median 0.95LCL 0.95UCL 
#     228     165     310     285     363 
pf
# Call: survfit(formula = Surv(time, event) ~ 1, data = pt)
# 
#       n  events  median 0.95LCL 0.95UCL 
#     228     163     310     285     363 
summary(sf, times = 0:5 * 100)
# Call: survfit(formula = Surv(time, status) ~ 1, data = cancer)
# 
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
#    0    228       0   1.0000  0.0000       1.0000        1.000
#  100    196      31   0.8640  0.0227       0.8206        0.910
#  200    144      41   0.6803  0.0311       0.6219        0.744
#  300     92      29   0.5306  0.0346       0.4669        0.603
#  400     57      25   0.3768  0.0358       0.3128        0.454
#  500     41      12   0.2933  0.0351       0.2320        0.371
summary(pf, times = 0:5 * 100)
# Call: survfit(formula = Surv(time, event) ~ 1, data = pt)
# 
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
#    0    228       0    1.000  0.0000       1.0000        1.000
#  100    196      31    0.864  0.0228       0.8201        0.909
#  200    144      40    0.681  0.0314       0.6221        0.745
#  300     92      29    0.531  0.0348       0.4672        0.604
#  400     57      25    0.379  0.0358       0.3145        0.456
#  500     41      13    0.290  0.0349       0.2295        0.367

Additional resources

See the kmdata intro vignette or more information about generating IPD from digitized survival curves.