This is the current version of the SuperLearner R package (version 2.*).
Features
- Automatic optimal predictor ensembling via cross-validation.
- Dozens of built-in algorithms, including Random Forest, GBM, XGBoost, BART, Elastic Net, and Neural Networks.
- Integrates with caret to support even more algorithms.
- Includes a framework to quickly add custom algorithms to the ensemble.
- Visualize the performance of each algorithm using built-in plotting.
- Easily incorporate multiple hyperparameter configurations for each algorithm into the ensemble.
- Add new algorithms or change the default parameters for existing ones.
- Screen variables (feature selection) based on univariate association, Random Forest, Elastic Net, etc., or a custom screening algorithm.
- Multi-core and multi-node parallelization for scalability.
- External cross-validation to estimate the performance of the ensemble predictor.
- Ensemble can optimize for any target metric: mean-squared error, AUC, log likelihood, etc.
- Includes a framework to provide custom loss functions and stacking algorithms.
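As a rough illustration of the features listed above, the following sketch fits a small ensemble, pairs one learner with a screening algorithm, and then runs external cross-validation. The simulated data and the names `X`, `Y`, `sl_library`, and `new_X` are assumptions made for this example; the learners are limited to wrappers that need no additional packages.

```r
library(SuperLearner)

set.seed(1)

# Hypothetical simulated data: 200 observations, 3 predictors, continuous outcome.
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- 1 + 2 * X$x1 - X$x2 + rnorm(n)

# Candidate learners. The last entry pairs SL.glm with a univariate
# correlation screen, so the GLM only sees the variables that pass it.
sl_library <- list("SL.mean", "SL.glm", c("SL.glm", "screen.corP"))

# Fit the ensemble; learner weights are estimated with 5-fold cross-validation.
fit <- SuperLearner(Y = Y, X = X, family = gaussian(),
                    SL.library = sl_library,
                    cvControl = list(V = 5))

# Cross-validated risk and ensemble weight for each learner.
print(fit)

# Predict on new observations using the ensemble only.
new_X <- data.frame(x1 = c(0, 1), x2 = c(0, -1), x3 = c(0, 0))
pred <- predict(fit, newdata = new_X, onlySL = TRUE)$pred

# External cross-validation: estimates the performance of the ensemble itself,
# alongside each individual learner.
cv_fit <- CV.SuperLearner(Y = Y, X = X, family = gaussian(),
                          SL.library = sl_library,
                          cvControl = list(V = 5))
summary(cv_fit)
```

In practice the library would usually contain more flexible learners (for example the Random Forest or XGBoost wrappers), which require their underlying packages to be installed.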
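Installation
# Install the development version from GitHub:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("ecpolley/SuperLearner")

# Or install the latest release from CRAN:
install.packages("SuperLearner")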
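After installation, a quick check along these lines can confirm that the package loads and show which wrappers are available. This is only a sketch; the exact list printed depends on the installed version.

```r
library(SuperLearner)

# Report the installed version (the 2.* series is the current one).
sl_version <- packageVersion("SuperLearner")
print(sl_version)

# List the built-in prediction (SL.*) and screening (screen.*) wrappers.
listWrappers(what = "both")
```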
References
Polley, E. C. and van der Laan, M. J. (2010) Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 266. http://biostats.bepress.com/ucbbiostat/paper266/
van der Laan, M. J., Polley, E. C. and Hubbard, A. E. (2007) Super Learner. Statistical Applications in Genetics and Molecular Biology, 6, article 25. http://www.degruyter.com/view/j/sagmb.2007.6.issue-1/sagmb.2007.6.1.1309/sagmb.2007.6.1.1309.xml
van der Laan, M. J. and Rose, S. (2011) Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.