forked from osofr/sl3
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME-backup.Rmd
161 lines (121 loc) · 5.5 KB
/
README-backup.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
output: github_document
bibliography: README-refs.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# R/`sl3`: modern Super Learning with pipelines
[](https://travis-ci.org/jeremyrcoyle/sl3)
[](https://ci.appveyor.com/project/jeremyrcoyle/sl3)
[](https://codecov.io/github/jeremyrcoyle/sl3?branch=master)
[](http://www.repostatus.org/#wip)
[](http://www.gnu.org/licenses/gpl-3.0)
[](https://gitter.im/sl3-Rpkg/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
> A modern implementation of the Super Learner algorithm for ensemble learning
> and model stacking
__Authors:__ [Jeremy Coyle](https://github.com/jeremyrcoyle), [Nima
Hejazi](https://github.com/nhejazi), [Ivana
Malenica](https://github.com/podTockom), [Oleg
Sofrygin](https://github.com/osofr)
---
## What's `sl3`?
`sl3` is a modern implementation of the Super Learner algorithm of
@vdl2007super. The Super Learner algorithm performs ensemble learning in one of
two fashions:
1. The "discrete" Super Learner can be used to select the best prediction
algorithm among a supplied library of learning algorithms ("learners" in the
`sl3` nomenclature) -- that is, that algorithm which minimizes the
cross-validated risk with respect to some appropriate loss function.
2. The "ensemble" Super Learner can be used to assign weights to specified
learning algorithms (in a user-supplied library) in order to create a
combination of these learners that minimizes the cross-validated risk with
respect to an appropriate loss function. This notion of weighted combinations
has also been called _stacked regression_ [@breiman1996stacked].
---
## Installation
<!--
For standard use, we recommend installing the package from
[CRAN](https://cran.r-project.org/) via
```{r cran-installation, eval = FALSE}
install.packages("sl3")
```
-->
Install the most recent _stable release_ from GitHub via
[`devtools`](https://www.rstudio.com/products/rpackages/devtools/):
```{r gh-master-installation, eval = FALSE}
devtools::install_github("jeremyrcoyle/sl3")
```
---
## Issues
If you encounter any bugs or have any specific feature requests, please [file an
issue](https://github.com/jeremyrcoyle/sl3/issues).
---
## Examples
`sl3` makes the process of applying screening algorithms, learning algorithms,
combining both types of algorithms into a stacked regression model, and
cross-validating this whole process essentially trivial. The best way to
understand this is to see the `sl3` package in action:
```{r sl3-simple-example}
set.seed(49753)
suppressMessages(library(data.table))
library(dplyr)
library(SuperLearner)
library(origami)
library(sl3)
# load example data set
data(cpp)
cpp <- cpp %>%
dplyr::filter(!is.na(haz)) %>%
mutate_all(funs(replace(., is.na(.), 0)))
# use covariates of intest and the outcome to build a task object
covars <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs",
"sexn")
task <- sl3_Task$new(cpp, covariates = covars, outcome = "haz")
# set up screeners and learners via built-in functions and pipelines
slscreener <- Lrnr_pkg_SuperLearner_screener$new("screen.glmnet")
glm_learner <- Lrnr_glm$new()
screen_and_glm <- Pipeline$new(slscreener, glm_learner)
SL.glmnet_learner <- Lrnr_pkg_SuperLearner$new(SL_wrapper = "SL.glmnet")
# stack learners into a model (including screeners and pipelines)
learner_stack <- Stack$new(SL.glmnet_learner, glm_learner, screen_and_glm)
stack_fit <- learner_stack$train(task)
preds <- stack_fit$predict()
head(preds)
```
---
## Contributions
It is our hope that `sl3` will grow to be widely used for creating stacked
regression models and the cross-validation of pipelines that make up such
models, as well as the variety of other applications in which the Super Learner
algorithm plays a role. To that end, contributions are very welcome, though we
ask that interested contributors consult our [`contribution
guidelines`](https://github.com/jeremyrcoyle/sl3/blob/master/CONTRIBUTING.md)
prior to submitting a pull request.
---
After using the `sl3` R package, please cite the following:
@misc{coyle2017sl3,
author = {Coyle, Jeremy R and Hejazi, Nima S and Malenica, Ivana and
Sofrygin, Oleg},
title = {{sl3}: Modern Pipelines for Machine Learning and {Super
Learning}},
year = {2017},
howpublished = {\url{https://github.com/jeremyrcoyle/sl3}},
url = {http://dx.doi.org/DOI_HERE},
doi = {DOI_HERE}
}
---
## License
© 2017 [Jeremy R. Coyle](https://github.com/jeremyrcoyle), [Nima S.
Hejazi](https://github.com/nhejazi), [Ivana
Malenica](https://github.com/podTockom), [Oleg
Sofrygin](https://github.com/osofr)
The contents of this repository are distributed under the GPL-3 license. See
file `LICENSE` for details.
---
## References