ml_gKRLS {gKRLS}    R Documentation
Machine Learning with gKRLS
Description
This provides a number of functions to use gKRLS (and mgcv more generally) as part of machine learning algorithms. Integration into SuperLearner and DoubleML (and mlr3) is described below.
Usage
SL.mgcv(Y, X, newX, formula, family, obsWeights, bam = FALSE, ...)
## S3 method for class 'SL.mgcv'
predict(object, newdata, allow_missing_levels = TRUE, ...)
add_bam_to_mlr3()
Arguments
Y: This is not usually directly specified in SL.mgcv; it is supplied internally by SuperLearner.
X: This is not usually directly specified in SL.mgcv; it is supplied internally by SuperLearner.
newX: This is not usually directly specified in SL.mgcv; it is supplied internally by SuperLearner.
formula: A formula used for gam or bam from mgcv. It must be provided explicitly and must not include an outcome (see Details).
family: This is not usually directly specified in SL.mgcv; it is supplied internally by SuperLearner.
obsWeights: This is not usually directly specified in SL.mgcv; it is supplied internally by SuperLearner.
bam: A logical value for whether mgcv::bam should be used instead of mgcv::gam. The default is FALSE.
...: Additional arguments passed to gam or bam.
object: This is not usually directly specified; it is supplied internally by SuperLearner when the predict method is called.
newdata: This is not usually directly specified; it is supplied internally by SuperLearner when the predict method is called.
allow_missing_levels: A logical variable that indicates whether missing levels in factors are allowed for prediction. The default is TRUE.
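For illustration, the formula contains only a right-hand side; a minimal sketch (the column names x1 and x2 are hypothetical):

# Sketch only: a formula without an outcome, combining a linear term
# and a gKRLS smooth; x1 and x2 are hypothetical column names
my_formula <- ~ x1 + s(x1, x2, bs = "gKRLS")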
Details
Ensembles: SuperLearner integration is provided by SL.mgcv and the corresponding predict method. mgcv::bam can be enabled by using bam = TRUE. A formula without an outcome must be explicitly provided.

Please note that it is often useful to load SuperLearner before gKRLS or mgcv to avoid functions such as gam and s being masked by other packages.
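For instance, a minimal sketch of the load order and wrapper pattern described above (the column names x1, x2, and x3 and the object my_data are hypothetical):

# Sketch only: load SuperLearner first so gam and s from mgcv are not masked
library(SuperLearner)
library(gKRLS)
# Wrapper passing a right-hand-side-only formula; SuperLearner supplies Y, X, newX
sl_bam <- function(...) {
  SL.mgcv(formula = ~ x1 + s(x2, x3, bs = "gKRLS"), bam = TRUE, ...)
}
# fit <- SuperLearner(Y = y, X = my_data, SL.library = "sl_bam")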
Double Machine Learning: DoubleML integration is provided in two ways. First, one could load mlr3extralearners to access regr.gam and classif.gam. Second, this package provides mgcv::bam integration directly, with a slight adaptation of the mlr3extralearners implementation (see ?LearnerClassifBam for more details). These learners can either be added manually to the list of mlr3 learners by calling add_bam_to_mlr3() or be used directly; examples are provided below. For classif.bam and regr.bam, the formula argument is mandatory.
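A minimal sketch of the second route (assuming mlr3 is attached and the data contain a column x1; the learner keys regr.bam and classif.bam follow the description above):

# Sketch only: register the bam learners with mlr3 and set the mandatory formula
library(mlr3)
library(gKRLS)
add_bam_to_mlr3()
learner <- lrn("regr.bam")
learner$param_set$values$formula <- ~ s(x1, bs = "gKRLS")
# learner can now be passed to DoubleML (e.g., as ml_g or ml_m)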
Value
All three of the returned functions are usually called for use in
other functions, i.e. creating objects for use in SuperLearner
or
adding bam
models to mlr3
.
References
Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. "Generalized Additive Models for Large Data Sets." Journal of the Royal Statistical Society: Series C (Applied Statistics) 64(1): 139-155.
Examples
set.seed(789)
N <- 100
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 - 0.5 * x2 + rnorm(N, 0, 1)
y <- y * 10
X <- cbind(x1, x2, x1 + x2 * 3)
X <- cbind(X, "x3" = rexp(nrow(X)))
if (requireNamespace("SuperLearner", quietly = TRUE)) {
# Estimate Ensemble with SuperLearner
require(SuperLearner)
sl_m <- function(...) { SL.mgcv(formula = ~ x1 + x2 + x3, ...) }
fit_SL <- SuperLearner::SuperLearner(
Y = y, X = data.frame(X),
SL.library = "sl_m"
)
pred <- predict(fit_SL, newdata = data.frame(X))
}
# Estimate Double/Debiased Machine Learning
if (requireNamespace("DoubleML", quietly = TRUE)) {
  require(DoubleML)
  # Load the learners; for testing *ONLY*, use a raw sketch size of 2
  double_bam_1 <- LearnerRegrBam$new()
  double_bam_1$param_set$values$formula <- ~ s(x1, x3,
    bs = "gKRLS",
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2)
  )
  double_bam_2 <- LearnerClassifBam$new()
  double_bam_2$param_set$values$formula <- ~ s(x1, x3,
    bs = "gKRLS",
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2)
  )
  # Create the data
  dml_data <- DoubleMLData$new(
    data = data.frame(X, y),
    x_cols = c("x1", "x3"), y_col = "y",
    d_cols = "x2"
  )
  # Estimate treatment effects (works for other DoubleML methods)
  dml_est <- DoubleMLIRM$new(
    data = dml_data,
    n_folds = 2,
    ml_g = double_bam_1,
    ml_m = double_bam_2
  )$fit()
}