R: Machine Learning with gKRLS

ml_gKRLS {gKRLS}

R Documentation

Machine Learning with gKRLS

Description

These functions allow gKRLS (and mgcv more generally) to be used within machine learning algorithms. Integration with SuperLearner and DoubleML (via mlr3) is described below.

Usage

SL.mgcv(Y, X, newX, formula, family, obsWeights, bam = FALSE, ...)

## S3 method for class 'SL.mgcv'
predict(object, newdata, allow_missing_levels = TRUE, ...)

add_bam_to_mlr3()

Arguments

`Y`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`X`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`newX`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`formula`	A formula used for `gam` or `bam` from `mgcv`. This must be specified; see the examples.
`family`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`obsWeights`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`bam`	A logical value indicating whether `mgcv::bam` should be used instead of `mgcv::gam`. The default is `FALSE`. For large datasets, `bam` can dramatically reduce estimation time. Wood et al. (2015) and the `mgcv` documentation provide details on `bam`.
`...`	Additional arguments passed to `mgcv::gam` and `mgcv::bam`.
`object`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`newdata`	Not usually specified directly in `SL.mgcv`; see the examples below and the `SuperLearner` documentation for details.
`allow_missing_levels`	A logical value indicating whether factor levels that are missing from the training data are allowed at prediction time. The default is `TRUE`.

Details

Ensembles: SuperLearner integration is provided by SL.mgcv and the corresponding predict method. mgcv::bam can be enabled by using bam = TRUE. A formula without an outcome must be explicitly provided.

Note that it is often useful to load SuperLearner before gKRLS or mgcv, so that functions such as gam and s are not masked by other packages.
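The load order and the bam = TRUE option described above can be sketched as follows. This is a minimal illustration, not the package's canonical usage: the wrapper name sl_bam and the predictor names x1 and x2 are assumptions chosen for the example, and the packages are attached only if they are installed.

```r
# Attach SuperLearner first, then gKRLS, following the note on masking.
if (requireNamespace("SuperLearner", quietly = TRUE) &&
    requireNamespace("gKRLS", quietly = TRUE)) {
  library(SuperLearner)
  library(gKRLS)
}

# Hypothetical wrapper: fixes the formula (no outcome on the left-hand side)
# and requests mgcv::bam instead of mgcv::gam. SuperLearner supplies the
# remaining arguments (Y, X, newX, family, obsWeights) through `...`.
sl_bam <- function(...) {
  gKRLS::SL.mgcv(formula = ~ x1 + x2, bam = TRUE, ...)
}
```

The wrapper would then be passed by name, for example SL.library = "sl_bam", in a call to SuperLearner::SuperLearner.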

Double Machine Learning: DoubleML integration is provided in two ways. First, one could load mlr3extralearners to access regr.gam and classif.gam.
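A minimal sketch of the first route, assuming that mlr3 and mlr3extralearners are installed and that attaching mlr3extralearners registers the regr.gam and classif.gam keys with mlr3. The object names gam_regr and gam_classif are illustrative only.

```r
# Record whether both packages are available before attaching them.
have_gam_learners <- requireNamespace("mlr3", quietly = TRUE) &&
  requireNamespace("mlr3extralearners", quietly = TRUE)

if (have_gam_learners) {
  library(mlr3)
  library(mlr3extralearners)
  # Retrieve the mgcv::gam-based learners by key.
  gam_regr <- lrn("regr.gam")
  gam_classif <- lrn("classif.gam")
}
```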

Second, this package provides mgcv::bam integration directly, using a slight adaptation of the mlr3extralearners implementation (see ?LearnerClassifBam for more details). These learners can either be added manually to the list of mlr3 learners by calling add_bam_to_mlr3() or be used directly. Examples are provided below. For classif.bam and regr.bam, the formula argument is mandatory.
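The two ways of reaching the bam learners might look as follows. This is a sketch under stated assumptions: it assumes that add_bam_to_mlr3() registers the keys regr.bam and classif.bam named above so that mlr3::lrn() can find them, and the variable x1 in the formula is hypothetical.

```r
have_bam_learners <- requireNamespace("mlr3", quietly = TRUE) &&
  requireNamespace("gKRLS", quietly = TRUE)

if (have_bam_learners) {
  library(gKRLS)

  # Option 1: register the learners with mlr3, then retrieve one by key.
  add_bam_to_mlr3()
  regr_bam <- mlr3::lrn("regr.bam")
  regr_bam$param_set$values$formula <- ~ s(x1)  # formula is mandatory

  # Option 2: construct the learner directly, without registering it.
  classif_bam <- LearnerClassifBam$new()
  classif_bam$param_set$values$formula <- ~ s(x1)
}
```

Either object can then be supplied wherever an mlr3 learner is expected, for example as ml_g or ml_m in DoubleML.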

Value

All three functions are usually called for use within other functions, i.e., to create objects for use in SuperLearner or to add bam models to mlr3.

References

Wood, Simon N., Yannig Goude, and Simon Shaw. 2015. "Generalized Additive Models for Large Data Sets." Journal of the Royal Statistical Society: Series C (Applied Statistics) 64(1):139-155.

Examples

set.seed(789)
N <- 100
x1 <- rnorm(N)
x2 <- rbinom(N, size = 1, prob = .2)
y <- x1^3 - 0.5 * x2 + rnorm(N, 0, 1)
y <- y * 10
X <- cbind(x1, x2, x1 + x2 * 3)
X <- cbind(X, "x3" = rexp(nrow(X)))

if (requireNamespace("SuperLearner", quietly = TRUE)) {
  # Estimate ensemble with SuperLearner
  require(SuperLearner)
  sl_m <- function(...) { SL.mgcv(formula = ~ x1 + x2 + x3, ...) }
  fit_SL <- SuperLearner::SuperLearner(
    Y = y, X = data.frame(X),
    SL.library = "sl_m"
  )
  pred <- predict(fit_SL, newdata = data.frame(X))
}
# Estimate Double/Debiased Machine Learning
if (requireNamespace("DoubleML", quietly = TRUE)) {
  require(DoubleML)
  # Load the models; for testing *ONLY*, use a raw sketch size of 2
  double_bam_1 <- LearnerRegrBam$new()
  double_bam_1$param_set$values$formula <- ~ s(x1, x3, bs = "gKRLS", 
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2))
  double_bam_2 <- LearnerClassifBam$new()
  double_bam_2$param_set$values$formula <- ~ s(x1, x3, bs = "gKRLS", 
    xt = gKRLS(sketch_multiplier = NULL, sketch_size_raw = 2))

  # Create data
  dml_data <- DoubleMLData$new(
    data = data.frame(X, y),
    x_cols = c("x1", "x3"), y_col = "y",
    d_cols = "x2"
  )
  # Estimate treatment effects (works for other DoubleML methods)
  dml_est <- DoubleMLIRM$new(
    data = dml_data,
    n_folds = 2,
    ml_g = double_bam_1,
    ml_m = double_bam_2
  )$fit()
}
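After the examples above have run, the fitted objects can be inspected. The following is a hedged sketch that touches each object only if it exists in the current session. It relies on the coef element of a SuperLearner fit, the pred element returned by its predict method, and the coef field and confint() method of a fitted DoubleML object.

```r
# Ensemble weights and the first few ensemble predictions from SuperLearner.
if (exists("fit_SL") && exists("pred")) {
  print(fit_SL$coef)
  print(head(pred$pred))
}

# Point estimate and confidence interval for the treatment effect.
if (exists("dml_est")) {
  print(dml_est$coef)
  print(dml_est$confint())
}
```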

[Package gKRLS version 1.0.2 Index]