gKRLS {gKRLS}    R Documentation
Generalized Kernel Regularized Least Squares
Description
This page documents how to use gKRLS as part of a model estimated with
mgcv. Post-estimation functions to calculate marginal effects are
documented elsewhere, e.g. calculate_effects.
Usage
gKRLS(
sketch_method = "subsampling",
standardize = "Mahalanobis",
bandwidth = NULL,
sketch_multiplier = 5,
sketch_size_raw = NULL,
sketch_prob = NULL,
rescale_penalty = TRUE,
truncate.eigen.tol = sqrt(.Machine$double.eps),
demean_kernel = FALSE,
remove_instability = TRUE
)
Arguments
sketch_method
A string that specifies which kernel sketching method should be used. The default is "subsampling", where the kernel is constructed using a random sample of the observations; "gaussian", "bernoulli", and "none" (no sketching) are also available. A vector of observation indices can be provided instead to force a specific subsample (see Examples).

standardize
A string that specifies how the data are standardized before calculating the distance between observations. The default is "Mahalanobis"; "scaled" (each covariate standardized) and "none" are also available.

bandwidth
A bandwidth for the kernel; the distance between observations is divided by the bandwidth. The default, NULL, sets the bandwidth to the number of covariates in the kernel.

sketch_multiplier
A number that sets the size of the sketching dimension: sketch_multiplier * ceiling(N^(1/3)), where N is the number of observations. The default is 5.

sketch_size_raw
A number to set the exact size of the sketching dimension. The default, NULL, sets the size using sketch_multiplier.

sketch_prob
A probability for an element of the sketching matrix to equal a non-zero value when using Bernoulli sketching; see Yang et al. (2017) for details.

rescale_penalty
A logical value for whether the penalty should be rescaled for numerical stability. The default is TRUE.

truncate.eigen.tol
A threshold for truncating the eigendecomposition of the penalty: columns associated with eigenvalues below this threshold are removed for numerical stability. The default is sqrt(.Machine$double.eps).

demean_kernel
A logical value that indicates whether columns of the (sketched) kernel should be demeaned before estimation. The default is FALSE.

remove_instability
A logical value that indicates whether numerical zeros in the penalty's eigendecomposition (determined via truncate.eigen.tol) should be removed for numerical stability. The default is TRUE.
Details
Overview: The gKRLS function should not be called directly. Its
options, described above, control how gKRLS is estimated. It should be
passed to mgcv as follows: s(x1, x2, x3, bs = "gKRLS", xt =
gKRLS(...)). Multiple kernels can be specified, each with its own
gKRLS arguments, and gKRLS can be used alongside the existing options
for s() in mgcv.
Default Settings: By default, bs = "gKRLS" uses the Mahalanobis
distance between observations, subsampling sketching (i.e., the kernel
is constructed using a random sample of the observations; Yang et al.
2017), and a sketching dimension of 5 * ceiling(N^(1/3)), where N is
the number of observations. Chang and Goplerud (2023) explore
alternative options.
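The default sketching dimension described above is easy to compute directly. A minimal base-R sketch of the formula (the helper name sketch_dimension is illustrative, not part of the package):

```r
# Default sketching dimension: sketch_multiplier * ceiling(N^(1/3))
sketch_dimension <- function(N, sketch_multiplier = 5) {
  sketch_multiplier * ceiling(N^(1/3))
}

sketch_dimension(100)    # 5 * ceiling(4.64) = 25
sketch_dimension(50000)  # 5 * ceiling(36.84) = 185
```

Because the dimension grows like N^(1/3), the sketched kernel stays far smaller than the full N-by-N kernel as the sample grows.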
Notes: Variables must be separated with commas inside of s(...), and
character variables should usually be passed as factors to work
smoothly with mgcv. When using this function with bam, the sketching
dimension uses chunk.size in place of N; thus, either chunk.size must
be increased or sketch_size_raw must be set for the sketching
dimension to grow with N.
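The bam behavior noted above can be illustrated numerically. A minimal sketch, assuming bam's default chunk.size of 10000 (the helper name default_dim is illustrative):

```r
# Sketching dimension: 5 * ceiling(n^(1/3)), where n is N for gam()
# but chunk.size (default 10000) for bam()
default_dim <- function(n) 5 * ceiling(n^(1/3))

default_dim(1e5)    # gam() with N = 100,000 observations: 235
default_dim(10000)  # bam() with default chunk.size: 110, regardless of N
```

With bam, the dimension stays at 110 no matter how large N becomes, which is why chunk.size or sketch_size_raw must be adjusted for the sketch to scale with the data.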
Value
gKRLS returns a named list with the elements described in "Arguments".
References
Chang, Qing and Max Goplerud. 2023. "Generalized Kernel Regularized Least Squares." https://arxiv.org/abs/2209.14355.
Drineas, Petros and Michael W. Mahoney. 2005. "On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning." Journal of Machine Learning Research 6(12):2153-2175.
Yang, Yun, Mert Pilanci, and Martin J. Wainwright. 2017. "Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression." Annals of Statistics 45(3):991-1023.
Examples
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
state <- sample(letters[1:5], n, replace = TRUE)
y <- 0.3 * x1 + 0.4 * x2 + 0.5 * x3 + rnorm(n)
data <- data.frame(y, x1, x2, x3, state)
data$state <- factor(data$state)
# A gKRLS model without fixed effects
fit_gKRLS <- mgcv::gam(y ~ s(x1, x2, x3, bs = "gKRLS"), data = data)
summary(fit_gKRLS)
# A gKRLS model with fixed effects outside of the kernel
fit_gKRLS_FE <- mgcv::gam(y ~ state + s(x1, x2, x3, bs = "gKRLS"), data = data)
# HC3 is not available for mgcv; this uses the effective degrees of freedom
# instead of the number of columns; see ?estfun.gam for details
robust <- sandwich::vcovHC(fit_gKRLS, type = 'HC1')
cluster <- sandwich::vcovCL(fit_gKRLS, cluster = data$state)
# Change default standardization to "scaled", sketch method to Gaussian,
# and alter sketching multiplier
fit_gKRLS_alt <- mgcv::gam(y ~ s(x1, x2, x3,
bs = "gKRLS",
xt = gKRLS(
standardize = "scaled",
sketch_method = "gaussian",
sketch_multiplier = 2
)
),
data = data
)
# A model with multiple kernels
fit_gKRLS_2 <- mgcv::gam(y ~ s(x1, x2, bs = 'gKRLS') + s(x1, x3, bs = 'gKRLS'), data = data)
# A model with a custom set of ids for sketching
id <- sample(1:n, 5)
fit_gKRLS_custom <- mgcv::gam(y ~ s(x1, bs = 'gKRLS', xt = gKRLS(sketch_method = id)), data = data)
# Note that the ids of the sampled observations can be extracted
# from the fitted mgcv object
stopifnot(identical(id, fit_gKRLS_custom$smooth[[1]]$subsampling_id))
# calculate marginal effect (see ?calculate_effects for more examples)
calculate_effects(fit_gKRLS, variables = "x1")