gKRLS {gKRLS}    R Documentation
Generalized Kernel Regularized Least Squares
Description
This page documents how to use gKRLS as part of a model estimated with mgcv. Post-estimation functions to calculate marginal effects are documented elsewhere, e.g., calculate_effects.
Usage
gKRLS(
sketch_method = "subsampling",
standardize = "Mahalanobis",
bandwidth = NULL,
sketch_multiplier = 5,
sketch_size_raw = NULL,
sketch_prob = NULL,
rescale_penalty = TRUE,
truncate.eigen.tol = sqrt(.Machine$double.eps),
demean_kernel = FALSE,
remove_instability = TRUE
)
Arguments
sketch_method
A string that specifies which kernel sketching method should be used (default of "subsampling"). The Examples also illustrate "gaussian" sketching and passing a vector of observation indices to force a specific set of observations to be used for subsampling sketching.

standardize
A string that specifies how the data is standardized before calculating the distance between observations. The default is "Mahalanobis"; the Examples also illustrate "scaled".

bandwidth
A bandwidth for the kernel. The default, NULL, uses the package's default choice.

sketch_multiplier
A number that sets the size of the sketching dimension: sketch_multiplier * ceiling(N^(1/3)), where N is the number of observations (default of 5).

sketch_size_raw
A number to set the exact size of the sketching dimension. The default, NULL, sets the size via sketch_multiplier instead.

sketch_prob
A probability that governs the entries of the sketching matrix for the applicable sketching methods. The default is NULL.

rescale_penalty
A logical value for whether the penalty should be rescaled for numerical stability. The default is TRUE.

truncate.eigen.tol
A threshold to remove columns of the penalty associated with numerically small eigenvalues. The default is sqrt(.Machine$double.eps).

demean_kernel
A logical value that indicates whether columns of the (sketched) kernel should be demeaned before estimation. The default is FALSE.

remove_instability
A logical value that indicates whether numerical zeros (set via truncate.eigen.tol) should be removed for numerical stability. The default is TRUE.
Details
Overview: The gKRLS function should not be called directly. Its options, described above, control how gKRLS is estimated. It should be passed to mgcv as follows: s(x1, x2, x3, bs = "gKRLS", xt = gKRLS(...)). Multiple kernels can be specified, each with different gKRLS arguments. It can also be used alongside the existing options for s() in mgcv.

Default Settings: By default, bs = "gKRLS" uses Mahalanobis distance between the observations, random sketching using subsampling sketching (i.e., where the kernel is constructed using a random sample of the observations; Yang et al. 2017), and a sketching dimension of 5 * ceiling(N^(1/3)), where N is the number of observations. Chang and Goplerud (2023) provide an exploration of alternative options.
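For concreteness, the default sketching dimension can be computed directly in base R; with the N = 100 used in the Examples below, it works out to 25:

```r
# Default sketching dimension: sketch_multiplier * ceiling(N^(1/3)),
# with the default multiplier of 5
N <- 100
sketch_dim <- 5 * ceiling(N^(1/3))  # ceiling(4.64...) = 5, so 5 * 5 = 25
```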
Notes: Please note that variables must be separated with commas inside of s(...) and that character variables should usually be passed as factors to work smoothly with mgcv. When using this function with bam, the sketching dimension uses chunk.size in place of N, and thus either chunk.size or sketch_size_raw must be used to make the sketching dimension increase with N.
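As a sketch of the bam behavior described above (the value 25 here is illustrative, not a recommendation), sketch_size_raw can be set explicitly so that the sketching dimension does not depend on chunk.size; this assumes the simulated data from the Examples section below:

```r
# Illustrative only: fix the sketching dimension directly when using bam(),
# since bam() would otherwise substitute chunk.size for N
fit_bam <- mgcv::bam(
  y ~ s(x1, x2, x3, bs = "gKRLS", xt = gKRLS(sketch_size_raw = 25)),
  data = data
)
```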
Value
gKRLS returns a named list with the elements in "Arguments".
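For example, gKRLS() simply bundles its arguments into a list (it performs no estimation itself), so the settings it will pass to the smooth constructor can be inspected directly:

```r
# Inspect the settings that will be passed via xt = gKRLS(...)
args_list <- gKRLS(sketch_method = "gaussian", sketch_multiplier = 2)
args_list$sketch_method
```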
References
Chang, Qing and Max Goplerud. 2023. "Generalized Kernel Regularized Least Squares". https://arxiv.org/abs/2209.14355.
Drineas, Petros, Michael W. Mahoney, and Nello Cristianini. 2005. "On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning". Journal of Machine Learning Research 6(12):2153-2175.
Yang, Yun, Mert Pilanci, and Martin J. Wainwright. 2017. "Randomized Sketches for Kernels: Fast and Optimal Nonparametric Regression". Annals of Statistics 45(3):991-1023.
Examples
set.seed(123)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
state <- sample(letters[1:5], n, replace = TRUE)
y <- 0.3 * x1 + 0.4 * x2 + 0.5 * x3 + rnorm(n)
data <- data.frame(y, x1, x2, x3, state)
data$state <- factor(data$state)
# A gKRLS model without fixed effects
fit_gKRLS <- mgcv::gam(y ~ s(x1, x2, x3, bs = "gKRLS"), data = data)
summary(fit_gKRLS)
# A gKRLS model with fixed effects outside of the kernel
fit_gKRLS_FE <- mgcv::gam(y ~ state + s(x1, x2, x3, bs = "gKRLS"), data = data)
# HC3 is not available for mgcv; this uses the effective degrees of freedom
# instead of the number of columns; see ?estfun.gam for details
robust <- sandwich::vcovHC(fit_gKRLS, type = 'HC1')
cluster <- sandwich::vcovCL(fit_gKRLS, cluster = data$state)
# Change default standardization to "scaled", sketch method to Gaussian,
# and alter sketching multiplier
fit_gKRLS_alt <- mgcv::gam(y ~ s(x1, x2, x3,
bs = "gKRLS",
xt = gKRLS(
standardize = "scaled",
sketch_method = "gaussian",
sketch_multiplier = 2
)
),
data = data
)
# A model with multiple kernels
fit_gKRLS_2 <- mgcv::gam(y ~ s(x1, x2, bs = 'gKRLS') + s(x1, x3, bs = 'gKRLS'), data = data)
# A model with a custom set of ids for sketching
id <- sample(1:n, 5)
fit_gKRLS_custom <- mgcv::gam(y ~ s(x1, bs = 'gKRLS', xt = gKRLS(sketch_method = id)), data = data)
# Note that the ids of the sampled observations can be extracted
# from the fitted mgcv object
stopifnot(identical(id, fit_gKRLS_custom$smooth[[1]]$subsampling_id))
# calculate marginal effect (see ?calculate_effects for more examples)
calculate_effects(fit_gKRLS, variables = "x1")