rgam {relgam} | R Documentation |
Fit reluctant generalized additive model
Description
Fits a reluctant generalized additive model (RGAM) for an entire regularization
path indexed by the parameter lambda. Fits linear, logistic, Poisson
and Cox regression models. RGAM is a three-step algorithm: Step 1 fits the
lasso and computes residuals, Step 2 constructs the non-linear features, and
Step 3 fits a lasso of the response on both the linear and non-linear features.
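The three steps above can be sketched by hand with glmnet and smooth.spline. This is a rough illustration under stated assumptions, not the package's implementation: it assumes glmnet is installed, uses the lambda.min rule in Step 1, rescales with scale() (which uses the 1/(n-1) formula), and skips the removeLin handling. The simulated data and variable names are illustrative only.

```r
library(glmnet)  # assumed to be installed

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1]^2 + 2 * x[, 2] + rnorm(n)  # one non-linear and one linear signal
gamma <- 0.8; df <- 4

# Step 1: cross-validated lasso, then residuals at the CV-chosen lambda
cv1 <- cv.glmnet(x, y, nfolds = 5)
r <- y - as.numeric(predict(cv1, newx = x, s = "lambda.min"))
b <- as.matrix(coef(cv1, s = "lambda.min"))[-1, 1]  # drop the intercept
step1_nz <- unname(which(b != 0))
stopifnot(length(step1_nz) > 0)  # the sketch assumes Step 1 selects something

# Step 2: for each selected feature, smooth the residuals against it
f <- sapply(step1_nz, function(j) {
  predict(smooth.spline(x[, j], r, df = df), x[, j])$y
})

# Rescale so linear features have sd 1 and non-linear features have sd gamma
xs <- scale(x)
fs <- scale(f) * gamma

# Step 3: lasso path of y on the linear and non-linear features together
# (standardize = FALSE so glmnet keeps the relative scaling set above)
fit3 <- glmnet(cbind(xs, fs), y, standardize = FALSE)
```

A smaller gamma shrinks the non-linear columns, so the lasso in Step 3 needs a larger coefficient to use them; that is one way to read the "reluctance" toward non-linear terms.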
Usage
rgam(x, y, lambda = NULL, lambda.min.ratio = ifelse(nrow(x) < ncol(x),
0.01, 1e-04), standardize = TRUE, family = c("gaussian", "binomial",
"poisson", "cox"), offset = NULL, init_nz, removeLin = TRUE,
nfolds = 5, foldid = NULL, df = 4, gamma, tol = 0.01,
parallel = FALSE, verbose = TRUE)
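The default for lambda.min.ratio in the signature above depends only on the shape of x. The short check below evaluates the same ifelse() expression on matrices of different shapes; default_ratio is a hypothetical helper name used only for this illustration.

```r
# Same expression as the lambda.min.ratio default in the usage above
default_ratio <- function(x) ifelse(nrow(x) < ncol(x), 0.01, 1e-04)

tall   <- matrix(0, nrow = 100, ncol = 20)   # more observations than variables
wide   <- matrix(0, nrow = 20,  ncol = 100)  # fewer observations than variables
square <- matrix(0, nrow = 50,  ncol = 50)   # equal counts

default_ratio(tall)    # 1e-04
default_ratio(wide)    # 0.01
default_ratio(square)  # 1e-04
```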
Arguments
x |
Input matrix, of dimension nobs x nvars. |
y |
Response variable. Quantitative for |
lambda |
A user-supplied |
lambda.min.ratio |
Smallest value for lambda as a fraction of the largest lambda value. The default depends on the sample size nobs relative to the number of variables nvars. If nobs >= nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. |
standardize |
If |
family |
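Response type. Either "gaussian" (default) for linear regression, "binomial" for logistic regression, "poisson" for Poisson regression or "cox" for Cox regression. |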
offset |
A vector of length nobs. Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function. |
init_nz |
A vector specifying which features we must include when computing the non-linear features. Default is to construct non-linear features for all given features. |
removeLin |
When constructing the non-linear features, should the linear
component be removed from them? Default is TRUE. |
nfolds |
Number of folds for CV in Step 1 (default is 5). Although
|
foldid |
An optional vector of values between 1 and nfolds. |
df |
Degrees of freedom for the non-linear fit in Step 2. Default is 4. |
gamma |
Scale factor for non-linear features (vs. original features), to
be between 0 and 1. Default is 0.8 if |
tol |
Parameter to be passed to |
parallel |
If TRUE, the |
verbose |
If |
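To make the Step 1 cross-validation reproducible, fold labels can be built by hand and supplied through foldid. The sketch below assumes foldid follows the glmnet convention of one integer label per observation, with labels running from 1 to nfolds; the rgam call is left as a comment so the block runs without the package.

```r
set.seed(1)
n <- 100
nfolds <- 5

# Balanced, randomly ordered fold labels: each of 1..nfolds appears n / nfolds times
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)

# With relgam loaded and x, y defined as in the Examples:
# fit <- rgam(x, y, foldid = foldid)
```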
Details
If there are variables which the user definitely wants to compute non-linear
versions for in Step 2 of the algorithm, they can be specified as a vector for
the init_nz option. The algorithm will compute non-linear versions for
these features as well as the features suggested by Step 1 of the algorithm.
If standardize = TRUE, the linear and non-linear features have standard
deviation 1 and gamma respectively. If standardize = FALSE, linear features
remain on their original scale, while non-linear features have standard
deviation gamma times the mean standard deviation of the linear features.
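The two scaling rules can be checked numerically in base R. This is a minimal sketch under assumptions: sd_n is a hypothetical helper using the 1/n formula, features are centered before scaling, and sin(x[, 1]) merely stands in for a non-linear feature from Step 2. Whether rgam centers or uses 1/n for the features is not stated on this page.

```r
set.seed(1)
n <- 50
x <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 5))  # two linear features
f <- sin(x[, 1])                                # stand-in non-linear feature
gamma <- 0.8

# Standard deviation with the 1/n formula (an assumption for this sketch)
sd_n <- function(v) sqrt(mean((v - mean(v))^2))

# standardize = TRUE: linear features get sd 1, the non-linear feature sd gamma
x_std <- apply(x, 2, function(v) (v - mean(v)) / sd_n(v))
f_std <- gamma * (f - mean(f)) / sd_n(f)

# standardize = FALSE: linear features untouched; the non-linear feature gets
# sd equal to gamma times the mean sd of the linear features
target <- gamma * mean(apply(x, 2, sd_n))
f_raw <- target * (f - mean(f)) / sd_n(f)
```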
For family="gaussian", rgam standardizes y to have unit
variance (using the 1/n rather than the 1/(n-1) formula).
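The difference between the two variance formulas is easy to see in base R, since sd() uses 1/(n-1). The snippet below is only an illustration of the 1/n scaling described above; it divides y by its 1/n standard deviation and does not claim to reproduce any centering rgam may or may not apply.

```r
set.seed(1)
y <- rnorm(100, mean = 3, sd = 2)
n <- length(y)

# Standard deviation with the 1/n formula
sd_n <- sqrt(sum((y - mean(y))^2) / n)

# Equivalent to shrinking the usual sd() by sqrt((n - 1) / n)
all.equal(sd_n, sd(y) * sqrt((n - 1) / n))

# After scaling, the 1/n variance of y is exactly 1
y_scaled <- y / sd_n
mean((y_scaled - mean(y_scaled))^2)
```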
Value
An object of class "rgam".
full_glmfit |
The glmnet object resulting from Step 3: fitting a |
spline_fit |
List of spline fits for residual against each feature. Needed for predicting on new data. |
lin_comp_fit |
If |
init_nz |
Column indices for the features which we allow to have non-linear relationship with the response. |
step1_nz |
Indices of features which CV in Step 1 chose. |
removeLin |
Did we remove the linear components when constructing the non-linear features? Needed for predicting on new data. |
mxf |
Means of the features (both linear and non-linear). |
sxf |
Scale factors of the features (both linear and non-linear). |
feat |
Column indices of the non-zero features for each value of lambda. |
linfeat |
Column indices of the non-zero linear components for each value of lambda. |
nonlinfeat |
Column indices of the non-zero non-linear components for each value of lambda. |
nzero_feat |
The number of non-zero features for each value of lambda. |
nzero_lin |
The number of non-zero linear components for each value of lambda. |
nzero_nonlin |
The number of non-zero non-linear components for each value of lambda. |
lambda |
The actual sequence of lambda values used. |
p |
The number of features in the original data matrix. |
family |
Response type. |
call |
The call that produced this object. |
Examples
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)
fit <- rgam(x, y)
# construct non-linear features for only those selected by Step 1
fit <- rgam(x, y, init_nz = c())
# specify scale factor gamma and degrees of freedom
fit <- rgam(x, y, gamma = 1, df = 6)
# binomial family
bin_y <- ifelse(y > 0, 1, 0)
fit2 <- rgam(x, bin_y, family = "binomial")
# Poisson family
poi_y <- rpois(n, exp(x %*% beta))
fit3 <- rgam(x, poi_y, family = "poisson")
# Poisson with offset
offset <- rnorm(n)
fit3 <- rgam(x, poi_y, family = "poisson", offset = offset)