pcLasso {pcLasso} | R Documentation |
Fit a model with principal components lasso
Description
Fit a model using the principal components lasso for an entire regularization
path indexed by the parameter lambda. Fits linear and logistic regression
models.
Usage
pcLasso(x, y, w = rep(1, length(y)), family = c("gaussian",
"binomial"), ratio = NULL, theta = NULL, groups = vector("list",
1), lambda.min.ratio = ifelse(nrow(x) < ncol(x), 0.01, 1e-04),
nlam = 100, lambda = NULL, standardize = F, SVD_info = NULL,
nv = NULL, propack = T, thr = 1e-04, maxit = 1e+05,
verbose = FALSE)
Arguments
x |
Input matrix, of dimension |
y |
Response variable. Quantitative for |
w |
Observation weights. Default is 1 for each observation. |
family |
Response type. Either |
ratio |
Ratio of shrinkage between the second and first principal components
in the absence of the |
theta |
Multiplier for the quadratic penalty: a non-negative real number.
|
groups |
A list describing which features belong in each group. The
length of the list should be equal to the number of groups, with
|
lambda.min.ratio |
Smallest value for |
nlam |
Number of |
lambda |
A user supplied |
standardize |
If |
SVD_info |
A list containing SVD information. Usually this should not
be specified by the user: the function will compute it on its own by default.
Since the initial SVD of |
nv |
Number of singular vectors to use in the singular value decompositions. If not specified, the full SVD is used. |
propack |
If |
thr |
Convergence threshold for the coordinate descent algorithm. Default
is |
maxit |
Maximum number of passes over the data for all lambda values;
default is |
verbose |
Print out progress along the way? Default is |
Details
The objective function for "gaussian" is
1/2 RSS/nobs + \lambda*||\beta||_1 + \theta/2 \sum quadratic penalty for group k,
where the sum is over the feature groups 1, ..., K. The objective function
for "binomial" is
-loglik/nobs + \lambda*||\beta||_1 + \theta/2 \sum quadratic penalty for group k.
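As a rough illustration of how the three terms of the "gaussian" objective combine, the sketch below evaluates it for fixed coefficients in base R. The helper gaussian_objective and its argument quad_penalties are hypothetical and not part of pcLasso. Because this page does not spell out the form of the quadratic penalty, the caller supplies one already-computed penalty value per group. Observation weights are ignored here for simplicity.

```r
# Hypothetical helper (not part of pcLasso): evaluates
#   1/2 RSS/nobs + lambda * ||beta||_1 + theta/2 * sum_k (quadratic penalty for group k)
# `quad_penalties` is an assumed input: one precomputed penalty value per group.
gaussian_objective <- function(x, y, a0, beta, lambda, theta, quad_penalties) {
  nobs <- nrow(x)
  rss <- sum((y - a0 - drop(x %*% beta))^2)  # unweighted residual sum of squares
  0.5 * rss / nobs + lambda * sum(abs(beta)) + theta / 2 * sum(quad_penalties)
}

# Tiny worked case: 2 observations, 2 features, a single group
x <- matrix(c(1, 0, 0, 1), 2, 2)
y <- c(1, 2)
obj <- gaussian_objective(x, y, a0 = 0, beta = c(1, 1),
                          lambda = 0.5, theta = 2, quad_penalties = 0.25)
```

Here the residuals are (0, 1), so the loss term is 0.25, the \ell_1 term is 0.5 * 2 = 1, and the quadratic term is 2/2 * 0.25 = 0.25.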
pcLasso can handle overlapping groups. In this case, the original x matrix is
expanded to a nobs x (p_1+...+p_K) matrix (where p_k is the number of features
in group k) such that columns p_1+...+p_{k-1}+1 to p_1+...+p_k represent the
feature matrix for group k. pcLasso returns the model coefficients for both the
expanded feature space and the original feature space.
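The column layout described above can be sketched in base R. This is only an illustration of the indexing, not the package's internal code. The names expanded_x, starts, and ends are made up for this example, and the groups are chosen so that features 6 to 8 belong to both group 1 and group 2.

```r
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)

# Overlapping groups: features 6, 7 and 8 appear in groups 1 and 2
groups <- list(1:8, 6:10, 11:15, 16:20)

# Stack the per-group feature matrices side by side: nobs x (p_1 + ... + p_K)
expanded_x <- do.call(cbind, lapply(groups, function(g) x[, g, drop = FALSE]))

# Column range of group k in the expanded matrix:
# p_1 + ... + p_{k-1} + 1  to  p_1 + ... + p_k
group_sizes <- lengths(groups)
ends <- cumsum(group_sizes)
starts <- ends - group_sizes + 1

k <- 2
group2_block <- expanded_x[, starts[k]:ends[k], drop = FALSE]
```

With these groups the expanded matrix has 8 + 5 + 5 + 5 = 23 columns, and columns 9 to 13 hold the features of group 2.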
One needs to specify the strength of the quadratic penalty either by
specifying ratio, which is the ratio of shrinkage between the second
and first principal components in the absence of the \ell_1 penalty,
or by specifying the multiplier theta. ratio is unitless and is
more convenient.
pcLasso always mean centers the columns of the x matrix. If standardize=TRUE,
pcLasso will also scale the columns to have standard deviation 1. In all cases,
the beta coefficients returned are for the original x values (i.e. uncentered
and unscaled).
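To show what reporting coefficients on the original scale involves, here is a minimal base R sketch of the back-transformation. It uses ordinary least squares via lm as a stand-in for the penalized fit, and it assumes scaling by sd() (the n - 1 denominator); whether pcLasso uses exactly this convention is not stated on this page. All variable names are illustrative.

```r
set.seed(1)
x <- matrix(rnorm(50 * 3, mean = 5, sd = 2), 50, 3)
y <- drop(1 + x %*% c(2, -1, 0.5) + rnorm(50))

# Center and scale the columns (assumption: sd() with the n - 1 denominator)
mx <- colMeans(x)
sx <- apply(x, 2, sd)
xs <- scale(x, center = mx, scale = sx)

# Fit on the standardized features (lm used here in place of pcLasso)
fit_s <- lm(y ~ xs)
a0_s <- unname(coef(fit_s)[1])
beta_s <- unname(coef(fit_s)[-1])

# Map back to the original x scale:
#   beta_orig_j = beta_s_j / sx_j
#   a0_orig     = a0_s - sum_j beta_orig_j * mx_j
beta_orig <- beta_s / sx
a0_orig <- a0_s - sum(beta_orig * mx)

# Reference fit directly on the original x, for comparison
fit_o <- lm(y ~ x)
```

The back-transformed intercept and slopes agree with the fit on the raw x up to numerical tolerance, which is why predictions can be formed directly from uncentered, unscaled inputs.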
Value
An object of class "pcLasso".
beta |
If the groups overlap, a |
origbeta |
If the groups overlap, a |
a0 |
Intercept sequence of length |
lambda |
The actual sequence of |
nzero |
If the groups overlap, the number of non-zero coefficients in the
expanded feature space for each value of |
orignzero |
If the groups are overlapping, this is the number of
non-zero coefficients in the original feature space of the model for each
|
jerr |
Error flag for warnings and errors (largely for internal debugging). |
theta |
Value of |
origgroups |
If the |
groups |
If the groups are not overlapping, this has the same
value as |
SVD_info |
A list containing SVD information. See param |
mx |
If groups overlap, column means of the expanded |
origmx |
Column means of the original |
my |
If |
overlap |
A logical flag indicating if the feature groups were overlapping or not. |
nlp |
Actual number of passes over the data for all lambda values. |
family |
Response type. |
call |
The call that produced this object. |
Examples
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
# all features in one group by default
fit1 <- pcLasso(x, y, ratio = 0.8)
# print(fit1) # Not run
# features in groups
groups <- vector("list", 4)
for (k in 1:4) {
  groups[[k]] <- 5 * (k-1) + 1:5
}
fit2 <- pcLasso(x, y, groups = groups, ratio = 0.8)
# groups can be overlapping
groups[[1]] <- 1:8
fit3 <- pcLasso(x, y, groups = groups, ratio = 0.8)
# specify ratio or theta, but not both
fit4 <- pcLasso(x, y, groups = groups, theta = 10)
# family = "binomial"
y2 <- sample(0:1, 100, replace = TRUE)
fit5 <- pcLasso(x, y2, ratio = 0.8, family = "binomial")
# example where SVD is computed once, then re-used
fit1 <- pcLasso(x, y, ratio = 0.8)
fit2 <- pcLasso(x, y, ratio = 0.8, SVD_info = fit1$SVD_info)