randomizedLasso {selectiveInference}    R Documentation
Inference for the randomized lasso, with a fixed lambda
Description
Solve a randomly perturbed LASSO problem.
Usage
randomizedLasso(X,
y,
lam,
family=c("gaussian", "binomial"),
noise_scale=NULL,
ridge_term=NULL,
max_iter=100,
kkt_tol=1.e-4,
parameter_tol=1.e-8,
objective_tol=1.e-8,
objective_stop=FALSE,
kkt_stop=TRUE,
parameter_stop=TRUE)
Arguments
X: Matrix of predictors (n by p).

y: Vector of outcomes (length n).

lam: Value of lambda used to compute beta. Be careful! This function uses the "standard" lasso objective

    1/2 \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.

In contrast, glmnet multiplies the first term by a factor of 1/n, so after running glmnet, to extract the beta corresponding to a value lambda you need to use the rescaled penalty lambda/n.

family: Response type: "gaussian" (default) or "binomial".

noise_scale: Scale of the Gaussian noise added to the objective. Default is 0.5 * sd(y) times the sqrt of the mean of the trace of X^T X.

ridge_term: A small "elastic net" or ridge penalty added to ensure the randomized problem has a solution. Default is 0.5 * sd(y) times the sqrt of the mean of the trace of X^T X, divided by sqrt(n).

max_iter: Maximum number of coordinate-descent iterations used in solving the randomized LASSO.

kkt_tol: Tolerance for checking convergence based on the KKT conditions.

parameter_tol: Tolerance for checking convergence based on convergence of the parameters.

objective_tol: Tolerance for checking convergence based on convergence of the objective value.

kkt_stop: Should the KKT check be used to determine when to stop?

parameter_stop: Should convergence of the parameters be used to determine when to stop?

objective_stop: Should convergence of the objective value be used to determine when to stop?
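To make the lambda convention concrete, here is a base-R sanity check (a sketch, not part of the package): the "standard" objective used here and glmnet's 1/n-scaled objective differ only by the constant factor n when the penalty is rescaled to lambda/n, so they share the same minimizer.

```r
# Sketch: the "standard" lasso objective used here versus glmnet's
# 1/n-scaled objective. For lam in the standard objective, the matching
# glmnet penalty is lam / n: the objectives then differ only by the
# constant factor n, hence have the same minimizer.
set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
lam <- 0.5

f_standard <- function(b) 0.5 * sum((y - X %*% b)^2) + lam * sum(abs(b))
f_glmnet   <- function(b) sum((y - X %*% b)^2) / (2 * n) + (lam / n) * sum(abs(b))

b <- rnorm(p)
all.equal(f_standard(b), n * f_glmnet(b))  # TRUE for every b
```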
Details
For family="gaussian", this function uses the "standard" lasso objective

    1/2 \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

and adds a term

    -\omega^T\beta + \frac{\epsilon}{2} \|\beta\|_2^2

where \omega is drawn from IID normals with standard deviation noise_scale and \epsilon is given by ridge_term. See the Arguments section above for the default values of noise_scale and ridge_term.

For family="binomial", the squared-error loss is replaced by the negative of the logistic log-likelihood.
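The randomized objective can be written out directly. The sketch below is illustrative only (the package's internal computation may differ); in particular, it reads "the mean of the trace of X^T X" in the documented defaults as the mean of the diagonal entries of X^T X, which is an assumption about the phrasing.

```r
# Sketch of the randomized objective for family = "gaussian",
# using the documented default scales.
set.seed(2)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
lam <- 0.5

# Defaults as documented above (assumed interpretation: mean of diag(X^T X))
noise_scale <- 0.5 * sd(y) * sqrt(mean(diag(crossprod(X))))
ridge_term  <- noise_scale / sqrt(n)   # same quantity, divided by sqrt(n)

# Random linear perturbation omega ~ N(0, noise_scale^2), IID entries
omega <- rnorm(p, sd = noise_scale)

randomized_objective <- function(b) {
  0.5 * sum((y - X %*% b)^2) + lam * sum(abs(b)) -
    sum(omega * b) + 0.5 * ridge_term * sum(b^2)
}
randomized_objective(rep(0, p))  # equals 0.5 * ||y||^2 at b = 0
```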
Value
X: Design matrix.

y: Response vector.

lam: Vector of penalty parameters.

family: Family: "gaussian" or "binomial".

active_set: Set of non-zero coefficients in the randomized solution that were penalized. Integers from 1:p.

inactive_set: Set of zero coefficients in the randomized solution. Integers from 1:p.

unpenalized_set: Set of non-zero coefficients in the randomized solution that were not penalized. Integers from 1:p.

sign_soln: The sign pattern of the randomized solution.

full_law: List describing the sampling parameters for the conditional law of all optimization variables given the data in the LASSO problem.

conditional_law: List describing the sampling parameters for the conditional law of only the scaling variables given the data and the observed subgradient in the LASSO problem.

internal_transform: Affine transformation describing the relationship between the internal representation of the data and the data component of the score of the likelihood at the unregularized MLE based on sign_soln (a.k.a. the relaxed LASSO).

observed_raw: Data component of the score at the unregularized MLE.

noise_scale: SD of the Gaussian noise used to draw the perturbed objective.

soln: The randomized solution. Inference is made conditional on its sign vector (so no further snooping of this value is formally permitted).

perturb: The random vector in the linear term added to the objective.
Author(s)
Jelena Markovic, Jonathan Taylor
References
Xiaoying Tian and Jonathan Taylor (2015). Selective inference with a randomized response. arXiv:1507.06739
Xiaoying Tian, Snigdha Panigrahi, Jelena Markovic, Nan Bi and Jonathan Taylor (2016). Selective inference after solving a convex problem. arXiv:1609.05609
Examples
set.seed(43)
n = 50
p = 10
sigma = 0.2
lam = 0.5

# Design matrix with standardized columns
X = matrix(rnorm(n * p), n, p)
X = scale(X, TRUE, TRUE) / sqrt(n - 1)

# Sparse truth: only the first two coefficients are nonzero
beta = c(3, 2, rep(0, p - 2))
y = X %*% beta + sigma * rnorm(n)

# Solve the randomized lasso at this lambda
result = randomizedLasso(X, y, lam)
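Continuing the example, the components listed under Value can be inspected directly on the returned list (a sketch; requires the selectiveInference package to be attached, and the exact values depend on the random perturbation):

```r
# Inspect the fitted randomized-lasso object from the example above.
result$active_set     # indices of nonzero penalized coefficients
result$sign_soln      # sign pattern of the randomized solution
result$noise_scale    # SD of the Gaussian perturbation actually used
result$perturb        # the realized random linear term omega
```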