desla {desla} R Documentation

Desparsified lasso

Description

Calculates the desparsified lasso as originally introduced in van de Geer et al. (2014), and provides inference suitable for high-dimensional time series, based on the long run covariance estimator in Adamek et al. (2021).

Usage

desla(
  X,
  y,
  H,
  alphas = 0.05,
  penalize_H = TRUE,
  R = NULL,
  q = NULL,
  demean = TRUE,
  scale = TRUE,
  progress_bar = TRUE,
  parallel = TRUE,
  threads = NULL,
  PI_constant = NULL,
  LRV_bandwidth = NULL
)

Arguments

X

T_ x N regressor matrix

y

T_ x 1 dependent variable vector

H

indexes of the relevant regressors, i.e. the columns of X for which estimates and inference are returned

alphas

(optional) vector of significance levels (0.05 by default)

penalize_H

(optional) boolean, TRUE if the variables in H should be penalized (TRUE by default)

R

(optional) matrix with as many columns as there are indexes in H, used to test the null hypothesis R*beta=q (identity matrix by default)

q

(optional) vector with as many elements as R has rows, used to test the null hypothesis R*beta=q (zeroes by default)

demean

(optional) boolean, TRUE if X and y should be demeaned before the desparsified lasso is calculated. This is recommended because of the assumptions of the method (TRUE by default)

scale

(optional) boolean, TRUE if X and y should be scaled by the column-wise standard deviations. Recommended for lasso-based methods in general, since the penalty is scale-sensitive (TRUE by default)

progress_bar

(optional) boolean, if TRUE, displays a progress bar that tracks the estimation of the nodewise regressions (TRUE by default)

parallel

(optional) boolean, whether parallel computing should be used (TRUE by default)

threads

(optional) integer, how many threads should be used for parallel computing if parallel=TRUE (default is to use all but two)

PI_constant

(optional) constant, used in the plug-in selection method (0.8 by default). For details see Adamek et al. (2021)

LRV_bandwidth

(optional) vector of parameters controlling the bandwidth Q_T used in the long run covariance matrix, Q_T=ceil(LRV_bandwidth[1]*T_^LRV_bandwidth[2]). When LRV_bandwidth=NULL, the bandwidth is selected according to Andrews (1991) (default)
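
As a rough illustration of the bandwidth rule given for LRV_bandwidth, the sketch below evaluates Q_T=ceil(LRV_bandwidth[1]*T_^LRV_bandwidth[2]) in base R. The helper name lrv_bandwidth_sketch and the parameter values are assumptions made for illustration only; they are not part of the package, and the Andrews (1991) selection used when LRV_bandwidth=NULL is not reproduced here.

```r
# Hypothetical helper (not part of desla): evaluates the bandwidth rule
# Q_T = ceil(LRV_bandwidth[1] * T_^LRV_bandwidth[2]) stated above.
lrv_bandwidth_sketch <- function(T_, LRV_bandwidth) {
  stopifnot(length(LRV_bandwidth) == 2, T_ > 0)
  ceiling(LRV_bandwidth[1] * T_^LRV_bandwidth[2])
}

# Illustrative values only: T_ = 100 observations, parameters c(2, 0.25).
Q_T <- lrv_bandwidth_sketch(100, c(2, 0.25))
print(Q_T)
```

With a positive second parameter the bandwidth grows with the sample size, so the same LRV_bandwidth vector yields a larger Q_T on longer series.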

Value

Returns a list with the following elements:

bhat

desparsified lasso estimates for the parameters indexed by H, unscaled to be in the original scale of y and X

standard_errors

standard errors of the estimates for variables indexed by H

intervals

matrix containing the confidence intervals for parameters indexed in H, unscaled to be in the original scale of y and X

betahat

lasso estimates from the initial regression of y on X

DSL_matrices

list containing the matrices Gammahat, Upsilonhat_inv and Thetahat used for calculating the desparsified lasso, as well as Omegahat, the long run covariance matrix for the variables indexed by H. For details see Adamek et al. (2021)

residuals

list containing the vector of residuals from the initial lasso regression (init) and the matrix of residuals from the nodewise regressions (nw)

lambdas

values of lambda selected in the initial lasso regression (init) and the nodewise lasso regressions (nw)

selected_vars

vector of indexes of the nonzero parameters in the initial lasso (init) and each nodewise regression (nw)

wald_test

list containing elements for inference on R*beta=q. joint_test contains the test statistic for the overall null hypothesis R*beta=q along with the p-value. At default values of R and q, this tests the joint significance of all variables indexed by H. row_tests contains the vector of z-statistics and confidence intervals associated with each row of R*beta-q, unscaled to be in the original scale of y and X. This output is only given when either R or q is supplied

References

Adamek R, Smeekes S, Wilms I (2021). “LASSO inference for high-dimensional time series.” arXiv preprint arXiv:2007.10952.

Andrews DW (1991). “Heteroskedasticity and autocorrelation consistent covariance matrix estimation.” Econometrica, 59(3), 817–858.

van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014). “On asymptotically optimal confidence regions and tests for high-dimensional models.” Annals of Statistics, 42(3), 1166–1202.

Examples

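set.seed(1)  # make the simulated data reproducible
X <- matrix(rnorm(50 * 50), nrow = 50)        # T_ = 50 observations, N = 50 regressors
y <- X[, 1:4] %*% c(1, 2, 3, 4) + rnorm(50)   # only the first four regressors enter y
H <- c(1, 2, 3, 4)                            # indexes of the regressors of interest
d <- desla(X, y, H)                           # estimates and inference for H at alphas = 0.05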
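
The following sketch shows one way the R and q arguments might be set up to test a linear restriction, and how the documented elements of the returned list could be read. The restriction beta_1 + beta_2 = 3, the seed and the choice to switch off the progress bar and parallel computing are assumptions made for illustration. The call is guarded so that the sketch still runs where the package is not installed.

```r
# Sketch: test the single restriction beta_1 + beta_2 = 3 on simulated data.
set.seed(1)
X <- matrix(rnorm(50 * 50), nrow = 50)
y <- X[, 1:4] %*% c(1, 2, 3, 4) + rnorm(50)
H <- c(1, 2, 3, 4)

# R has one column per index in H and one row per restriction;
# q has one element per row of R.
R <- matrix(c(1, 1, 0, 0), nrow = 1)
q <- 3
stopifnot(ncol(R) == length(H), nrow(R) == length(q))

# Guarded call: only runs if the desla package is available.
if (requireNamespace("desla", quietly = TRUE)) {
  d <- desla::desla(X, y, H, R = R, q = q,
                    progress_bar = FALSE, parallel = FALSE)
  print(d$bhat)                  # desparsified lasso estimates for H
  print(d$intervals)             # confidence intervals at the levels in alphas
  print(d$wald_test$joint_test)  # test statistic and p-value for R*beta=q
}
```

Because the simulated coefficients on the first two regressors are 1 and 2, the restriction holds in the data-generating process, so the joint test would not be expected to reject in most draws.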

[Package desla version 0.3.0 Index]