desla {desla}

R Documentation

Desparsified lasso

Description

Calculates the desparsified lasso as originally introduced in van de Geer et al. (2014), and provides inference suitable for high-dimensional time series, based on the long run covariance estimator in Adamek et al. (2021).

Usage

desla(
  X,
  y,
  H,
  alphas = 0.05,
  penalize_H = TRUE,
  R = NULL,
  q = NULL,
  demean = TRUE,
  scale = TRUE,
  progress_bar = TRUE,
  parallel = TRUE,
  threads = NULL,
  PI_constant = NULL,
  LRV_bandwidth = NULL
)

Arguments

`X`	`T_` x `N` regressor matrix
`y`	`T_` x 1 dependent variable vector
`H`	indexes of relevant regressors
`alphas`	(optional) vector of significance levels (0.05 by default)
`penalize_H`	(optional) boolean, `TRUE` if the variables indexed by `H` should be penalized (`TRUE` by default)
`R`	(optional) matrix whose number of columns equals the length of `H`, used to test the null hypothesis `R`*beta=`q` (identity matrix by default)
`q`	(optional) vector whose length equals the number of rows of `R`, used to test the null hypothesis `R`*beta=`q` (zeroes by default)
`demean`	(optional) boolean, `TRUE` if `X` and `y` should be demeaned before the desparsified lasso is calculated. Demeaning is recommended because of the assumptions of the method (`TRUE` by default)
`scale`	(optional) boolean, `TRUE` if `X` and `y` should be scaled by the column-wise standard deviations. Recommended for lasso-based methods in general, since the penalty is scale-sensitive (`TRUE` by default)
`progress_bar`	(optional) boolean, displays a progress bar while running if true, tracking the progress of estimating the nodewise regressions (TRUE by default)
`parallel`	(optional) boolean, whether parallel computing should be used (`TRUE` by default)
`threads`	(optional) integer, how many threads should be used for parallel computing if `parallel=TRUE` (default is to use all but two)
`PI_constant`	(optional) constant, used in the plug-in selection method (0.8 by default). For details see Adamek et al. (2021)
`LRV_bandwidth`	(optional) vector of parameters controlling the bandwidth `Q_T` used in the long run covariance matrix, `Q_T`=ceil(`LRV_bandwidth[1]`*`T_`^`LRV_bandwidth[2]`). When `LRV_bandwidth=NULL`, the bandwidth is selected according to Andrews (1991) (default)
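
The bandwidth rule stated for `LRV_bandwidth` can be checked by hand. The following is a minimal sketch in base R that only evaluates the formula `Q_T`=ceil(`LRV_bandwidth[1]`*`T_`^`LRV_bandwidth[2]`) as written above; the helper name `lrv_bandwidth_QT` and the parameter values are illustrative assumptions and are not part of the package.

```r
# Hypothetical helper (not exported by desla): evaluates the documented rule
# Q_T = ceil(LRV_bandwidth[1] * T_^LRV_bandwidth[2]).
lrv_bandwidth_QT <- function(T_, LRV_bandwidth) {
  stopifnot(length(LRV_bandwidth) == 2)
  ceiling(LRV_bandwidth[1] * T_^LRV_bandwidth[2])
}

# Illustrative values only: with T_ = 50 and LRV_bandwidth = c(1, 1/3),
# 50^(1/3) is roughly 3.68, so the bandwidth rounds up to 4.
Q_T <- lrv_bandwidth_QT(50, c(1, 1/3))
print(Q_T)
```

Under this rule, a larger first element scales the bandwidth proportionally, while the second element controls how quickly it grows with the sample size `T_`.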

Value

Returns a list with the following elements:

`bhat`	desparsified lasso estimates for the parameters indexed by `H`, unscaled to be in the original scale of `y` and `X`
`standard_errors`	standard errors of the estimates for variables indexed by `H`
`intervals`	matrix containing the confidence intervals for parameters indexed in `H`, unscaled to be in the original scale of `y` and `X`
`betahat`	lasso estimates from the initial regression of `y` on `X`
`DSL_matrices`	list containing the matrices `Gammahat`, `Upsilonhat_inv` and `Thetahat` used for calculating the desparsified lasso, as well as `Omegahat`, the long run covariance matrix for the variables indexed by `H`. For details see Adamek et al. (2021)
`residuals`	list containing the vector of residuals from the initial lasso regression (`init`) and the matrix of residuals from the nodewise regressions (`nw`)
`lambdas`	values of lambda selected in the initial lasso regression (`init`) and the nodewise lasso regressions (`nw`)
`selected_vars`	vector of indexes of the nonzero parameters in the initial lasso (`init`) and each nodewise regression (`nw`)
`wald_test`	list containing elements for inference on `R`*beta=`q`. `joint_test` contains the test statistic for the overall null hypothesis `R`*beta=`q`, along with its p-value. At the default values of `R` and `q`, this tests the joint significance of all variables indexed by `H`. `row_tests` contains the vector of z-statistics and the confidence intervals associated with each row of `R`*beta-`q`, unscaled to be in the original scale of `y` and `X`. This output is only given when `R` or `q` is supplied
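
To make the dimensions of `R` and `q` concrete, here is a small base R sketch of how a set of linear restrictions could be encoded. The names `R_mat`, `q_vec` and `beta_H`, and the particular restrictions, are assumptions chosen for illustration; the snippet does not call `desla` and only shows the quantity `R`*beta-`q` that the row-wise tests refer to.

```r
# Four parameters of interest, as indexed by H.
H <- c(1, 2, 3, 4)

# Two hypothetical restrictions on the parameters indexed by H:
#   row 1: beta_1 - beta_2 = 0
#   row 2: beta_3          = 3
R_mat <- rbind(c(1, -1, 0, 0),
               c(0,  0, 1, 0))
q_vec <- c(0, 3)

# R needs one column per element of H, and q one entry per row of R.
stopifnot(ncol(R_mat) == length(H), length(q_vec) == nrow(R_mat))

# Illustrative parameter vector (not an estimate): evaluate R * beta - q.
beta_H <- c(1, 2, 3, 4)
restriction_gap <- drop(R_mat %*% beta_H - q_vec)
print(restriction_gap)

# The documented defaults correspond to testing each parameter against zero.
R_default <- diag(length(H))
q_default <- rep(0, length(H))
```

Matrices and vectors built this way would then be passed as the `R` and `q` arguments, in which case the `wald_test` element is included in the output.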

References

Adamek R, Smeekes S, Wilms I (2021). “LASSO inference for high-dimensional time series.” arXiv preprint arXiv:2007.10952.

Andrews DW (1991). “Heteroskedasticity and autocorrelation consistent covariance matrix estimation.” Econometrica, 59(3), 817–858.

van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014). “On asymptotically optimal confidence regions and tests for high-dimensional models.” Annals of Statistics, 42(3), 1166–1202.

Examples

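set.seed(1)  # make the simulated data reproducible
X <- matrix(rnorm(50 * 50), nrow = 50)  # T_ = 50 observations, N = 50 regressors
y <- X[, 1:4] %*% c(1, 2, 3, 4) + rnorm(50)  # only the first four regressors enter y
H <- c(1, 2, 3, 4)  # request inference on the first four regressors
d <- desla(X, y, H)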
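
As a slightly longer sketch, the example can be extended to request several significance levels and to inspect the elements described in the Value section. It uses only arguments and output names documented on this page; the seed, the significance levels, and the choice to switch off the progress bar and parallel computing are assumptions made for illustration. The call is guarded so that the snippet also runs when the package is not installed.

```r
set.seed(123)                      # assumed seed, for reproducibility only
T_ <- 50                           # number of observations
N <- 50                            # number of regressors
X <- matrix(rnorm(T_ * N), nrow = T_)
y <- X[, 1:4] %*% c(1, 2, 3, 4) + rnorm(T_)
H <- 1:4                           # parameters of interest
alphas <- c(0.01, 0.05)            # two significance levels

if (requireNamespace("desla", quietly = TRUE)) {
  d <- desla::desla(X, y, H,
                    alphas = alphas,
                    progress_bar = FALSE,  # no progress bar
                    parallel = FALSE)      # single-threaded run
  print(d$bhat)             # desparsified lasso estimates for H
  print(d$standard_errors)  # standard errors of those estimates
  print(d$intervals)        # confidence intervals for parameters in H
}
```

Because `R` and `q` are not supplied here, the `wald_test` element is not expected in the returned list.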

[Package desla version 0.3.0 Index]