bootstrap {penppml}    R Documentation

Bootstrap Lasso Implementation (in development)

Description

This function performs standard plugin lasso PPML estimation on bootreps samples drawn with replacement and reports the regressors selected in at least a given fraction (boot_threshold) of the bootstrap repetitions.

Usage

bootstrap(
  data,
  dep,
  indep = NULL,
  cluster_id = NULL,
  fixed = NULL,
  selectobs = NULL,
  bootreps = 250,
  boot_threshold = 0.01,
  colcheck_x = FALSE,
  colcheck_x_fes = FALSE,
  post = FALSE,
  gamma_val = NULL,
  verbose = FALSE,
  tol = 1e-06,
  hdfetol = 0.01,
  penweights = NULL,
  maxiter = 1000,
  phipost = TRUE
)

Arguments

data

A data frame containing all relevant variables.

dep

A string with the name of the dependent variable or its column number.

indep

A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.

cluster_id

A string denoting the cluster-id with which to perform cluster bootstrap.
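To illustrate what a single cluster bootstrap draw involves, the base R sketch below resamples whole clusters with replacement and stacks their rows. It is a generic illustration under stated assumptions, not the package's internal code: the helper name draw_cluster_sample and the toy data frame are hypothetical.

```r
# Hypothetical helper: draw one cluster bootstrap sample from a data frame.
draw_cluster_sample <- function(data, cluster_id) {
  ids <- unique(data[[cluster_id]])
  # Sample cluster identifiers with replacement (index-based, so it is safe
  # even when there is a single numeric cluster id).
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  # Stack all rows of each drawn cluster; a cluster drawn twice appears twice.
  rows <- unlist(lapply(drawn, function(g) which(data[[cluster_id]] == g)))
  data[rows, , drop = FALSE]
}

set.seed(1)
toy <- data.frame(clus = rep(c("a", "b", "c"), each = 2), y = 1:6)
boot_sample <- draw_cluster_sample(toy, "clus")
```

Because clusters are drawn as a unit, every observation of a drawn cluster enters the sample together, which preserves within-cluster dependence.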

fixed

A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in data.

selectobs

Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).
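As a small illustration of the two accepted forms, the sketch below builds a logical and a numeric selector for the same rows of a toy data frame. The data frame and the variable names are hypothetical and only meant to show ordinary R subsetting.

```r
toy <- data.frame(export = c(5, 0, 12, 3), year = c(2000, 2000, 2004, 2004))

# Logical vector: one TRUE/FALSE per row of the data.
keep_logical <- toy$year == 2004
# Numeric vector: row numbers of the same observations.
keep_numeric <- which(keep_logical)

# Both forms pick the same rows under ordinary R subsetting.
same_rows <- isTRUE(all.equal(toy[keep_logical, ], toy[keep_numeric, ]))
```

Either vector could then be supplied as selectobs.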

bootreps

Number of bootstrap repetitions.

boot_threshold

Minimal threshold. If a variable is selected in at least this fraction of times, it is reported at the end of the iterations.
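The thresholding rule can be sketched in base R as follows. This is an illustration of the idea only, not the package's code: the selection matrix, the regressor names, and the threshold value of 0.5 are all made up for the example.

```r
# Hypothetical selection record: rows are regressors, columns are bootstrap
# repetitions; TRUE means the lasso kept the regressor in that repetition.
selected <- rbind(
  tariff = c(TRUE,  TRUE,  FALSE, TRUE),
  quota  = c(FALSE, FALSE, FALSE, FALSE),
  fta    = c(FALSE, TRUE,  FALSE, FALSE)
)
boot_threshold <- 0.5

# Fraction of repetitions in which each regressor was selected.
selection_share <- rowMeans(selected)
# Regressors selected in at least `boot_threshold` of the repetitions.
reported <- names(selection_share)[selection_share >= boot_threshold]
```

With a low threshold such as the default 0.01, a regressor selected in only a handful of the repetitions is still reported.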

colcheck_x

Logical. If TRUE, this checks collinearity between the independent variables and drops the collinear variables.

colcheck_x_fes

Logical. If TRUE, this checks whether the independent variables are perfectly explained by the fixed effects and drops those that are.

post

Logical. If TRUE, estimates a post-penalty regression with the selected variables.

gamma_val

Numerical value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen, and Kozbur (2016). NULL default sets parameter to 0.1/log(n).
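For a sense of scale, the default can be evaluated for a few sample sizes. The sketch assumes that log denotes the natural logarithm, as with R's log(); the sample sizes are arbitrary.

```r
n <- c(100, 1000, 10000)        # hypothetical numbers of observations
gamma_default <- 0.1 / log(n)   # default described above for gamma_val = NULL
```

The default shrinks slowly as the sample grows.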

verbose

Logical. If TRUE, it prints information to the screen while evaluating.

tol

Tolerance parameter for convergence of the IRLS algorithm.

hdfetol

Tolerance parameter for the within-transformation step, passed on to collapse::fhdwithin.

penweights

Optional. A vector of coefficient-specific penalties to use in the plugin lasso.

maxiter

Maximum number of iterations (a number).

phipost

Logical. If TRUE, the plugin coefficient-specific penalty weights are iteratively calculated using estimates from a post-penalty regression. Otherwise, these are calculated using estimates from a penalty regression.

Details

This function enables users to implement the "bootstrap" step in the procedure described in Breinlich, Corradi, Rocha, Ruta, Santos Silva and Zylkin (2021). To do this, plugin lasso is run B times (B = bootreps), once on each bootstrap sample. The function can also perform a post-selection estimation (see post).

Value

A matrix with coefficient estimates for all independent variables.

References

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

## Not run: 
bs1 <- bootstrap(data = trade3, dep = "export",
                 cluster_id = "clus",
                 fixed = list(c("exp", "time"),
                              c("imp", "time"), c("exp", "imp")),
                 indep = 7:22, bootreps = 10, colcheck_x = TRUE,
                 colcheck_x_fes = TRUE,
                 boot_threshold = 0.01,
                 post = TRUE, gamma_val = 0.01, verbose = FALSE)
## End(Not run)


[Package penppml version 0.2.3 Index]