mlfitppml_int {penppml}    R Documentation
General Penalized PPML Estimation
Description
mlfitppml_int is the internal wrapper called by mlfitppml for penalized PPML estimation. It, in turn, calls penhdfeppml_int, penhdfeppml_cluster_int and hdfeppml_int as needed. It takes a vector with the dependent variable, a regressor matrix and a set of fixed effects in list form, where each element of the list is a separate high-dimensional fixed effect (HDFE). This is a flexible tool that allows users to select:
- Penalty type: either lasso or ridge.
- Penalty parameter: users can provide a single global value for lambda (a single regression is estimated), a vector of lambda values (the regression is estimated once for each value, sequentially) or coefficient-specific penalty weights.
- Method: plugin lasso estimates can also be obtained directly from this function (method = "plugin").
- Cross-validation: if this option is enabled (xval = TRUE), the function uses the fold IDs provided by the user to perform k-fold cross-validation and reports the resulting RMSE for each lambda value.
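To make the penalty options concrete, the function below evaluates a penalized Poisson pseudo-likelihood objective of the kind minimized here, with either a lasso (absolute-value) or a ridge (squared) penalty and optional coefficient-specific weights. It is a minimal sketch under stated assumptions, not penppml code: penalized_ppml_objective and its weights argument are hypothetical, fixed effects are omitted, and the scaling of the loss and penalty (dividing by n, not halving the ridge term) is an assumption that may differ from what penhdfeppml_int uses.

```r
# Hypothetical helper, not penppml code: penalized Poisson pseudo-likelihood
# objective for a given beta, with fixed effects omitted. The 1/n scaling of
# the loss and the unhalved ridge term are assumptions.
penalized_ppml_objective <- function(beta, y, x, lambda,
                                     penalty = c("lasso", "ridge"),
                                     weights = rep(1, length(beta))) {
  penalty <- match.arg(penalty)
  eta <- drop(x %*% beta)                       # linear index x_i'beta
  # Negative Poisson pseudo-log-likelihood (up to a constant), averaged over n
  loss <- -sum(y * eta - exp(eta)) / length(y)
  pen <- switch(penalty,
                lasso = sum(weights * abs(beta)),  # weighted L1 penalty
                ridge = sum(weights * beta^2))     # weighted squared L2 penalty
  loss + lambda * pen
}

# With a vector of lambdas, each value defines its own objective:
x <- cbind(c(1, 0, 2), c(0, 1, 1))
y <- c(3, 1, 8)
sapply(c(0.1, 0.01), function(l) penalized_ppml_objective(c(0.5, 0.2), y, x, l))
```

Setting a weight to zero leaves that coefficient unpenalized, which is how coefficient-specific penalties generalize a single global lambda.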
Usage
mlfitppml_int(
y,
x,
fes,
lambdas,
penalty = "lasso",
tol = 1e-08,
hdfetol = 1e-04,
colcheck = TRUE,
colcheck_x = colcheck,
colcheck_x_fes = colcheck,
post = TRUE,
cluster = NULL,
method = "bic",
IDs = 1:n,
verbose = FALSE,
xval = FALSE,
standardize = TRUE,
vcv = TRUE,
phipost = TRUE,
penweights = NULL,
K = 15,
gamma_val = NULL,
mu = NULL
)
Arguments
y |
Dependent variable (a vector). |
x |
Regressor matrix. |
fes |
List of fixed effects. |
lambdas |
Vector of penalty parameters. |
penalty |
A string indicating the penalty type. Currently supported: "lasso" and "ridge". |
tol |
Tolerance parameter for convergence of the IRLS algorithm. |
hdfetol |
Tolerance parameter for the within-transformation step,
passed on to |
colcheck |
Logical. If TRUE, checks for perfect multicollinearity in x and fes. |
colcheck_x |
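Logical. If TRUE, checks for collinearity among the columns of x and drops collinear variables. Defaults to the value of colcheck. |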
colcheck_x_fes |
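Logical. If TRUE, checks whether any column of x is perfectly explained by the fixed effects and drops such variables. Defaults to the value of colcheck. |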
post |
Logical. If TRUE, a post-penalty (unpenalized) regression is estimated using the selected variables. |
cluster |
Optional: a vector classifying observations into clusters (to use when calculating SEs). |
method |
Set this to "plugin" to run the plugin algorithm with coefficient-specific penalty weights (see Details). Otherwise (including the default, "bic"), a single global penalty is used. |
IDs |
A vector of fold IDs for k-fold cross-validation. If left unspecified, each observation is assigned to a different fold (warning: this is likely to be very resource-intensive). |
verbose |
Logical. If TRUE, prints progress information while the function runs. |
xval |
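Logical. If TRUE, k-fold cross-validation is performed using the fold IDs in IDs, and the RMSE for each lambda value is reported. |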
standardize |
Logical. If TRUE, the columns of x are standardized before estimation. |
vcv |
Logical. If TRUE, standard errors are computed for the post-penalty regression. |
phipost |
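Logical. If TRUE, the plugin coefficient-specific penalty weights are computed from post-penalty regression estimates; otherwise, they are computed from the penalized regression estimates. |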
penweights |
Optional: a vector of coefficient-specific penalties to use in plugin lasso when method = "plugin". |
K |
Maximum number of iterations for the plugin algorithm to converge. |
gamma_val |
Numerical value that determines the regularization threshold as defined in Belloni, Chernozhukov, Hansen, and Kozbur (2016). The default, NULL, sets the parameter to 0.1/log(n), where n is the number of observations. |
mu |
A vector of initial values for mu that can be passed to the command. |
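The tol argument refers to the iteratively reweighted least squares (IRLS) algorithm. The sketch below shows plain IRLS for an unpenalized Poisson regression without fixed effects, stopping when the relative change in deviance falls below tol. It is an illustration only: irls_poisson is a hypothetical helper, and its starting values and stopping rule are assumptions. The package additionally partials out the fixed effects at each step (the within-transformation controlled by hdfetol), and the penalized routines also apply the penalty within each weighted least-squares step.

```r
# Hypothetical sketch: unpenalized Poisson regression by IRLS, no fixed effects.
irls_poisson <- function(y, x, tol = 1e-8, maxit = 100) {
  mu <- (y + mean(y)) / 2            # positive starting values for mu (assumption)
  eta <- log(mu)
  dev_old <- Inf
  for (iter in seq_len(maxit)) {
    z <- eta + (y - mu) / mu         # working dependent variable (log link)
    w <- mu                          # working weights for the log link
    # Weighted least squares: solve (X'WX) beta = X'Wz
    beta <- drop(solve(crossprod(x, w * x), crossprod(x, w * z)))
    eta <- drop(x %*% beta)
    mu <- exp(eta)
    # Poisson deviance; the y * log(y / mu) term is taken as 0 when y == 0
    dev <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
    if (abs(dev - dev_old) / (0.1 + abs(dev)) < tol) break
    dev_old <- dev
  }
  list(beta = beta, mu = mu, iterations = iter)
}
```

On simulated data, the coefficients should agree closely with glm(y ~ x - 1, family = poisson()), which uses the same algorithm.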
Details
For technical details on the algorithms used, see hdfeppml_int (post-lasso regression), penhdfeppml_int (standard penalized regression), penhdfeppml_cluster_int (plugin lasso), and xvalidate (cross-validation).
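For plugin lasso, each coefficient gets its own penalty weight. In the Belloni, Chernozhukov, Hansen and Kozbur (2016) approach, these weights are built from cluster-level sums of score contributions, so regressors whose scores vary more across clusters are penalized more heavily. The sketch below computes such cluster-robust loadings from a regressor matrix and a residual vector, together with the default gamma_val of 0.1/log(n) stated above and an overall penalty level of the form 2 * c * sqrt(n) * qnorm(1 - gamma / (2p)). All three helpers are hypothetical; the constant c = 1.1 and the exact form of the penalty level are assumptions taken from the general plugin-lasso literature, and penhdfeppml_cluster_int may normalize these quantities differently.

```r
# Hypothetical helpers, not penppml code: plugin-lasso ingredients in the
# spirit of Belloni, Chernozhukov, Hansen and Kozbur (2016).

# Cluster-robust penalty loadings: for each regressor j,
# phi_j = sqrt( (1/n) * sum_g ( sum_{i in g} x_ij * e_i )^2 ).
plugin_loadings <- function(x, resid, cluster) {
  n <- nrow(x)
  scores <- rowsum(x * resid, group = cluster)  # cluster sums of x_ij * e_i
  sqrt(colSums(scores^2) / n)
}

# Default threshold described for gamma_val = NULL: 0.1 / log(n)
default_gamma <- function(n) 0.1 / log(n)

# Assumed overall penalty level; c_const = 1.1 is a conventional choice
# in the plugin-lasso literature, not necessarily penppml's.
plugin_lambda <- function(n, p, gamma = default_gamma(n), c_const = 1.1) {
  2 * c_const * sqrt(n) * qnorm(1 - gamma / (2 * p))
}
```

When every observation is its own cluster, the loadings reduce to the heteroskedasticity-robust version, sqrt(mean(x_ij^2 * e_i^2)).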
Value
A list with the following elements:
- beta: if post = FALSE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions. If post = TRUE, this is the matrix of coefficients from the post-penalty regressions.
- beta_pre: if post = TRUE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions.
- bic: Bayesian Information Criterion.
- lambdas: vector of penalty parameters.
- ses: standard errors of the coefficients of the post-penalty regression. Note that these are only provided when post = TRUE.
- rmse: if xval = TRUE, a matrix with the root mean squared error (RMSE, column 2) for each value of lambda (column 1), obtained by cross-validation.
- phi: coefficient-specific penalty weights (only if method == "plugin").
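Because beta has one row per value in lambdas and rmse stores lambda and RMSE in columns 1 and 2, choosing a penalty and reading off the selected regressors takes only a few lines. The helpers below are hypothetical conveniences, not part of penppml, and assume that row i of beta corresponds to lambdas[i].

```r
# Hypothetical helpers (not part of penppml) based on the documented shapes:
# rmse has lambda in column 1 and cross-validated RMSE in column 2, and
# beta has one row per entry of lambdas (row i assumed to match lambdas[i]).
best_lambda <- function(rmse) {
  rmse[which.min(rmse[, 2]), 1]      # lambda with the lowest RMSE
}

selected_vars <- function(beta, lambdas, lambda, tol = 0) {
  i <- match(lambda, lambdas)        # requires an exact match on the value
  if (is.na(i)) stop("lambda not found in lambdas")
  which(abs(beta[i, ]) > tol)        # indices (and names, if any) of nonzero coefficients
}

# Illustrative made-up values, not estimates:
rmse <- cbind(c(0.1, 0.01, 0.001), c(2.3, 1.9, 2.1))
best_lambda(rmse)  # 0.01
```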
References
Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.
Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.
Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.
Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.
Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.
Examples
## Not run:
library(penppml)  # provides mlfitppml_int and the trade and countries data sets
# First, we need to transform the data (this is what mlfitppml handles internally). Start by
# filtering the data set to keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
trade <- trade[(trade$imp %in% americas) & (trade$exp %in% americas), ]
# Now generate the needed x, y and fes objects:
y <- trade$export
x <- data.matrix(trade[, -1:-6])
fes <- list(exp_time = interaction(trade$exp, trade$time),
imp_time = interaction(trade$imp, trade$time),
pair = interaction(trade$exp, trade$imp))
# Finally, we try mlfitppml_int with a lasso penalty (the default) and two lambda values:
reg <- mlfitppml_int(y = y, x = x, fes = fes, lambdas = c(0.1, 0.01))
# We can also try plugin lasso:
reg <- mlfitppml_int(y = y, x = x, fes = fes, cluster = fes$pair, method = "plugin")
# For an example with cross-validation, please see the vignette.
## End(Not run)
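Cross-validation is enabled with xval = TRUE and a vector of fold IDs passed through IDs. With gravity data, one reasonable choice is to keep all observations of a country pair in the same fold. The hypothetical helper below does this by randomly assigning each distinct level of a grouping variable (for instance fes$pair from the example above) to one of k folds. The helper and its fold-assignment rule are assumptions for illustration; the vignette remains the reference for the package's own cross-validation workflow.

```r
# Hypothetical helper (not part of penppml): assign k folds at the level of a
# grouping variable, so all observations of a group share the same fold.
make_fold_ids <- function(group, k = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # note: this changes the global RNG state
  levs <- unique(as.character(group))
  # Spread the distinct groups randomly across k folds of (near) equal size
  fold_of_group <- sample(rep_len(seq_len(k), length(levs)))
  names(fold_of_group) <- levs
  unname(fold_of_group[as.character(group)])
}

# Usage with the objects from the example above (not run here):
# ids <- make_fold_ids(fes$pair, k = 10, seed = 1)
# reg_xv <- mlfitppml_int(y = y, x = x, fes = fes, lambdas = c(0.1, 0.01),
#                         xval = TRUE, IDs = ids)
```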