mlfitppml {penppml} | R Documentation |
General Penalized PPML Estimation
Description
mlfitppml is a general-purpose wrapper function for penalized PPML estimation. This is a flexible tool that allows users to select:
Penalty type: either lasso or ridge.
Penalty parameter: users can provide a single global value for lambda (a single regression is estimated), a vector of lambda values (the function estimates the regression using each of them, sequentially) or even coefficient-specific penalty weights.
Method: plugin lasso estimates can be obtained directly from this function too.
Cross-validation: if this option is enabled, the function uses IDs provided by the user to perform k-fold cross-validation and reports the resulting RMSE for all lambda values.
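The cross-validation option relies on IDs supplied by the user. A minimal base-R sketch of how such fold IDs might be generated is shown here. The sample size, the number of folds and the object names are illustrative assumptions, and the name of the argument that receives the IDs (believed to be IDs in mlfitppml_int) should be verified there.

```r
# Illustrative only: n, k and fold_ids are assumptions, not part of penppml.
set.seed(123)                      # make the random fold assignment reproducible
n <- 20                            # number of observations in the toy data
k <- 5                             # number of folds

# Assign each observation to one of k folds of equal size (n is a multiple of k).
fold_ids <- sample(rep(seq_len(k), length.out = n))

table(fold_ids)                    # tabulates how many observations fall in each fold
```

In applied work the folds are often defined by a grouping variable (for example, country pairs) rather than drawn at random, so that related observations stay in the same fold.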
Usage
mlfitppml(
  data,
  dep = 1,
  indep = NULL,
  fixed = NULL,
  cluster = NULL,
  selectobs = NULL,
  ...
)
Arguments
data |
A data frame containing all relevant variables. |
dep |
A string with the name of the dependent variable or a column number. |
indep |
A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix. |
fixed |
A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in data. |
cluster |
Optional. A string with the name of the clustering variable or a column number. It's also possible to input a vector with several variables, in which case the interaction of all of them is taken as the clustering variable. |
selectobs |
Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R). |
... |
Further arguments passed on to the internal function. For a full list of options, see mlfitppml_int. |
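The selection conventions described in the arguments (names or column numbers, logical or numeric row selection, interacted clustering variables) follow ordinary R subsetting. The base-R sketch below illustrates them on a toy data frame; all names and values are made up, and the use of interaction() only illustrates the idea of combining variables, not necessarily how the package does it internally.

```r
# Toy data frame; all names and values are illustrative assumptions.
df <- data.frame(
  export = c(10, 0, 5, 3, 8, 1),
  exp    = c("ARG", "ARG", "BRA", "BRA", "FRA", "FRA"),
  imp    = c("BRA", "FRA", "ARG", "FRA", "ARG", "BRA"),
  tariff = c(0.1, 0.2, 0.1, 0.3, 0.2, 0.3)
)

# A variable can be referred to by name or by column number.
same_column <- identical(df[["export"]], df[[1]])

# Observations can be selected with a logical vector ...
keep_lgl <- df$exp %in% c("ARG", "BRA")
# ... or with the equivalent numeric vector of row numbers.
keep_num <- which(keep_lgl)

# Several clustering variables can be combined through their interaction.
cluster_id <- interaction(df$exp, df$imp, drop = TRUE)
nlevels(cluster_id)   # number of distinct exporter-importer combinations
```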
Details
This function is a thin wrapper around mlfitppml_int, providing a more convenient interface for data frames. Whereas the internal function requires some preliminary handling of data sets (y must be a vector, x must be a matrix and fes must be provided in a list), the wrapper takes a full data frame in the data argument, and users can simply specify which variables correspond to y, x and the fixed effects, using either variable names or column numbers.
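To make the contrast concrete, here is a rough base-R sketch of the kind of preliminary handling the internal function expects: a vector y, a matrix x and a list fes. The toy data, the variable names and the choice to store each fixed effect as a factor are assumptions for illustration; the exact format expected for the elements of fes should be checked in mlfitppml_int.

```r
# Toy panel; names and values are illustrative assumptions.
df <- data.frame(
  export = c(10, 0, 5, 3),
  exp    = c("ARG", "ARG", "BRA", "BRA"),
  imp    = c("BRA", "BRA", "ARG", "ARG"),
  time   = c(2000, 2004, 2000, 2004),
  fta    = c(0, 1, 0, 1),
  tariff = c(0.2, 0.1, 0.3, 0.1)
)

y <- df$export                                  # y must be a vector
x <- as.matrix(df[, c("fta", "tariff")])        # x must be a matrix

# fes must be a list; here each interacted fixed effect is one factor.
fes <- list(
  exp_time = interaction(df$exp, df$time, drop = TRUE),
  imp_time = interaction(df$imp, df$time, drop = TRUE),
  pair     = interaction(df$exp, df$imp, drop = TRUE)
)

str(list(y = y, x = x, fes = fes), max.level = 1)
```

With the wrapper, the same information is conveyed by passing the data frame together with dep, indep and fixed, so none of these objects has to be built by hand.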
For technical details on the algorithms used, see hdfeppml (post-lasso regression), penhdfeppml (standard penalized regression), penhdfeppml_cluster (plugin lasso), and xvalidate (cross-validation).
Value
A list with the following elements:
- beta: if post = FALSE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions. If post = TRUE, this is the matrix of coefficients from the post-penalty regressions.
- beta_pre: if post = TRUE, a length(lambdas) x ncol(x) matrix with coefficient (beta) estimates from the penalized regressions.
- bic: Bayesian Information Criterion.
- lambdas: vector of penalty parameters.
- ses: standard errors of the coefficients of the post-penalty regression. Note that these are only provided when post = TRUE.
- rmse: if xval = TRUE, a matrix with the root mean squared error (RMSE - column 2) for each value of lambda (column 1), obtained by cross-validation.
- phi: coefficient-specific penalty weights (only if method == "plugin").
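As an illustration of how the rmse element might be used, the base-R sketch below picks the lambda with the smallest cross-validated RMSE. The matrix is a hypothetical stand-in with the column layout described for rmse (lambda in column 1, RMSE in column 2); its numbers are made up and do not come from any real estimation.

```r
# Hypothetical rmse matrix: column 1 holds lambda, column 2 holds the
# cross-validated RMSE. The numbers are invented for illustration only.
rmse <- cbind(lambda = c(0.1, 0.01, 0.001),
              rmse   = c(2.5, 1.8, 2.1))

best_row    <- which.min(rmse[, 2])     # row with the smallest RMSE
best_lambda <- rmse[best_row, 1]        # lambda that achieved it

best_lambda
```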
References
Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.
Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.
Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.
Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.
Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.
Examples
## Not run:
# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
# Now we can use our main functions on the reduced trade data set:
test <- mlfitppml(data = trade[, -(5:6)],
                  dep = "export",
                  fixed = list(c("exp", "time"),
                               c("imp", "time"),
                               c("exp", "imp")),
                  selectobs = (trade$imp %in% americas) & (trade$exp %in% americas),
                  lambdas = c(0.01, 0.001),
                  tol = 1e-6, hdfetol = 1e-2)
## End(Not run)