mlfitppml {penppml}    R Documentation

General Penalized PPML Estimation

Description

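mlfitppml is a general-purpose wrapper function for penalized PPML estimation. It lets users choose the penalty type ("lasso" or "ridge"), the penalty method (a single global penalty or the plugin algorithm with coefficient-specific penalty weights), and whether to run a post-penalty regression or cross-validation.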

Usage

mlfitppml(
  data,
  dep = 1,
  indep = NULL,
  fixed = NULL,
  cluster = NULL,
  selectobs = NULL,
  ...
)

Arguments

data

A data frame containing all relevant variables.

dep

A string with the name of the dependent variable or a column number.

indep

A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix.

fixed

A vector with the names or column numbers of factor variables identifying the fixed effects, or a list with the desired interactions between variables in data.

cluster

Optional. A string with the name of the clustering variable or a column number. It's also possible to input a vector with several variables, in which case the interaction of all of them is taken as the clustering variable.

selectobs

Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R).

...

Further arguments, including:

  • penalty: A string indicating the penalty type. Currently supported: "lasso" and "ridge".

  • method: Set this to "plugin" to run the plugin algorithm with coefficient-specific penalty weights (see Details). Otherwise, a single global penalty is used.

  • post: Logical. If TRUE, estimates a post-penalty regression with the selected variables.

  • xval: Logical. If TRUE, cross-validation is performed using the IDs provided in the IDs argument as folds. Note that, by default, observations are assigned individual IDs, which makes the cross-validation algorithm very time-consuming.

For a full list of options, see mlfitppml_int.
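
Because cross-validation with one ID per observation is slow, it may help to group observations into a small number of folds before calling the function. The following base R sketch builds such fold IDs for a made-up panel. The names toy, pair and fold are illustrative assumptions and are not part of penppml; how the vector must line up with selectobs is best checked against mlfitppml_int.

```r
set.seed(1)  # reproducible fold assignment

# Hypothetical panel: 4 country pairs observed in 3 periods (made-up data).
toy <- expand.grid(pair = c("ARG-BRA", "ARG-CHL", "BRA-CHL", "CHL-BRA"),
                   time = c(2000, 2004, 2008),
                   stringsAsFactors = FALSE)

nfolds <- 2
pairs  <- unique(toy$pair)

# Assign each pair (not each row) to a fold, so that all observations
# of a pair are held out together. rep_len() keeps the folds balanced.
fold_of_pair <- setNames(sample(rep_len(seq_len(nfolds), length(pairs))), pairs)
toy$fold <- unname(fold_of_pair[toy$pair])

# Number of observations per fold
table(toy$fold)
```

A vector such as toy$fold could then be supplied through the IDs argument together with xval = TRUE, so that the number of folds is nfolds rather than the number of observations.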

Details

This function is a thin wrapper around mlfitppml_int, providing a more convenient interface for data frames. Whereas the internal function requires some preliminary handling of data sets (y must be a vector, x must be a matrix and fes must be provided in a list), the wrapper takes a full data frame in the data argument, and users can simply specify which variables correspond to y, x and the fixed effects, using either variable names or column numbers.
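
The data handling described above can be pictured with a base R sketch. It is only an illustration under assumptions: the data frame toy and the helper split_inputs are hypothetical and are not part of penppml, and the package's own preprocessing may differ in its details.

```r
# Hypothetical toy data; column names mimic the trade example, values are made up.
toy <- data.frame(
  export = c(10, 0, 3, 7, 2, 5),
  exp    = c("ARG", "ARG", "BRA", "BRA", "CHL", "CHL"),
  imp    = c("BRA", "CHL", "ARG", "CHL", "ARG", "BRA"),
  time   = c(2000, 2000, 2000, 2004, 2004, 2004),
  prov_a = c(1, 0, 1, 0, 0, 1),
  prov_b = c(0, 0, 1, 1, 0, 1)
)

# Hypothetical helper (not a penppml function): splits a data frame into a
# response vector, a regressor matrix and a list of fixed-effect factors.
split_inputs <- function(data, dep, fixed, indep = NULL, selectobs = NULL) {
  if (!is.null(selectobs)) data <- data[selectobs, , drop = FALSE]
  # Resolve column numbers to names so that both forms are accepted.
  as_names <- function(v) if (is.numeric(v)) names(data)[v] else v
  dep     <- as_names(dep)
  fixed   <- lapply(fixed, as_names)
  fe_vars <- unique(unlist(fixed))
  if (is.null(indep)) {
    # Default: every remaining column that is not a fixed-effect variable.
    indep <- setdiff(names(data), c(dep, fe_vars))
  } else {
    indep <- as_names(indep)
  }
  list(
    y   = data[[dep]],
    x   = as.matrix(data[, indep, drop = FALSE]),
    # One factor per fixed effect; several variables are interacted.
    fes = lapply(fixed, function(v) interaction(data[, v, drop = FALSE], drop = TRUE))
  )
}

inputs <- split_inputs(toy, dep = "export",
                       fixed = list(c("exp", "time"), c("imp", "time")),
                       selectobs = toy$exp != "CHL")
str(inputs)
```

Here y is a numeric vector, x is a matrix holding the two remaining columns, and fes is a list with one interacted factor per fixed effect, which is the shape of input the Details text attributes to the internal function.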

For technical details on the algorithms used, see hdfeppml (post-lasso regression), penhdfeppml (standard penalized regression), penhdfeppml_cluster (plugin lasso), and xvalidate (cross-validation).

Value

A list with the following elements:

References

Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.

Correia, S., P. Guimaraes and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.

Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.

Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.

Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.

Examples

## Not run: 
# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
# Now we can use our main functions on the reduced trade data set:
test <- mlfitppml(data = trade[, -(5:6)],
                  dep = "export",
                  fixed = list(c("exp", "time"),
                               c("imp", "time"),
                               c("exp", "imp")),
                  selectobs = (trade$imp %in% americas) & (trade$exp %in% americas),
                  lambdas = c(0.01, 0.001),
                  tol = 1e-6, hdfetol = 1e-2)

## End(Not run)


[Package penppml version 0.2.3 Index]