hdfeppml {penppml} | R Documentation |
PPML Estimation with HDFE
Description
hdfeppml fits an (unpenalized) Poisson Pseudo Maximum Likelihood (PPML) model with
high-dimensional fixed effects (HDFE).
Usage
hdfeppml(
data,
dep = 1,
indep = NULL,
fixed = NULL,
cluster = NULL,
selectobs = NULL,
...
)
Arguments
data |
A data frame containing all relevant variables. |
dep |
A string with the name of the dependent variable or a column number. |
indep |
A vector with the names or column numbers of the regressors. If left unspecified, all remaining variables (excluding fixed effects) are included in the regressor matrix. |
fixed |
A vector with the names or column numbers of factor variables identifying the fixed effects,
or a list with the desired interactions between variables in |
cluster |
Optional. A string with the name of the clustering variable or a column number. It's also possible to input a vector with several variables, in which case the interaction of all of them is taken as the clustering variable. |
selectobs |
Optional. A vector indicating which observations to use (either a logical vector or a numeric vector with row numbers, as usual when subsetting in R). |
... |
Further options. For a full list, see hdfeppml_int. |
Details
This function is a thin wrapper around hdfeppml_int, providing a more convenient interface for
data frames. Whereas the internal function requires some preliminary handling of data sets (y
must be a vector, x must be a matrix and fixed effects fes must be provided in a list),
the wrapper takes a full data frame in the data argument, and users can simply specify which
variables correspond to y, x and the fixed effects, using either variable names or column numbers.
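The kind of preliminary handling that the internal function expects can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual code: the toy data frame df and its column names are hypothetical, and building each fixed effect with interaction() is one plausible way to turn a list of variable groups into a list of factors.

```r
# Toy data frame; every name below is hypothetical and for illustration only.
df <- data.frame(
  export = c(10, 0, 3, 7, 1, 4),
  exp    = c("A", "A", "B", "B", "C", "C"),
  imp    = c("B", "C", "A", "C", "A", "B"),
  time   = c(1, 2, 1, 2, 1, 2),
  fta    = c(0, 1, 0, 1, 0, 0),
  dist   = c(1.2, 0.7, 1.2, 2.1, 0.7, 2.1)
)

dep   <- "export"
fixed <- list(c("exp", "time"), c("imp", "time"))

# With indep unspecified, use every remaining column that is neither the
# dependent variable nor part of a fixed effect.
indep <- setdiff(names(df), c(dep, unique(unlist(fixed))))

y   <- df[[dep]]                              # y: a vector
x   <- as.matrix(df[, indep, drop = FALSE])   # x: a matrix
# fes: a list with one factor per (interacted) fixed effect
fes <- lapply(fixed, function(v) interaction(df[v], drop = TRUE))

str(list(y = y, x = x, fes = fes))
```

The wrapper spares users these steps: the same information is passed through the data, dep, indep and fixed arguments.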
More formally, hdfeppml_int performs iteratively re-weighted least squares (IRLS) on a
transformed model, as described in Correia, Guimarães and Zylkin (2020) and similar to the
ppmlhdfe package in Stata. In each iteration, the function calculates the transformed dependent
variable, partials out the fixed effects (calling collapse::fhdwithin) and then solves a weighted
least squares problem (using a fast C++ implementation).
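The IRLS loop described above can be illustrated with a minimal base R sketch. It is a simplification under explicit assumptions, not the package's implementation: it handles a single fixed effect (so one pass of weighted group demeaning is exact), it replaces collapse::fhdwithin and the C++ solver with base R code, and the helper names wdemean and irls_fe_poisson are hypothetical.

```r
# Weighted within-group demeaning of each column of v (one fixed effect only).
wdemean <- function(v, g, w) {
  v  <- as.matrix(v)
  gi <- as.integer(factor(g))
  means <- rowsum(v * w, gi) / as.vector(rowsum(w, gi))
  v - means[gi, , drop = FALSE]
}

# Hypothetical helper: Poisson IRLS with one fixed effect partialled out.
irls_fe_poisson <- function(y, x, g, tol = 1e-10, maxit = 100) {
  x    <- as.matrix(x)
  mu   <- (y + mean(y)) / 2            # strictly positive starting values
  eta  <- log(mu)
  beta <- rep(0, ncol(x))
  for (it in seq_len(maxit)) {
    z  <- eta + (y - mu) / mu          # transformed dependent variable
    w  <- mu                           # IRLS weights for the log link
    zr <- wdemean(z, g, w)             # partial out the fixed effect
    xr <- wdemean(x, g, w)
    # Weighted least squares on the demeaned data
    beta_new <- solve(crossprod(xr, w * xr), crossprod(xr, w * zr))
    eta <- z - drop(zr - xr %*% beta_new)  # updated linear predictor
    mu  <- exp(eta)
    converged <- max(abs(beta_new - beta)) < tol
    beta <- drop(beta_new)
    if (converged) break
  }
  beta
}

# Simulated data (assumed values), compared with glm() using explicit dummies.
set.seed(1)
n <- 200
g <- rep(1:5, each = 40)
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
alpha <- c(0.2, 0.5, 0.8, 1.1, 1.4)
y <- rpois(n, exp(0.3 * x[, 1] - 0.2 * x[, 2] + alpha[g]))

fit <- irls_fe_poisson(y, x, g)
ref <- coef(glm(y ~ x + factor(g), family = poisson()))[2:3]
print(cbind(irls = fit, glm = ref))
```

With several fixed effects a single demeaning pass is no longer exact, which is why an iterative routine for partialling out the fixed effects is needed in the general case.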
Value
A list with the following elements:
- coefficients: a 1 x ncol(x) matrix with coefficient (beta) estimates.
- residuals: a 1 x length(y) matrix with the residuals of the model.
- mu: a 1 x length(y) matrix with the final values of the conditional mean \mu.
- deviance:
- bic: Bayesian Information Criterion.
- x_resid: matrix of demeaned regressors.
- z_resid: vector of demeaned (transformed) dependent variable.
- se: standard errors of the coefficients.
References
Breinlich, H., Corradi, V., Rocha, N., Ruta, M., Santos Silva, J.M.C. and T. Zylkin (2021). "Machine Learning in International Trade Research: Evaluating the Impact of Trade Agreements", Policy Research Working Paper; No. 9629. World Bank, Washington, DC.
Correia, S., P. Guimarães and T. Zylkin (2020). "Fast Poisson estimation with high dimensional fixed effects", Stata Journal, 20, 90-115.
Gaure, S. (2013). "OLS with multiple high dimensional category variables", Computational Statistics & Data Analysis, 66, 8-18.
Friedman, J., T. Hastie, and R. Tibshirani (2010). "Regularization paths for generalized linear models via coordinate descent", Journal of Statistical Software, 33, 1-22.
Belloni, A., V. Chernozhukov, C. Hansen and D. Kozbur (2016). "Inference in high dimensional panel models with an application to gun control", Journal of Business & Economic Statistics, 34, 590-605.
Examples
## Not run:
# To reduce run time, we keep only countries in the Americas:
americas <- countries$iso[countries$region == "Americas"]
test <- hdfeppml(data = trade[, -(5:6)],
                 dep = "export",
                 fixed = list(c("exp", "time"),
                              c("imp", "time"),
                              c("exp", "imp")),
                 selectobs = (trade$imp %in% americas) & (trade$exp %in% americas))
## End(Not run)