R: Nonparametric additive instrumental variable estimator

naivereg {naivereg}

R Documentation

Nonparametric additive instrumental variable estimator

Description

NAIVE is the nonparametric additive instrumental variable estimator with the adaptive group Lasso. It uses the group Lasso and B-splines to select valid instrumental variables, with BIC (by default) used to choose the tuning parameters. The two-stage least squares (2SLS) estimator is then computed with the selected instruments.

Usage

naivereg(
  y,
  x,
  z,
  max.degree = 10,
  intercept = TRUE,
  criterion = c("BIC", "AIC", "GCV", "AICc", "EBIC"),
  df.method = c("default", "active"),
  penalty = c("grLasso", "grMCP", "grSCAD", "gel", "cMCP"),
  endogenous.index = c(),
  IV.intercept = FALSE,
  family = c("gaussian", "binomial", "poisson")
)

Arguments

`y`	Response variable, a matrix N x 1.
`x`	The design matrix, without an intercept.
`z`	The instrument variables matrix.
`max.degree`	The upper limit on the degree of the B-splines when using the chosen criterion (e.g. BIC or AIC) to select the tuning parameters; default is 10.
`intercept`	Whether to estimate with an intercept; default is TRUE.
`criterion`	The criterion by which to select the regularization parameter. One of "AIC", "BIC", "GCV", "AICc", "EBIC"; default is "BIC".
`df.method`	How the effective number of model parameters is calculated. One of "active", which counts the number of nonzero coefficients, or "default", which uses the calculated df returned by grpreg; default is "default".
`penalty`	The penalty to be applied to the model. For group selection, one of "grLasso", "grMCP", or "grSCAD". For bi-level selection, one of "gel" or "cMCP"; default is "grLasso".
`endogenous.index`	Specifies which variables in the design matrix are endogenous: a value of 1 marks the corresponding variable as endogenous and a value of 0 marks it as exogenous. By default, all variables are treated as endogenous.
`IV.intercept`	Whether to include an intercept for the instrumental variables; default is FALSE.
`family`	The response distribution, one of "gaussian", "binomial", or "poisson"; default is "gaussian".
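
Several arguments in Usage (criterion, df.method, penalty, family) list a vector of choices as their default value. In R this is conventionally resolved with match.arg(), which returns the first element when the argument is not supplied; that is consistent with the defaults documented above, although whether naivereg itself calls match.arg() is an assumption. The helper choose_criterion below is hypothetical and only illustrates the idiom:

```r
# Hypothetical helper (not part of naivereg) illustrating the match.arg() idiom
choose_criterion <- function(criterion = c("BIC", "AIC", "GCV", "AICc", "EBIC")) {
  # With no value supplied, match.arg() returns the first choice
  match.arg(criterion)
}

default_choice  <- choose_criterion()        # "BIC"
explicit_choice <- choose_criterion("AIC")   # "AIC"
```

Passing a value outside the listed choices makes match.arg() stop with an error, which is why only the listed strings are accepted under this idiom.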

Details

Consider the following structural equation with endogenous regressors:

Y_{i} = x_{i}^{T}\beta + \epsilon_{i}

To solve the endogeneity problem, instrumental variables are employed to obtain a consistent estimator of the population regression coefficient \beta. In practice, many potential instruments, including their series terms, may be recruited to approximate the optimal instrument and improve the precision of IV estimators. On the other hand, if the reduced form equation contains many irrelevant instruments, the approximation of the optimal instrument is generally unsatisfactory and the IV estimator is less efficient. In some cases where the dimensionality of z_{i} is even higher than the sample size, the linear IV method fails. To address these issues, model sparsity is usually assumed and penalized approaches can be applied to improve the efficiency of IV estimators. In the underlying paper (Fan and Zhong 2018) we propose parsimonious first-stage predictive models and estimate optimal instruments in IV models with potentially more instruments than the sample size n.
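
As a minimal illustration of why instruments are needed, the following base R sketch simulates a single endogenous regressor and compares ordinary least squares with a hand-rolled two-stage least squares estimate. The data-generating process, the variable names, and the true slope of 2 are all assumptions made for this example; it does not use naivereg.

```r
set.seed(1)
n <- 5000
z <- rnorm(n)                        # instrument: drives x, unrelated to the error
u <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + u + rnorm(n)          # endogenous regressor (correlated with u)
y <- 1 + 2 * x + 3 * u + rnorm(n)    # true slope is 2; u enters the structural error

# OLS is inconsistent because x is correlated with the structural error
beta_ols <- unname(coef(lm(y ~ x))[2])

# Stage 1: project x on the instrument. Stage 2: regress y on the fitted values.
x_hat <- fitted(lm(x ~ z))
beta_2sls <- unname(coef(lm(y ~ x_hat))[2])

c(ols = beta_ols, tsls = beta_2sls)
```

Only the point estimate from the second stage is meaningful here: the standard errors reported by the second-stage lm() are not valid 2SLS standard errors.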

The performance of the linear IV estimator in finite samples depends largely on the validity of the linearity assumption. This motivated us to consider a more general nonlinear reduced form equation that captures as much information about x_{i} as possible using the instruments z_{i} in high-dimensional settings. This nonparametric idea for the reduced form model is consistent with Newey (1990). We consider the following nonparametric additive reduced form model with a large number of possible instruments:

x_{il} = \mu_{l} + \sum_{j=1}^{p} f_{jl}(z_{ij}) + \xi_{il}

To estimate the nonparametric components above, we use B-spline basis functions, following Huang, Horowitz, and Wei (2010). Let S_{n} be the space of polynomial splines of degree L>1 and let \phi_{k}, k=1,2,…,m_{n}, be normalized B-spline basis functions for S_{n}, where m_{n} is the sum of the polynomial degree L and the number of knots. Let \psi_{k}(z_{ij}) = \phi_{k}(z_{ij}) - n^{-1}\sum_{i=1}^{n} \phi_{k}(z_{ij}) be the centered B-spline basis functions for the jth instrument. The model can then be rewritten using an approximate linear reduced form:

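x_{il} = \mu_{l} + \sum_{j=1}^{p} \sum_{k=1}^{m_{n}} \gamma_{jkl}\psi_{k}(z_{ij}) + \xi_{il}

where \gamma_{jkl} is the coefficient on the kth centered basis function of the jth instrument in the equation for the lth endogenous variable, and \gamma_{jl} = (\gamma_{j1l},…,\gamma_{jm_{n}l})^{T} collects the coefficients of instrument j.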
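
The centered B-spline expansion can be sketched with the splines package that ships with R. The example below is an illustration under stated assumptions, not the code used inside naivereg: the simulated instruments, the cosine reduced form, the choice of a cubic basis with three interior knots (so m_{n} = 6), and the helper name centered_basis are all hypothetical.

```r
library(splines)

set.seed(3)
n <- 500
z <- matrix(runif(n * 2), n, 2)                  # two hypothetical instruments
x <- cos(2 * pi * z[, 1]) + rnorm(n, sd = 0.3)   # only z1 matters, and nonlinearly

# Centered cubic B-spline basis for one instrument:
# degree 3 plus 3 interior knots gives m_n = 6 basis functions
centered_basis <- function(v) {
  phi <- bs(v, df = 6, degree = 3)
  sweep(phi, 2, colMeans(phi))   # psi_k = phi_k - mean(phi_k)
}
basis <- unname(cbind(centered_basis(z[, 1]), centered_basis(z[, 2])))

# Each centered basis column averages to zero (up to rounding error)
max_abs_mean <- max(abs(colMeans(basis)))

# Linear reduced form versus the approximate linear (spline) reduced form
r2_linear <- summary(lm(x ~ z))$r.squared
r2_spline <- summary(lm(x ~ basis))$r.squared
c(linear = r2_linear, spline = r2_spline)
```

In this simulated setting the linear first stage explains almost none of the variation in x, while the spline expansion recovers most of it. This unpenalized fit keeps every basis column for every instrument and so performs no instrument selection.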

To select the significant instruments and estimate the component functions simultaneously, we consider the following penalized objective function with an adaptive group Lasso penalty (Huang, Horowitz, and Wei 2010) for the lth endogenous variable:

L_{n}(\gamma_{l};\lambda_{n}) = ||X_{l}-U\gamma_{l}||_{2}^{2} + \lambda_{n}\sum_{j=1}^{p} \omega_{njl}||\gamma_{jl}||_{2},

where \omega_{njl} = ||\tilde{\gamma}_{jl}||_{2}^{-1} if ||\tilde{\gamma}_{jl}||_{2}>0, and \omega_{njl} = \infty if ||\tilde{\gamma}_{jl}||_{2}=0.

The initial estimator \tilde{\gamma}_{l} is obtained by minimizing the penalized objective function with a group Lasso penalty. The selected instruments are then used to estimate \beta in the structural equation by two-stage least squares (2SLS).
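
The adaptive weights can be illustrated in a few lines of base R. This sketch assumes, following the adaptive group Lasso of Huang, Horowitz, and Wei (2010), that the weights are the inverse Euclidean norms of an initial group Lasso estimate; the coefficient values and the names gamma_init, group_norm, and omega are hypothetical.

```r
# Hypothetical initial group Lasso estimates for p = 3 instruments,
# each with m_n = 4 basis coefficients
gamma_init <- list(
  c(0.9, -0.4, 0.2, 0.1),     # strong instrument
  c(0, 0, 0, 0),              # group set to zero by the initial fit
  c(0.05, 0, -0.02, 0.01)     # weak instrument
)

# Euclidean norm of each coefficient group
group_norm <- sapply(gamma_init, function(g) sqrt(sum(g^2)))

# Inverse-norm weights; groups with zero norm get an infinite weight,
# which keeps them out of the adaptive fit
omega <- ifelse(group_norm > 0, 1 / group_norm, Inf)
omega
```

A group with a small initial norm receives a large weight and is penalized more heavily in the second step, which is what drives the selection of instruments.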

Value

An object of type naivereg which is a list with the following components:

`beta.endogenous`	The estimated coefficients of the endogenous variables.
`beta.exogenous`	The estimated coefficients of the exogenous variables.
`std.endogenous`	The standard deviation of the endogenous variables' coefficients.
`std.exogenous`	The standard deviation of the exogenous variables' coefficients.
`n`	Number of samples.
`degree`	Degree of B-splines.
`criterion`	The criterion by which to select the regularization parameter. One of "AIC", "BIC", "GCV", "AICc","EBIC"; default is "BIC".
`ind`	The index of selected instrument variables. Each row represents the instrumental variable selected for the corresponding endogenous variable. The order of the endogenous variables is from left to right in x.
`ind.b`	The index of selected instrument variables after B-splines. Each row represents the instrumental variable selected for the corresponding endogenous variable. The order of the endogenous variables is from left to right in x.
`res`	The difference between the predicted y and the actual y.
`t.endogenous`	The t-value of the endogenous variables' coefficients.
`t.exogenous`	The t-value of the exogenous variables' coefficients.
`endogenous.conf.interval.lower`	The lower bound of the 95 percent confidence interval for the coefficients of the endogenous variables.
`endogenous.conf.interval.upper`	The upper bound of the 95 percent confidence interval for the coefficients of the endogenous variables.
`exogenous.conf.interval.lower`	The lower bound of the 95 percent confidence interval for the coefficients of the exogenous variables.
`exogenous.conf.interval.upper`	The upper bound of the 95 percent confidence interval for the coefficients of the exogenous variables.
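
The relationship between the coefficient, standard deviation, t-value, and confidence-interval components can be sketched as follows. This assumes a t-value equal to the estimate divided by its standard deviation and a normal-approximation 95 percent interval; the package may compute these quantities differently, and the numbers below are hypothetical rather than output from naivereg.

```r
beta <- c(1.50, -0.40)   # hypothetical coefficient estimates
std  <- c(0.25, 0.20)    # hypothetical standard deviations of the estimates

# t-value: estimate divided by its standard deviation
t_value <- beta / std

# Normal-approximation 95 percent confidence interval (an assumption)
crit  <- qnorm(0.975)
lower <- beta - crit * std
upper <- beta + crit * std

cbind(beta, std, t_value, lower, upper)
```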

Author(s)

Qingliang Fan, KongYu He, Wei Zhong

References

Fan, Q. and Zhong, W. (2018), “Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective,” Journal of Business & Economic Statistics, doi: 10.1080/07350015.2016.1180991.

Caner, M. and Fan, Q. (2015), “Hybrid GEL Estimators: Instrument Selection with Adaptive Lasso,” Journal of Econometrics, 187, 256–274.

Examples

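# NAIVE regression on the example data shipped with the package
library(naivereg)
data("naivedata")
x <- naivedata[, 1]      # regressor (treated as endogenous by default)
y <- naivedata[, 2]      # response
z <- naivedata[, 3:22]   # candidate instruments
# Estimate with an intercept (default criterion: BIC)
naive_intercept <- naivereg(y, x, z)
# Estimate without an intercept, using AIC as the selection criterion
naive_without_intercept <- naivereg(y, x, z, intercept = FALSE, criterion = "AIC")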

[Package naivereg version 1.0.5 Index]