
DSLIVE {naivereg}

R Documentation

DS-LIVE

Description

Double selection plus logistic regression instrumental variable estimator (DS-LIVE). A three-step approach for estimating the effect of a dummy (binary) endogenous treatment that combines a penalized logistic regression on high-dimensional instruments with double selection of control variables. The method accommodates a binary endogenous variable as well as high dimensionality in both the reduced-form and structural equation models.

Usage

DSLIVE(
  y,
  x,
  z,
  D,
  criterion = c("BIC", "CV"),
  penalty = c("SCAD", "MCP", "lasso"),
  family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
  alpha = 1,
  gamma = 3.7,
  nfolds = 10,
  nlambda = 100,
  ...
)

Arguments

`y`	Response variable, an N x 1 vector.
`x`	Control variables, an N x p1 matrix.
`z`	Instrumental variables, an N x p2 matrix.
`D`	Endogenous treatment variable, an N x 1 binary vector taking values 0 or 1.
`criterion`	The criterion used to select the regularization parameter: one of "BIC" or "CV" (cross-validation); the default is "BIC".
`penalty`	The penalty applied in the model: one of "SCAD", "MCP", or "lasso"; the default is "SCAD". This parameter takes effect only when the criterion is "CV".
`family`	Response type. Only applied to the first step of the algorithm, the regression of y on x. For family="gaussian", y should be quantitative; for family="poisson", y should be non-negative counts. For family="binomial", y should be either a factor with two levels or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", y can be a factor with nc >= 2 levels or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it is coerced into a factor. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'; the latter is a binary variable, with '1' indicating death and '0' indicating right censoring. The function Surv() in package survival produces such a matrix. For family="mgaussian", y is a matrix of quantitative responses.
`alpha`	Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0.
`gamma`	The tuning parameter of the MCP/SCAD penalty. Default is 3.7.
`nfolds`	The number of cross-validation folds; the default is 10. This parameter takes effect only when the criterion is "CV". Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds = 3.
`nlambda`	The number of lambda values, default is 100.
`...`	Other arguments; see help(glmnet) or help(cv.ncvreg).
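The roles of gamma and alpha are easiest to see from the penalty functions themselves. The sketch below is not code from naivereg: mcp_penalty, scad_penalty and mnet_penalty are hypothetical helpers that evaluate the standard MCP and SCAD penalties for a single coefficient. The Mnet combination assumes the parameterization described for ncvreg, namely the MCP/SCAD penalty at level alpha * lambda plus a ridge term at level (1 - alpha) * lambda.

```r
# Hypothetical helpers (not part of naivereg): penalty value for one coefficient.
mcp_penalty <- function(beta, lambda, gamma = 3.7) {
  b <- abs(beta)
  ifelse(b <= gamma * lambda,
         lambda * b - b^2 / (2 * gamma),   # concave region near zero
         gamma * lambda^2 / 2)             # constant once |beta| > gamma * lambda
}

scad_penalty <- function(beta, lambda, gamma = 3.7) {
  b <- abs(beta)
  ifelse(b <= lambda,
         lambda * b,                                            # lasso-like part
         ifelse(b <= gamma * lambda,
                (2 * gamma * lambda * b - b^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))                    # flat tail
}

# ASSUMPTION: Mnet = concave penalty at alpha * lambda + ridge at (1 - alpha) * lambda.
mnet_penalty <- function(beta, lambda, alpha = 1, gamma = 3.7,
                         penalty = c("SCAD", "MCP")) {
  penalty <- match.arg(penalty)
  concave <- switch(penalty,
                    SCAD = scad_penalty(beta, alpha * lambda, gamma),
                    MCP  = mcp_penalty(beta, alpha * lambda, gamma))
  concave + (1 - alpha) * lambda * beta^2 / 2
}

round(mcp_penalty(c(0, 0.5, 5), lambda = 1), 3)   # [1] 0.000 0.466 1.850
```

Both penalties grow like lambda * |beta| near zero and become flat once |beta| exceeds gamma * lambda, so large coefficients are not shrunk further; a larger gamma brings either penalty closer to the lasso.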

Details

The DS-LIVE algorithm consists of three steps. In the first step, it applies regularization methods to the data (y, x) to estimate the coefficients (betaX) and select the set of control variables that help predict the outcome y (denoted by c1). In the second step, it fits a penalized logistic regression of the endogenous treatment D on both the control variables x and the instrumental variables z, selecting important control variables (denoted by cx) and instruments. This step is crucial to the algorithm: it estimates the optimal instrument from the high-dimensional IVs, and it also picks up control variables that were missed in the first step but are nonetheless important for the treatment variable. In the third step, it computes the post-double-selection LIVE estimator of the dummy endogenous treatment effect, based on the predicted treatment variable from the second step and the union of the control variables selected in the first two steps, denoted by c3 = (c1 union cx).
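The third step can be sketched in base R under several assumptions. The penalized selections of steps one and two are not reproduced: the index sets c1, cx and cz are fixed by hand as stand-ins. The hypothetical function live_step3 refits an unpenalized logistic regression of D on the selected controls and instruments and, assuming the predicted probabilities are used as the instrument for D, computes a just-identified IV estimate with the controls in c3. The simulated design, the true effect of 2 and all names are illustrative and are not taken from naivereg.

```r
# Sketch of the third (post-double-selection) step in base R.
# ASSUMPTIONS: live_step3 is hypothetical, not naivereg code; c1, cx, cz are
# fixed by hand; fitted logistic probabilities serve as the instrument for D.
set.seed(1)
n  <- 2000
p1 <- 5
p2 <- 3
x <- matrix(rnorm(n * p1), n, p1)   # control variables
z <- matrix(rnorm(n * p2), n, p2)   # instrumental variables
u <- rnorm(n)                       # unobserved confounder (makes D endogenous)
D <- as.numeric(drop(z %*% rep(1, p2)) + x[, 1] + u + rnorm(n) > 0)
y <- 2 * D + x[, 1] + x[, 2] + u + rnorm(n)   # true treatment effect is 2

# Hypothetical stand-ins for the selections made in steps 1 and 2.
c1 <- c(1, 2)   # controls that predict y
cx <- c(1)      # controls that predict D
cz <- 1:p2      # instruments that predict D

live_step3 <- function(y, x, z, D, c1, cx, cz) {
  # Post-selection logistic regression of D on the selected controls and IVs.
  W <- cbind(x[, cx, drop = FALSE], z[, cz, drop = FALSE])
  first <- glm(D ~ W, family = binomial())
  Dhat <- fitted(first)                  # predicted P(D = 1 | selected x, z)

  c3 <- sort(union(c1, cx))              # union of controls from steps 1 and 2
  X  <- cbind(1, D, x[, c3, drop = FALSE])     # structural regressors
  Zi <- cbind(1, Dhat, x[, c3, drop = FALSE])  # instruments: Dhat replaces D

  # Just-identified IV estimator: solve Z'X b = Z'y.
  b <- solve(crossprod(Zi, X), crossprod(Zi, y))
  list(betaD = b[2], betaX = b[-(1:2)], c3 = c3)
}

fit <- live_step3(y, x, z, D, c1, cx, cz)
fit$betaD

# For comparison: OLS ignores the confounder u shared by D and y,
# so in this design it tends to overstate the effect.
coef(lm(y ~ D + x[, c(1, 2)]))["D"]
```

Using the fitted probabilities as an instrument, rather than plugging them in as a regressor, keeps the estimate consistent even if the logistic model is only an approximation to the true first stage, as it is in this simulated design.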

Value

An object of type DSLIVE, which is a list with the following components:

`betaD`	The coefficient of endogenous variable D.
`betaX`	The coefficient of control variables x.
`c1`	Indices of the control variables x selected in the first step.
`cx`	Indices of the control variables selected in the second step.
`cz`	Indices of the instrumental variables selected in the second step.
`c2`	Indices of all variables selected in the second step, numbered over the combined columns of x and z: an index less than or equal to p1 refers to a control variable, and an index greater than p1 and less than or equal to (p1 + p2) refers to an instrumental variable.
`c3`	Union of c1 and cx: the indices of the control variables used in the third step.
`family`	Same as above.
`criterion`	Same as above.
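Because c2 numbers controls and instruments over one combined index, the two groups can be recovered by comparing each index with p1. The helper split_c2 below is hypothetical and the indices c(3, 17, 201, 215) are made up. The page does not say whether the returned cz uses this combined numbering or the column numbers of z itself, so renumbering by subtracting p1 is an assumption. The dimensions p1 = 200 and p2 = 20 match the Examples data (x is columns 2:201 and z is columns 202:221 of DSLIVEdata).

```r
# Hypothetical helper (not part of naivereg): split second-step indices c2,
# numbered over the combined columns cbind(x, z), into controls and instruments.
split_c2 <- function(c2, p1, p2) {
  stopifnot(all(c2 >= 1 & c2 <= p1 + p2))
  list(
    cx = c2[c2 <= p1],        # columns of x
    cz = c2[c2 > p1] - p1     # columns of z, renumbered 1..p2 (assumption)
  )
}

# Dimensions of the example data: p1 = 200 controls, p2 = 20 instruments.
parts <- split_c2(c(3, 17, 201, 215), p1 = 200, p2 = 20)
parts$cx   # values: 3 17
parts$cz   # values: 1 15
```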

Author(s)

Qingliang Fan, KongYu He, Wei Zhong

References

Wei Zhong, Wei Zhou, Qingliang Fan and Yang Gao (2020), “Dummy Endogenous Treatment Effect Estimation Using High-Dimensional Instrumental Variables”, working paper.

Examples

library(naivereg)
data("DSLIVEdata")
y=DSLIVEdata[,1]
x=DSLIVEdata[,2:201]
z=DSLIVEdata[,202:221]
D=DSLIVEdata[,222]
res = DSLIVE(y,x,z,D,family='gaussian', criterion='BIC')
res$c1 # Indices of control variables selected in the first step.
res$cx # Indices of control variables selected in the second step.
res$cz # Indices of instrumental variables selected in the second step.
res$c3 # Union of c1 and cx (control variables used in the third step).
res$betaD # Estimated effect of the endogenous treatment D.

[Package naivereg version 1.0.5 Index]