DSLIVE {naivereg} | R Documentation |
DS-LIVE
Description
Double selection plus logistic regression instrumental variable estimator (DS-LIVE). A three-step approach that estimates the effect of a dummy (binary) endogenous treatment using high-dimensional instruments, a penalized logistic regression model and double selection. The method accommodates the binary endogenous variable as well as high dimensionality in both the reduced-form and structural equation models.
Usage
DSLIVE(
y,
x,
z,
D,
criterion = c("BIC", "CV"),
penalty = c("SCAD", "MCP", "lasso"),
family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
alpha = 1,
gamma = 3.7,
nfolds = 10,
nlambda = 100,
...
)
Arguments
y |
Response variable, an N x 1 vector. |
x |
Control variables, an N x p1 matrix. |
z |
Instrumental variables, an N x p2 matrix. |
D |
Endogenous treatment variable, which must be binary (each value is 0 or 1). |
criterion |
The criterion used to select the regularization parameter: "BIC" or "CV" (cross-validation). Default is "BIC". |
penalty |
This parameter takes effect when the criterion is CV. The penalty to apply to the model: one of "SCAD", "MCP" or "lasso". |
family |
Response type, used only in the first step of the algorithm (the regression of y on x). The required form of y depends on the family. For family="gaussian" or family="poisson", y is quantitative (non-negative counts for "poisson"). For family="binomial", y should be either a factor with two levels or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", y can be a factor with nc>=2 levels or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is given as a vector, it is coerced to a factor. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'; 'status' is a binary variable, with '1' indicating death and '0' indicating right censoring. The function Surv() in package survival produces such a matrix. For family="mgaussian", y is a matrix of quantitative responses. |
alpha |
Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0. |
gamma |
The tuning parameter of the MCP/SCAD penalty. Default is 3.7. |
nfolds |
This parameter takes effect when the criterion is CV. The number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds=3. |
nlambda |
The number of lambda values, default is 100. |
... |
Other arguments; see help(glmnet) or help(cv.ncvreg). |
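The three values accepted by penalty correspond to well-known penalty functions, and gamma is the shape parameter of the two non-convex ones. The base R sketch below writes out the standard textbook forms of the lasso, SCAD and MCP penalties for a single coefficient so their behaviour can be compared. The function names scad_penalty, mcp_penalty and lasso_penalty are hypothetical helpers for illustration; they are not part of naivereg, and the sketch does not show how DSLIVE applies the penalties internally.

```r
# Illustrative helpers (hypothetical names, not part of naivereg).
# Each returns the penalty value for coefficient(s) beta at a given lambda.

# Lasso: grows linearly in |beta| without bound.
lasso_penalty <- function(beta, lambda) {
  lambda * abs(beta)
}

# SCAD: linear up to lambda, quadratic taper up to gamma * lambda, then flat.
scad_penalty <- function(beta, lambda, gamma = 3.7) {
  stopifnot(gamma > 2, lambda >= 0)
  a <- abs(beta)
  ifelse(a <= lambda,
         lambda * a,
         ifelse(a <= gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

# MCP: quadratic taper from the origin up to gamma * lambda, then flat.
mcp_penalty <- function(beta, lambda, gamma = 3.7) {
  stopifnot(gamma > 1, lambda >= 0)
  a <- abs(beta)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Compare the three penalties on a small grid of coefficient values.
beta_grid <- c(0, 0.5, 1, 2, 5)
penalty_table <- cbind(
  beta  = beta_grid,
  lasso = lasso_penalty(beta_grid, lambda = 1),
  scad  = scad_penalty(beta_grid, lambda = 1),
  mcp   = mcp_penalty(beta_grid, lambda = 1)
)
print(penalty_table)
```

For large coefficients the SCAD and MCP columns stop growing while the lasso column keeps increasing, which is why the non-convex penalties shrink large effects less.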
Details
The DS-LIVE algorithm consists of the following three steps. In the first step, it estimates the coefficients (betaX) and selects the set of important control variables (denoted by c1) that help predict the outcome variable y, using regularization methods on the data (y; x). In the second step, using a penalized logistic regression model, it selects both important control variables x (the selected set is denoted by cx) and instrumental variables z for the endogenous treatment D. This step is crucial because it estimates the optimal instrument from the high-dimensional IVs and also selects additional control variables that might be missed in the first step but are nonetheless important to the treatment variable. In the third step, it computes the post-double-selection LIVE estimator of the dummy endogenous treatment effect, based on the predicted treatment variable D and the union of the control variables selected in the first two steps, denoted by c3 = (c1 union cx).
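The flow of the three steps can be sketched in base R on a small simulated data set. This is a low-dimensional stand-in under loudly stated assumptions, not the package's implementation: the selected sets c1, cx and cz are simply assumed rather than produced by penalized fits, an unpenalized glm() replaces the penalized logistic regression of the second step, and the third step is read as a least-squares regression of y on the predicted treatment and the controls in c3. All object names and simulated coefficients are hypothetical.

```r
set.seed(1)
n  <- 500   # sample size
p1 <- 6     # number of control variables
p2 <- 4     # number of instruments

x <- matrix(rnorm(n * p1), n, p1)
z <- matrix(rnorm(n * p2), n, p2)
u <- rnorm(n)  # unobserved confounder that makes D endogenous

# Binary treatment driven by one control, two instruments and the confounder.
D <- as.integer(0.8 * x[, 2] + z[, 1] - z[, 3] + u + rnorm(n) > 0)
# Outcome depends on the treatment, two controls and the confounder.
y <- 1.5 * D + x[, 1] - 0.5 * x[, 2] + u + rnorm(n)

# Steps 1 and 2 (selection): ASSUMED results, standing in for penalized fits.
c1 <- c(1L, 2L)  # controls selected for the outcome y (hypothetical)
cx <- c(2L, 5L)  # controls selected for the treatment D (hypothetical)
cz <- c(1L, 3L)  # instruments selected for the treatment D (hypothetical)

# Step 2 (prediction): plain logistic regression of D on the selected
# controls and instruments; fitted values are predicted probabilities.
first_stage <- glm(D ~ x[, cx] + z[, cz], family = binomial())
D_hat <- fitted(first_stage)

# Step 3: union of the selected controls, then least squares of y on the
# predicted treatment and those controls.
c3 <- sort(union(c1, cx))
second_stage <- lm(y ~ D_hat + x[, c3])
betaD_sketch <- unname(coef(second_stage)["D_hat"])

print(c3)
print(betaD_sketch)
```

The union in the third step keeps a control such as column 5 here, which the outcome equation alone did not select but which the treatment equation flagged.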
Value
An object of type DSLIVE, which is a list with the following components:
betaD |
The estimated coefficient of the endogenous variable D. |
betaX |
The estimated coefficients of the control variables x. |
c1 |
Indices of the control variables x selected in the first step. |
cx |
Indices of the control variables selected in the second step. |
cz |
Indices of the instrumental variables selected in the second step. |
c2 |
Indices of all variables selected in the second step. An index less than or equal to p1 refers to a control variable; an index greater than p1 and less than or equal to (p1 + p2) refers to an instrumental variable. |
c3 |
Union of c1 and cx, i.e. the control variables selected in either of the first two steps. |
family |
Same as above. |
criterion |
Same as above. |
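The index convention described for c2 can be unpacked with a few lines of base R. The sketch below assumes a hypothetical c2 vector and the dimensions p1 = 200 and p2 = 20; it splits the combined indices into control and instrument parts and converts the latter to column positions within z. The object names are illustrative, and whether the cz component is reported on the combined scale or relative to z is not stated here, so the last line is only one possible reading.

```r
p1 <- 200L  # number of control variables (columns of x)
p2 <- 20L   # number of instruments (columns of z)

# Hypothetical second-step selection on the combined scale 1..(p1 + p2).
c2 <- c(4L, 17L, 150L, 203L, 219L)

# Indices up to p1 refer to control variables.
control_idx <- c2[c2 <= p1]

# Indices in (p1, p1 + p2] refer to instrumental variables.
instrument_idx <- c2[c2 > p1 & c2 <= p1 + p2]

# Column positions within z, assuming c2 indexes the columns of cbind(x, z).
instrument_cols_in_z <- instrument_idx - p1

print(control_idx)
print(instrument_cols_in_z)
```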
Author(s)
Qingliang Fan, KongYu He, Wei Zhong
References
Wei Zhong, Wei Zhou, Qingliang Fan and Yang Gao (2020), “Dummy Endogenous Treatment Effect Estimation Using High-Dimensional Instrumental Variables”, working paper.
Examples
library(naivereg)
data("DSLIVEdata")
y=DSLIVEdata[,1]
x=DSLIVEdata[,2:201]
z=DSLIVEdata[,202:221]
D=DSLIVEdata[,222]
res = DSLIVE(y,x,z,D,family='gaussian', criterion='BIC')
res$c1 # Indices of the control variables x selected in the first step.
res$cx # Indices of the control variables selected in the second step.
res$cz # Indices of the instrumental variables selected in the second step.
res$c3 # Union of c1 and cx (selected control variables).