cov.sel {CovSel} | R Documentation |

Dimension reduction of the covariate vector under unconfoundedness using model-free backward elimination algorithms, based on either marginal co-ordinate hypothesis testing (MCH; continuous covariates only) or kernel-based smoothing (KS).

```
cov.sel(T, Y, X, type = c("dr", "np"), alg = 3, scope = NULL, alpha = 0.1,
        thru = 0.5, thro = 0.25, thrc = 100, ...)
```

`T` |
A vector, containing |

`Y` |
A vector of observed outcomes. |

`X` |
A matrix or data frame containing columns of covariates. The covariates may be a mix of continuous, unordered discrete
(to be specified in the data frame using |

`type` |
The type of method used. |

`alg` |
Specifies which algorithm to use. |

`scope` |
A character string giving the name of one (or several) covariate(s) that must not be removed. |

`alpha` |
Stopping criterion for MCH: will stop removing covariates when the p-value for the next covariate to be removed is less than |

`thru` |
Bandwidth threshold used for unordered discrete covariates if |

`thro` |
Bandwidth threshold used for ordered discrete covariates if |

`thrc` |
Bandwidth threshold used for continuous covariates if |

`...` |
Additional arguments passed on to |

Performs model-free selection of covariates for situations where the parameter of interest is an average causal effect. This function is based on the framework of sufficient dimension reduction which, under unconfoundedness, reduces the dimension of the covariate vector. A two-step procedure searching for a sufficient subset of the covariate vector is implemented in the form of algorithms. This function uses MCH (if `type="dr"`) or KS (if `type="np"`) in the form of two backward elimination algorithms, Algorithm A and Algorithm B, proposed by de Luna, Waernbaum and Richardson (2011).

Algorithm A (`alg = 1`): First the covariates conditionally independent of the treatment, `T`, given the rest of the variables (`X.T`) are removed. Then the covariates conditionally independent of the potential outcomes (in each of the treatment groups) given the rest of the covariates are removed. This yields two subsets of covariates, `Q.1` and `Q.0`, for the treatment and control group respectively.

Algorithm B (`alg = 2`): First the covariates conditionally independent of the potential outcome (in each of the treatment groups), given the rest of the covariates (`X.0` and `X.1`) are removed. Then the covariates conditionally independent of the treatment, `T`, given the rest of the covariates are removed. This yields two subsets of covariates, `Z.1` and `Z.0`, for the treatment and control group respectively.

`alg = 3` runs both Algorithm A and B.
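The backward elimination step with the `alpha` stopping rule and the `scope` argument can be sketched in base R. This is only an illustration under stated assumptions: `backward_eliminate` is a hypothetical helper, not part of `CovSel`, and ordinary linear-model t-test p-values are used as a stand-in for the MCH test that `cov.sel` obtains from the `dr` package. The simulated data are made up for the example.

```r
# Hypothetical sketch, not CovSel's implementation: backward elimination with
# an alpha stopping rule. Linear-model t-test p-values stand in for the MCH test.
backward_eliminate <- function(y, X, alpha = 0.1, scope = NULL) {
  keep <- colnames(X)
  repeat {
    candidates <- setdiff(keep, scope)      # covariates in scope are never removed
    if (length(candidates) == 0) break
    fit <- lm(y ~ ., data = data.frame(y = y, X[, keep, drop = FALSE]))
    p <- summary(fit)$coefficients[candidates, "Pr(>|t|)"]
    names(p) <- candidates
    worst <- names(which.max(p))            # next covariate to be removed
    if (p[worst] < alpha) break             # stop: its p-value is below alpha
    keep <- setdiff(keep, worst)
  }
  keep
}

# Simulated data (assumption): only x1 and x2 affect y; x4 is protected by scope.
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- 2 * X$x1 - 3 * X$x2 + rnorm(n)
kept <- backward_eliminate(y, X, alpha = 0.1, scope = "x4")
kept
```

A larger `alpha` makes the procedure stop earlier and therefore keep more covariates, while covariates named in `scope` are retained regardless of their p-values.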

In KS the bandwidth range for unordered discrete covariates is [0, 1/#levels] while for ordered discrete covariates, no matter how many levels, the range is [0, 1]. For continuous covariates bandwidths range from 0 to infinity. Ordered discrete and continuous covariates are removed if their bandwidths exceed their respective thresholds. Unordered discrete covariates are removed if their bandwidths are larger than `thru` times the maximum bandwidth.
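The removal rule above can be written out as a small base R sketch. `ks_remove` is a hypothetical helper, not a `CovSel` function, and the bandwidth values are invented for illustration. It assumes the maximum bandwidth for an unordered discrete covariate is the upper end of the range stated above, 1/#levels.

```r
# Hypothetical helper (not part of CovSel): flags covariates for removal
# from their selected bandwidths, following the thresholds described above.
ks_remove <- function(bw, kind, nlevels = NA,
                      thru = 0.5, thro = 0.25, thrc = 100) {
  # kind: "unordered", "ordered" or "continuous", one entry per covariate
  # nlevels: number of levels, only used for unordered discrete covariates
  max_unordered <- 1 / nlevels  # assumed maximum: upper end of the stated range
  ifelse(kind == "unordered", bw > thru * max_unordered,
         ifelse(kind == "ordered", bw > thro, bw > thrc))
}

# Made-up bandwidths for five covariates
bw      <- c(x1 = 0.45, x2 = 0.10, x3 = 0.05, x4 = 250, x5 = 0.8)
kind    <- c("unordered", "unordered", "ordered", "continuous", "continuous")
nlevels <- c(2, 2, NA, NA, NA)

drop_flag <- ks_remove(bw, kind, nlevels)
names(bw)[drop_flag]  # "x1" "x4"
```

Here `x1` is dropped because 0.45 exceeds `thru` times 1/2, and `x4` because 250 exceeds `thrc`; the other bandwidths stay below their thresholds.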

In case of MCH one can choose between sliced inverse regression, SIR, or sliced average variance estimation, SAVE. For KS the regression type can be set to local constant kernel or local linear and the bandwidth type can be set to fixed, generalized nearest neighbors or adaptive nearest neighbors. See `dr` and `npregbw` for details. Since `type="np"` results in a fully nonparametric covariate selection procedure this can be much slower than if `type="dr"`.

`cov.sel` returns a list with the following content:

`X.T` |
The set of covariates with minimum cardinality such that |

`Q.0` |
The set of covariates with minimum cardinality such that |

`Q.1` |
The set of covariates with minimum cardinality such that |

`X.0` |
The set of covariates with minimum cardinality such that |

`X.1` |
The set of covariates with minimum cardinality such that |

`Z.0` |
The set of covariates with minimum cardinality such that |

`Z.1` |
The set of covariates with minimum cardinality such that |

If `type="dr"` the following `type`-specific content is returned:

`evectorsQ.0` |
The eigenvectors of the matrix whose columns span the reduced subspace |

`evectorsQ.1` |
The eigenvectors of the matrix whose columns span the reduced subspace |

`evectorsZ.0` |
The eigenvectors of the matrix whose columns span the reduced subspace |

`evectorsZ.1` |
The eigenvectors of the matrix whose columns span the reduced subspace |

`method` |
The method used, either |

If `type="np"` the following `type`-specific content is returned:

`bandwidthsQ.0` |
The selected bandwidths for the covariates in the reduced subspace |

`bandwidthsQ.1` |
The selected bandwidths for the covariates in the reduced subspace |

`bandwidthsZ.0` |
The selected bandwidths for the covariates in the reduced subspace |

`bandwidthsZ.1` |
The selected bandwidths for the covariates in the reduced subspace |

`regtype` |
The regression method used, either |

`bwtype` |
Type of bandwidth used, |

`covar` |
Names of all covariates given as input |

For the marginal co-ordinate hypothesis test, `type="dr"`, a data frame of labels, tests, and p-values is printed as a side effect.

`cov.sel` calls the functions `dr`, `dr.step` and `npregbw`, so the packages `dr` and `np` are required.
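A quick way to confirm both dependencies are available before calling `cov.sel` is sketched below, using only base R's `requireNamespace`. The variable names are illustrative.

```r
# Check that the packages cov.sel relies on are installed.
needed <- c("dr", "np")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
}
```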

Emma Persson, <emma.persson@umu.se>, Jenny Häggström, <jenny.haggstrom@umu.se>

Cook, R. D. (2004). Testing Predictor Contributions in Sufficient Dimension Reduction. *The Annals of Statistics 32*. 1061-1092

de Luna, X., I. Waernbaum, and T. S. Richardson (2011). Covariate selection for the nonparametric estimation of an average treatment effect. *Biometrika 98*. 861-875

Häggström, J., E. Persson, I. Waernbaum and X. de Luna (2015). An `R` Package for Covariate Selection When Estimating Average Causal Effects. *Journal of Statistical Software 68*. 1-20

Hall, P., Q. Li and J.S. Racine (2007). Nonparametric estimation of regression functions in the presence of irrelevant regressors. *The Review of Economics and Statistics, 89*. 784-789

Li, L., R. D. Cook, and C. J. Nachtsheim (2005). Model-free Variable Selection. *Journal of the Royal Statistical Society, Series B 67*. 285-299

```
## Marginal co-ordinate hypothesis test, continuous covariates only
data(datc)

## Algorithm A, keeping x6 and x7
ans <- cov.sel(T = datc$T, Y = datc$y, X = datc[, 1:8], type = "dr",
               alpha = 0.1, alg = 1, scope = c("x6", "x7"))
summary(ans)

## Algorithm B, method "save"
ans <- cov.sel(T = datc$T, Y = datc$y, X = datc[, 1:10], type = "dr",
               alg = 2, method = "save", alpha = 0.3, na.action = "na.omit")

## Kernel-based smoothing, both categorical and continuous covariates
data(datfc)

## The example below with default setting takes about 9 minutes to run.
## ans <- cov.sel(T = datfc$T, Y = datfc$y, X = datfc[, 1:8], type = "np",
##                alpha = 0.1, alg = 3, scope = NULL, thru = 0.5, thro = 0.25, thrc = 100)

## For illustration purposes we run Algorithm A using only the first 100 observations
## and x1, x2, x3, x4 in datfc
ans <- cov.sel(T = datfc$T[1:100], Y = datfc$y[1:100], X = datfc[1:100, 1:4],
               type = "np", alpha = 0.1, alg = 1, scope = NULL,
               thru = 0.5, thro = 0.25, thrc = 100)

## The example below running Algorithm A, keeping x6 and x7 with regtype="ll"
## takes about 7 minutes to run.
## ans <- cov.sel(T = datfc$T, Y = datfc$y, X = datfc[, 1:8], type = "np",
##                alpha = 0.1, alg = 3, scope = c("x6", "x7"), thru = 0.5, thro = 0.25,
##                thrc = 100, regtype = "ll")
```

[Package *CovSel* version 1.2.1 Index]