cve.call {CVarE} R Documentation

## Conditional Variance Estimator (CVE).

### Description

This is the main function in the `CVarE` package. It creates objects of class `"cve"` to estimate the mean subspace. Helper functions that require a `"cve"` object can then be applied to the output of this function.

Conditional Variance Estimation (CVE) is a sufficient dimension reduction (SDR) method for regressions studying E(Y|X), the conditional expectation of a response Y given a set of predictors X. This function provides methods for estimating the dimension and the subspace spanned by the columns of a p x k matrix B of minimal rank k such that

E(Y|X) = E(Y|B'X)

or, equivalently,

Y = g(B'X) + ε

where ε is a mean-zero random variable, independent of X, with finite variance Var(ε) = E(ε^2); X has positive definite variance-covariance matrix Var(X) = Σ_X; g is an unknown, continuous, non-constant function; and B = (b_1, ..., b_k) is a real p x k matrix of rank k <= p.

Both the dimension k and the subspace span(B) are unknown. The CVE method makes very few assumptions.

A kernel matrix Bhat is estimated such that the column space of Bhat should be close to the mean subspace span(B). The primary output from this method is a set of orthonormal vectors, Bhat, whose span estimates span(B).

The method `"central"` implements Ensemble Conditional Variance Estimation (ECVE) as described in Fertl and Bura (2021), "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction" (see References). It augments the CVE method by applying an ensemble of functions (parameter `func_list`) to the response in order to estimate the central subspace. This corresponds to the generalization

F(Y|X) = F(Y|B'X)

or, equivalently,

Y = g(B'X, ε)

where F is the conditional cumulative distribution function.
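As a rough illustration of what an ensemble of functions applied to the response might look like, the sketch below builds indicator functions of ten response quantile slices in base R. The helper name `make_indicator_ensemble` and the exact form of the returned functions are assumptions made for illustration; this is not the package's internal construction of its default ensemble.

```r
# Hypothetical helper (not part of the package): build indicator functions
# for consecutive quantile slices of the response Y.
make_indicator_ensemble <- function(Y, n_slices = 10L) {
  # Slice boundaries at the 0%, 10%, ..., 100% sample quantiles.
  breaks <- quantile(Y, probs = seq(0, 1, length.out = n_slices + 1L),
                     names = FALSE)
  lapply(seq_len(n_slices), function(i) {
    lower <- breaks[i]
    upper <- breaks[i + 1L]
    if (i == 1L) {
      # First slice is closed on both sides: [lower, upper].
      function(y) as.numeric(lower <= y & y <= upper)
    } else {
      # Remaining slices are half-open: (lower, upper].
      function(y) as.numeric(lower < y & y <= upper)
    }
  })
}

set.seed(1)
Y <- rnorm(200)
func_list <- make_indicator_ensemble(Y)

# Evaluate every ensemble member on Y: a 200 x 10 matrix of 0/1 indicators.
memberships <- sapply(func_list, function(f) f(Y))
```

A list built this way could presumably be passed as `func_list` together with `method = "central"`; whether the package expects exactly this return type is an assumption that should be checked against the package before use.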

### Usage

```
cve.call(
  X,
  Y,
  method = c("mean", "weighted.mean", "central", "weighted.central"),
  func_list = NULL,
  nObs = sqrt(nrow(X)),
  h = NULL,
  min.dim = 1L,
  max.dim = 10L,
  k = NULL,
  momentum = 0,
  tau = 1,
  tol = 0.001,
  slack = 0,
  gamma = 0.5,
  V.init = NULL,
  max.iter = 50L,
  attempts = 10L,
  nr.proj = 1L,
  logger = NULL
)
```

### Arguments

`X`

Design (predictor) matrix.

`Y`

n-dimensional vector of responses.

`method`

Character string specifying the method of fitting. The options are:

• `"mean"` method to estimate the mean subspace, see Fertl and Bura (2021), "Conditional Variance Estimation for Sufficient Dimension Reduction".

• `"central"` ensemble method to estimate the central subspace, see Fertl and Bura (2021), "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction".

• `"weighted.mean"` variation of the `"mean"` method with adaptive weighting of slices.

• `"weighted.central"` variation of the `"central"` method with adaptive weighting of slices.

`func_list`

A list of functions applied to `Y`, used by ECVE for central subspace estimation. The default ensemble consists of indicator functions of the [0, 10], (10, 20], ..., (90, 100] percent response quantiles. Only relevant if `method` is `"central"` or `"weighted.central"`; ignored otherwise.

`nObs`

Parameter for choosing the bandwidth `h` using `estimate.bandwidth` (ignored if `h` is supplied).

`h`

Bandwidth or function to estimate the bandwidth; defaults to an internally estimated bandwidth.

`min.dim`

Lower bound for `k` (ignored if `k` is supplied).

`max.dim`

Upper bound for `k` (ignored if `k` is supplied).

`k`

Dimension of the lower-dimensional projection. If `k` is given, the `B` matrix is estimated only for the specified dimension.

`momentum`

Number in [0, 1) giving the ratio of momentum for the Euclidean gradient update with a momentum term. `momentum = 0` corresponds to normal gradient descent.

`tau`

Initial step size.

`tol`

Tolerance for the break condition.

`slack`

Positive scaling that allows small increases of the loss while optimizing, i.e. `slack = 0.1` allows the target function to increase by up to 10% in one optimization step.

`gamma`

Step-size reduction multiple. If a gradient step with step size `tau` is not accepted, the next step size is set to `gamma * tau`.

`V.init`

Semi-orthogonal matrix of dimensions `(ncol(X), ncol(X) - k)` used as the starting value in the optimization. (If supplied, `attempts` is set to 0 and `k` is set to match the dimension.)

`max.iter`

Maximum number of optimization steps.

`attempts`

If `V.init` is not supplied, the optimization is carried out `attempts` times with starting values drawn from the invariant measure on the Stiefel manifold (see `rStiefel`).

`nr.proj`

The number of projections used for projective resampling for a multivariate response Y (under active development, ignored for a univariate response).

`logger`

A logger function (only for advanced users; slows down the computation).
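The interplay of `tau`, `gamma`, `slack`, `tol` and `max.iter` can be sketched with a toy example. The code below is a plain Euclidean gradient descent on a simple quadratic, not the package's optimization on the Stiefel manifold; the target function, the function names `loss_fn`, `grad_fn` and `descend`, and the exact form of the break condition are all assumptions made for illustration.

```r
# Toy target function and its gradient (assumption: stands in for the CVE loss).
target <- c(1, -2)
loss_fn <- function(v) 1.5 * sum((v - target)^2)
grad_fn <- function(v) 3 * (v - target)

# Hypothetical descent loop mirroring the documented roles of the parameters.
descend <- function(v, tau = 1, tol = 1e-3, slack = 0, gamma = 0.5,
                    max.iter = 50L) {
  loss <- loss_fn(v)
  for (iter in seq_len(max.iter)) {
    v_new <- v - tau * grad_fn(v)
    loss_new <- loss_fn(v_new)
    if (loss_new > (1 + slack) * loss) {
      # Step rejected: the loss grew by more than the allowed slack,
      # so shrink the step size and try again.
      tau <- gamma * tau
      next
    }
    # Step accepted; stop once the loss changes by less than tol
    # (assumed break condition).
    converged <- abs(loss - loss_new) < tol
    v <- v_new
    loss <- loss_new
    if (converged) break
  }
  list(v = v, loss = loss, tau = tau)
}

fit <- descend(c(0, 0))
```

In this toy run the initial step size `tau = 1` overshoots and is rejected once, so the remaining steps use `gamma * tau = 0.5`. A larger `slack` would accept more aggressive steps at the cost of temporarily increasing the loss.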

### Value

An S3 object of class `"cve"` with components:

X

design matrix of predictors used for calculating the cve-estimate,

Y

n-dimensional vector of responses used for calculating the cve-estimate,

method

name of the method used,

call

the matched call,

res

list of components `V, L, B, loss, h` for each `k = min.dim, ..., max.dim`. If `k` was supplied in the call, `min.dim = max.dim = k`.

• `B` is the cve-estimate with dimension p x k.

• `V` is the orthogonal complement of B.

• `L` is the loss for each sample separately, such that its mean is `loss`.

• `loss` is the value of the target function that is minimized, evaluated at V.

• `h` is the bandwidth parameter used to calculate `B, V, loss, L`.
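The relationship between `B` and its orthogonal complement `V` can be illustrated in base R. The sketch below starts from a known semi-orthogonal `B` (the one used in the Examples section) and constructs one possible complement via a complete QR decomposition. This construction is an assumption chosen for illustration; the `V` returned by the package spans the same complement but need not be the same matrix.

```r
p <- 5L
k <- 1L

# Semi-orthogonal p x k matrix B (columns are orthonormal).
B <- matrix(rep(1, p) / sqrt(p), nrow = p, ncol = k)

# A complete QR decomposition yields an orthonormal basis of R^p whose
# first k columns span span(B); the remaining p - k columns span its
# orthogonal complement.
Q <- qr.Q(qr(B), complete = TRUE)
V <- Q[, -seq_len(k), drop = FALSE]

dim(V)           # p x (p - k), the shape expected for V.init
crossprod(V, B)  # numerically zero: V is orthogonal to B
crossprod(V)     # numerically the identity: V is semi-orthogonal

# The projection onto span(B) can be recovered from V alone.
P_B <- diag(p) - tcrossprod(V)
```

Here `P_B` agrees with `tcrossprod(B)` up to numerical error, which is why estimating `V` and estimating `B` carry the same information about the subspace.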

### References

 Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>

 Fertl, L. and Bura, E. (2021) "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.13435>

### Examples

```
# create B for simulation (k = 1)
B <- rep(1, 5) / sqrt(5)

set.seed(21)
# create predictor data X ~ N(0, I_p)
X <- matrix(rnorm(500), 100, 5)
# simulate response variable
#     Y = f(B'X) + err
# with f(x1) = x1 and err ~ N(0, 0.25^2)
Y <- X %*% B + 0.25 * rnorm(100)

# calculate cve with the default method "mean" for k = 1
set.seed(21)
cve.obj.simple1 <- cve(Y ~ X, k = 1)

# same as
set.seed(21)
cve.obj.simple2 <- cve.call(X, Y, k = 1)

# extract estimated B's.
coef(cve.obj.simple1, k = 1)
coef(cve.obj.simple2, k = 1)
```

[Package CVarE version 1.1 Index]