cve {CVarE} R Documentation

Conditional Variance Estimator (CVE).

Description

This is the main function in the `CVarE` package. It creates objects of class `"cve"` to estimate the mean subspace. Helper functions that require a `"cve"` object can then be applied to the output from this function.

Conditional Variance Estimation (CVE) is a sufficient dimension reduction (SDR) method for regressions studying E(Y|X), the conditional expectation of a response Y given a set of predictors X. This function provides methods for estimating the dimension and the subspace spanned by the columns of a p x k matrix B of minimal rank k such that

E(Y|X) = E(Y|B'X)

or, equivalently,

Y = g(B'X) + ε

where X is independent of ε and has a positive definite variance-covariance matrix Var(X) = Σ_X, ε is a mean-zero random variable with finite variance Var(ε) = E(ε^2), g is an unknown, continuous, non-constant function, and B = (b_1,..., b_k) is a real p x k matrix of rank k <= p.

Both the dimension k and the subspace span(B) are unknown. The CVE method makes very few assumptions.

A kernel matrix Bhat is estimated such that the column space of Bhat should be close to the mean subspace span(B). The primary output from this method is a set of orthonormal vectors, Bhat, whose span estimates span(B).
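Because only span(B) is identified, and not B itself, an estimate is best compared with the truth through projection matrices rather than entry by entry. The following base-R sketch does not call `cve`; the helper name `proj` is chosen here for illustration. It shows that two different bases of the same subspace give the same projection.

```r
# Projection matrix onto the column space of a full-column-rank matrix A.
proj <- function(A) A %*% solve(crossprod(A), t(A))

set.seed(1)
p <- 5
k <- 2
# An orthonormal basis B of a random k-dimensional subspace of R^p.
B <- qr.Q(qr(matrix(rnorm(p * k), p, k)))
# A second basis of the same subspace: rotate B by an orthogonal k x k matrix.
theta <- pi / 3
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), k, k)
B.rot <- B %*% Q

# The bases differ entry-wise ...
max.entry.diff <- max(abs(B - B.rot))
# ... but span the same subspace, so the projections agree up to rounding.
dist.same <- norm(proj(B) - proj(B.rot), type = "F")
c(entry.diff = max.entry.diff, projection.diff = dist.same)
```

The same Frobenius distance between projections is what the Examples section uses to assess estimation accuracy.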

The method `"central"` implements the Ensemble Conditional Variance Estimation (ECVE) as described in [2]. It augments the CVE method by applying an ensemble of functions (parameter `func_list`) to the response to estimate the central subspace. This corresponds to the generalization

F(Y|X) = F(Y|B'X)

or, equivalently,

Y = g(B'X, ε)

where F is the conditional cumulative distribution function.
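To see why the mean subspace can miss part of the central subspace, consider a toy model in which one predictor enters only through the noise scale. The sketch below uses base R only; the model and the transformation `f(y) = y^2` are illustrative choices made here and are not claimed to be the package's defaults for `func_list`.

```r
set.seed(42)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
eps <- rnorm(n)
# x1 drives the conditional mean, x2 only the conditional variance:
#   E(Y | X) = x1, while F(Y | X) depends on both x1 and x2.
y <- x1 + x2 * eps

# On the original response, x2 carries no mean signal ...
cor.mean <- cor(y, x2^2)
# ... but for the transformed response, E(f(Y) | X) = x1^2 + x2^2
# depends on x2, so a mean subspace method applied to f(Y) can detect it.
f <- function(y) y^2
cor.trans <- cor(f(y), x2^2)
c(original = cor.mean, transformed = cor.trans)
```

In this toy model the mean subspace is spanned by the first coordinate direction only, while the central subspace also contains the second.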

Usage

```cve(formula, data, method = "mean", max.dim = 10L, ...)
```

Arguments

`formula`

an object of class `"formula"`: a symbolic description of the model to be fitted, such as `Y ~ X`, where Y is an n-dimensional vector of the response variable and X is an n x p matrix of the predictors.

`data`

an optional data frame containing the data for the formula, if supplied like `data <- data.frame(Y, X)` with dimension n x (p + 1). By default the variables are taken from the environment from which `cve` is called.

`method`

a character string specifying the method of fitting. The options are:

• `"mean"`: method to estimate the mean subspace, see [1].

• `"central"`: ensemble method to estimate the central subspace, see [2].

• `"weighted.mean"`: variation of the `"mean"` method with adaptive weighting of slices, see [1].

• `"weighted.central"`: variation of the `"central"` method with adaptive weighting of slices, see [2].

`max.dim`

upper bound for `k` (ignored if `k` is supplied).

`...`

optional parameters passed on to `cve.call`.
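The two ways of supplying data described for `formula` and `data` can be sketched with base R's own formula machinery. This only illustrates how a formula resolves to a response vector and a predictor matrix; whether `cve` uses exactly these calls internally is an assumption.

```r
set.seed(1)
n <- 20
p <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)

# Variant 1: variables are looked up in the calling environment.
mf.env <- model.frame(Y ~ X)

# Variant 2: variables are looked up in a data frame with n rows and
# p + 1 columns (Y, X1, ..., Xp); `.` stands for all remaining columns.
dat <- data.frame(Y, X)
mf.dat <- model.frame(Y ~ ., data = dat)

# Both resolve to an n-vector response and an n x p predictor matrix.
resp <- model.response(mf.env)
pred <- model.matrix(attr(mf.env, "terms"), mf.env)[, -1, drop = FALSE]  # drop intercept
dim(pred)
```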

Value

an S3 object of class `cve` with components:

X

design matrix of predictor vector used for calculating cve-estimate,

Y

n-dimensional vector of responses used for calculating cve-estimate,

method

name of the method used,

call

the matched call,

res

list of components `V, L, B, loss, h` for each `k = min.dim, ..., max.dim`. If `k` was supplied in the call, then `min.dim = max.dim = k`.

• `B` is the cve-estimate with dimension p x k.

• `V` is the orthogonal complement of B.

• `L` is the loss for each sample separately, such that its mean is `loss`.

• `loss` is the value of the target function that is minimized, evaluated at V.

• `h` bandwidth parameter used to calculate `B, V, loss, L`.
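The relationship between `B` and `V` can be sketched in base R: given a semi-orthogonal p x (p - k) matrix `V`, an orthonormal basis of its orthogonal complement is obtained from a complete QR decomposition. The `null_basis` helper below is hypothetical and not part of the package, and the matrix `V` is random rather than a fitted result.

```r
# Orthonormal basis of the orthogonal complement of span(V).
null_basis <- function(V) {
    p <- nrow(V)
    q <- ncol(V)
    Q <- qr.Q(qr(V), complete = TRUE)   # p x p orthogonal matrix
    Q[, seq(q + 1, p), drop = FALSE]    # last p - q columns are orthogonal to V
}

set.seed(7)
p <- 5
k <- 2
V <- qr.Q(qr(matrix(rnorm(p * (p - k)), p, p - k)))  # p x (p - k), V'V = I
B <- null_basis(V)                                   # p x k,       B'B = I

# B is orthogonal to V (value is zero up to rounding).
orth.err <- max(abs(crossprod(V, B)))
# Together the two projections decompose the identity: BB' + VV' = I_p.
decomp.err <- max(abs(tcrossprod(B) + tcrossprod(V) - diag(p)))
c(orth.err = orth.err, decomp.err = decomp.err)
```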

References

[1] Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>

[2] Fertl, L. and Bura, E. (2021) "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.13435>

See Also

For a detailed description of `formula`, see the help page `?formula`.

Examples

```# set dimensions for simulation model
p <- 5
k <- 2
# create B for simulation
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
# sample size
n <- 100
set.seed(21)

# create predictor data x ~ N(0, I_p)
x <- matrix(rnorm(n * p), n, p)
# simulate response variable
#     y = f(B'x) + err
# with f(x1, x2) = x1^2 + 2 * x2 and err ~ N(0, 0.25^2)
y <- (x %*% b1)^2 + 2 * (x %*% b2) + 0.25 * rnorm(n)

# calculate cve with method 'mean' for k unknown in 1, ..., 2
cve.obj.s <- cve(y ~ x, max.dim = 2) # default method 'mean'
# calculate cve with method 'weighted.mean' for k = 2
cve.obj.w <- cve(y ~ x, k = 2, method = 'weighted.mean')
B2 <- coef(cve.obj.s, k = 2)

# get projected X data (same as cve.obj.s$X %*% B2)
proj.X <- directions(cve.obj.s, k = 2)
# plot y against projected data
plot(proj.X[, 1], y)
plot(proj.X[, 2], y)

# create 10 new x points and y according to model
x.new <- matrix(rnorm(10 * p), 10, p)
y.new <- (x.new %*% b1)^2 + 2 * (x.new %*% b2) + 0.25 * rnorm(10)
# predict y.new
yhat <- predict(cve.obj.s, x.new, 2)
plot(y.new, yhat)

# projection matrix on span(B); the general formula is needed because
# b1 and b2 are not orthogonal for odd p (here t(b1) %*% b2 = -1/p)
PB <- B %*% solve(t(B) %*% B) %*% t(B)
# cve estimates for B with mean and weighted method
B.s <- coef(cve.obj.s, k = 2)
B.w <- coef(cve.obj.w, k = 2)
# same as B.s %*% t(B.s) since B.s is semi-orthogonal (same for B.w)
PB.s <- B.s %*% solve(t(B.s) %*% B.s) %*% t(B.s)
PB.w <- B.w %*% solve(t(B.w) %*% B.w) %*% t(B.w)
# compare estimation accuracy of mean and weighted cve estimate by
# Frobenius norm of difference of projections.
norm(PB - PB.s, type = 'F')
norm(PB - PB.w, type = 'F')

```

[Package CVarE version 1.1 Index]