dataset {CVarE} | R Documentation |

Provides the sample datasets M1-M7 used in the paper "Conditional Variance Estimation for Sufficient Dimension Reduction" by Lukas Fertl and Efstathia Bura. The general model is given by:

*Y = g(B'X) + ε*

dataset(name = "M1", n = NULL, p = 20, sd = 0.5, ...)

`name` |
One of `"M1"`, ..., `"M7"`. |

`n` |
Number of samples. With the default `NULL`, the dataset-specific default given in the descriptions below is used. |

`p` |
Dimension of the predictor vector *X*. |

`sd` |
Standard deviation of the error term. |

`...` |
Additional parameters, used only for "M2" (namely `pmix` and `lambda`; see the description of the second dataset below). |

List with elements

`X`: the data, an *n x p* matrix.

`Y`: the response.

`B`: the dim-reduction matrix.

`name`: name of the dataset (the `name` parameter).
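A minimal usage sketch, assuming the CVarE package is installed. It relies only on the call signature and the list elements documented above; the `requireNamespace` guard and the variable name `ds` are additions for illustration, so that the snippet can be sourced even without the package.

```r
# Draw the default dataset and inspect the documented list elements.
# Assumption: CVarE is installed; otherwise `ds` simply stays NULL.
ds <- NULL
if (requireNamespace("CVarE", quietly = TRUE)) {
  ds <- CVarE::dataset(name = "M1")
  str(ds$X)          # predictor matrix, n x p
  str(ds$Y)          # response
  print(dim(ds$B))   # dimension-reduction matrix
  print(ds$name)     # name of the dataset
}
```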

For M1, the predictors are distributed as
*X ~ N_p(0, Σ)* with
*Σ_ij = 0.5^|i - j|* for
*i, j = 1,..., p*. The subspace dimension is *k = 1* and the default
sample size is *n = 100*. Here *p = 20*,
*b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)*, and *Y* is
given as

*Y = cos(b_1'X) + ε*

where *ε* follows a
generalized normal distribution with location 0,
shape parameter 0.5, and a scale parameter chosen such that
*Var(ε) = 0.5*.
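The model just described can be simulated in base R roughly as follows. This is a sketch of the stated model, not the package's own implementation. It assumes the common parameterization of the generalized normal distribution with density proportional to *exp(-(|x| / α)^β)*, for which *Var = α^2 Γ(3/β) / Γ(1/β)*; the variable names and the seed are illustrative.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 100
p <- 20

# Covariance with entries Sigma[i, j] = 0.5^|i - j|
Sigma <- 0.5^abs(outer(seq_len(p), seq_len(p), "-"))
# chol() returns R with t(R) %*% R = Sigma, so rows of Z %*% R have covariance Sigma
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)

# Scale alpha such that Var(eps) = 0.5 for shape 0.5 (assumed parameterization)
shape <- 0.5
alpha <- sqrt(0.5 * gamma(1 / shape) / gamma(3 / shape))

# (|eps| / alpha)^shape is Gamma(1 / shape, 1) distributed; attach a random sign
eps <- sample(c(-1, 1), n, replace = TRUE) *
  alpha * rgamma(n, shape = 1 / shape)^(1 / shape)

Y <- drop(cos(X %*% b1)) + eps
```

Under this parameterization the scale works out to *α = sqrt(0.5 / 120)*, since *Γ(2) = 1* and *Γ(6) = 120*.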

For M2, the predictors are distributed as *X ~ Z 1_p λ + N_p(0, I_p)* with
*Z ~ 2Binom(pmix) - 1*, where
*1_p* is the *p*-dimensional vector of ones. The subspace
dimension is *k = 1* and the default sample size is *n = 100*.
Here *p = 20*, *b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)*,
and *Y* is

*Y = cos(b_1'X) + 0.5ε*

where *ε* is
standard normal.
`pmix` defaults to 0.3 and `lambda` defaults to 1.
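The mixture structure of these predictors can be sketched in base R as follows. The sketch reads *Binom(pmix)* as a single Bernoulli trial with success probability `pmix`, which is an assumption about the notation; variable names and the seed are illustrative, and this is not the package's own code.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 100
p <- 20
pmix <- 0.3
lambda <- 1

# Z takes the value +1 with probability pmix and -1 otherwise (assumed reading)
Z <- 2 * rbinom(n, size = 1, prob = pmix) - 1

# lambda * Z is recycled down the columns, so row i is shifted by lambda * Z[i]
X <- matrix(rnorm(n * p), n, p) + lambda * Z

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)
Y <- drop(cos(X %*% b1)) + 0.5 * rnorm(n)
```

Each row of `X` is thus centered at either *λ 1_p* or *-λ 1_p*, giving a two-component normal mixture.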

For M3, the predictors are distributed as *X ~ N_p(0, I_p)*.
The subspace
dimension is *k = 1* and the default sample size is *n = 100*.
Here *p = 20*, *b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)*,
and *Y* is

*Y = 2 log(|b_1'X| + 2) + 0.5ε*

where *ε* is
standard normal.

For M4, the predictors are distributed as *X ~ N_p(0, Σ)*
with *Σ_ij = 0.5^|i - j|* for
*i, j = 1,..., p*. The subspace dimension is *k = 2* and the default
sample size is *n = 100*. Here *p = 20*,
*b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)*,
*b_2 = (1,-1,1,-1,1,-1,0,...,0)' / sqrt(6)*,
and *Y* is given as

*Y = (b_1'X) / (0.5 + (1.5 + b_2'X)^2) + 0.5ε*

where *ε* is standard normal.

For M5, the predictors are distributed as *X ~ U([0, 1]^p)*,
where *U([0, 1]^p)* is the uniform distribution with
independent components on the *p*-dimensional hypercube. The subspace
dimension is *k = 2* and the default sample size is *n = 200*.
Here *p = 20*,
*b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)*,
*b_2 = (1,-1,1,-1,1,-1,0,...,0)' / sqrt(6)*,
and *Y* is given as

*Y = cos(π b_1'X)(b_2'X + 1)^2 + 0.5ε*

where *ε* is standard normal.
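For the two-dimensional case with uniform predictors, a base R sketch can also confirm that *b_1* and *b_2* are orthonormal, so that the matrix with these two columns spans a subspace of dimension *k = 2*. Variable names and the seed are illustrative; this is a sketch of the stated model rather than the package's own code.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 200
p <- 20

# Independent uniform components on the p-dimensional unit hypercube
X <- matrix(runif(n * p), n, p)

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)
b2 <- c(rep(c(1, -1), 3), rep(0, p - 6)) / sqrt(6)
B <- unname(cbind(b1, b2))

# t(B) %*% B should be the 2 x 2 identity (up to rounding)
print(round(crossprod(B), 12))

XB <- X %*% B  # reduced predictors, n x 2
Y <- cos(pi * XB[, 1]) * (XB[, 2] + 1)^2 + 0.5 * rnorm(n)
```

The response depends on `X` only through the two columns of `XB`, which is the structure *Y = g(B'X) + ε* of the general model.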

For M6, the predictors are distributed as *X ~ N_p(0, I_p)*.
The subspace dimension is *k = 3* and the default sample size is *n = 200*.
Here *p = 20, b_1 = e_1, b_2 = e_2*, and *b_3 = e_p*, where
*e_j* is the *j*-th unit vector in the *p*-dimensional space.
*Y* is given as

*Y = (b_1'X)^2+(b_2'X)^2+(b_3'X)^2+0.5ε*

where *ε* is standard normal.

For M7, the predictors are distributed as *X ~ t_3(I_p)*, where
*t_3(I_p)* is the standard multivariate t-distribution with 3 degrees of
freedom. The subspace dimension is *k = 4* and the default sample size is
*n = 200*.
Here *p = 20, b_1 = e_1, b_2 = e_2, b_3 = e_3*, and *b_4 = e_p*, where
*e_j* is the *j*-th unit vector in the *p*-dimensional space.
*Y* is given as

*Y = (b_1'X)(b_2'X)^2+(b_3'X)(b_4'X)+0.5ε*

where *ε* follows a generalized normal distribution with
location 0, shape parameter 1, and a scale parameter chosen such that
*Var(ε) = 0.25*.
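A base R sketch of this last model follows. It uses two standard facts: a multivariate t vector with identity scale matrix is a standard normal vector divided by *sqrt(W / ν)* with *W ~ χ²_ν*, and a generalized normal distribution with shape 1 is a Laplace distribution with variance *2 α^2* under the usual parameterization (an assumption about the parameterization intended here). The error term is combined with the factor 0.5 as a literal reading of the formula above. Variable names and the seed are illustrative; this is not the package's own code.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 200
p <- 20
nu <- 3       # degrees of freedom

# One chi-square divisor per row: row i is divided by sqrt(W[i] / nu)
W <- rchisq(n, df = nu)
X <- matrix(rnorm(n * p), n, p) / sqrt(W / nu)

# B = (e_1, e_2, e_3, e_p): selects coordinates 1, 2, 3 and p of X
B <- diag(p)[, c(1, 2, 3, p)]
XB <- X %*% B

# Laplace(0, alpha) as a scaled difference of two independent Exp(1) draws;
# Var = 2 * alpha^2 = 0.25 gives alpha = sqrt(0.125)
alpha <- sqrt(0.25 / 2)
eps <- alpha * (rexp(n) - rexp(n))

# Literal reading of the formula: the error enters with the factor 0.5
Y <- XB[, 1] * XB[, 2]^2 + XB[, 3] * XB[, 4] + 0.5 * eps
```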

Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>

[Package *CVarE* version 1.1 Index]