dataset {CVarE} R Documentation

## Generates test datasets.

### Description

Provides the sample datasets M1-M7 used in the paper "Conditional Variance Estimation for Sufficient Dimension Reduction" by Lukas Fertl and Efstathia Bura. The general model is given by:

Y = g(B'X) + ε

### Usage

```
dataset(name = "M1", n = NULL, p = 20, sd = 0.5, ...)
```

### Arguments

 `name` One of `"M1"`, `"M2"`, `"M3"`, `"M4"`, `"M5"`, `"M6"` or `"M7"`. Alternatively, just the dataset number 1-7.

 `n` Number of samples. The default `NULL` selects the dataset-specific default given in the sections below.

 `p` Dimension of the random variable X.

 `sd` Standard deviation of the error term ε.

 `...` Additional parameters, used only for "M2" (namely `pmix` and `lambda`); see the M2 section below.

### Value

List with elements

• `X` data, an n x p matrix.

• `Y` response.

• `B` the dim-reduction matrix.

• `name` Name of the dataset (the `name` parameter).
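As a hedged illustration of the call and its return value, the snippet below requests dataset M1 and inspects the returned list. It assumes the CVarE package is installed and that the list elements are named `X`, `Y`, `B` and `name` as listed above; the guard makes the snippet a no-op when the package is missing.

```r
# Assumption: the CVarE package is installed; otherwise nothing is run.
has_cvare <- requireNamespace("CVarE", quietly = TRUE)

if (has_cvare) {
    # Request M1 with the documented defaults (p = 20, sd = 0.5).
    ds <- CVarE::dataset("M1")

    # Assumed element names: X (predictors), Y (response),
    # B (dim-reduction matrix), name (dataset name).
    str(ds, max.level = 1)
    print(dim(ds$X))
    print(dim(ds$B))
}
```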

### M1

The predictors are distributed as X ~ N_p(0, Σ) with Σ_ij = 0.5^|i - j| for i, j = 1,..., p. The subspace dimension is k = 1 and the default sample size is n = 100. With p = 20 and b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6), Y is given as

Y = cos(b_1'X) + ε

where ε follows a generalized normal distribution with location 0 and shape parameter 0.5, and the scale parameter is chosen such that Var(ε) = 0.5.
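The M1 description can be sketched in base R as follows. This is not the package's internal code; the variable names (`Sigma`, `b1`, `alpha`, `eps`) are illustrative, and the noise assumes the usual generalized normal parametrization with density proportional to exp(-(|e| / alpha)^beta), for which Var(e) = alpha^2 Γ(3/beta) / Γ(1/beta).

```r
set.seed(1)
n <- 100; p <- 20

# Covariance with entries Sigma_ij = 0.5^|i - j|
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

# Rows of X ~ N_p(0, Sigma): chol() returns upper-triangular R with R'R = Sigma
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6)
b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)

# Generalized normal noise (assumed parametrization, see lead-in):
# solve Var(e) = alpha^2 * gamma(3 / beta) / gamma(1 / beta) = 0.5 for alpha
beta <- 0.5
alpha <- sqrt(0.5 * gamma(1 / beta) / gamma(3 / beta))

# (|e| / alpha)^beta is Gamma(shape = 1 / beta) distributed; attach a random sign
signs <- sample(c(-1, 1), n, replace = TRUE)
eps <- signs * alpha * rgamma(n, shape = 1 / beta)^(1 / beta)

Y <- cos(X %*% b1) + eps
```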

### M2

The predictors are distributed as X ~ Z 1_p λ + N_p(0, I_p) with Z ~ 2 Binom(pmix) - 1, where 1_p is the p-dimensional vector of ones. The subspace dimension is k = 1 and the default sample size is n = 100. With p = 20 and b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6), Y is given as

Y = cos(b_1'X) + 0.5ε

where ε is standard normal. `pmix` defaults to 0.3 and `lambda` defaults to 1.
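A minimal base R sketch of the M2 mixture, not taken from the package source. It reads Binom(pmix) as a single Bernoulli draw with success probability `pmix`, which is an assumption; the remaining names are illustrative.

```r
set.seed(2)
n <- 100; p <- 20
pmix <- 0.3; lambda <- 1

# Assumption: Z = +1 with probability pmix and -1 otherwise
Z <- 2 * rbinom(n, size = 1, prob = pmix) - 1

# Z * lambda has length n, so it is recycled down the columns:
# every coordinate of row i is shifted by Z[i] * lambda
X <- Z * lambda + matrix(rnorm(n * p), n, p)

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)
Y <- cos(X %*% b1) + 0.5 * rnorm(n)
```

The predictors form two clusters centred at +λ 1_p and -λ 1_p, with mixing weight `pmix` under the assumption above.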

### M3

The predictors are distributed as X ~ N_p(0, I_p). The subspace dimension is k = 1 and the default sample size is n = 100. With p = 20 and b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6), Y is given as

Y = 2 log(|b_1'X| + 2) + 0.5ε

where ε is standard normal.
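For illustration, M3 can be drawn in base R as below. This is a sketch of the stated formula with illustrative variable names, not the package's implementation.

```r
set.seed(3)
n <- 100; p <- 20

# Independent standard normal predictors
X <- matrix(rnorm(n * p), n, p)

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)

# The signal depends on X only through |b_1'X|, so it is symmetric in b_1'X
signal <- 2 * log(abs(X %*% b1) + 2)
Y <- signal + 0.5 * rnorm(n)
```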

### M4

The predictors are distributed as X ~ N_p(0, Σ) with Σ_ij = 0.5^|i - j| for i, j = 1,..., p. The subspace dimension is k = 2 and the default sample size is n = 100. With p = 20, b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6) and b_2 = (1,-1,1,-1,1,-1,0,...,0)' / sqrt(6), Y is given as

Y = (b_1'X) / (0.5 + (1.5 + b_2'X)^2) + 0.5ε

where ε is standard normal.
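The two-directional model M4 might be sketched in base R like this. The names `B` and `XB` are illustrative, and the code only mirrors the formula above rather than the package internals.

```r
set.seed(4)
n <- 100; p <- 20

# Correlated normal predictors with Sigma_ij = 0.5^|i - j|
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Two orthonormal directions supported on the first six coordinates
b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)
b2 <- c(rep(c(1, -1), 3), rep(0, p - 6)) / sqrt(6)
B <- cbind(b1, b2)

# Reduced predictors: column 1 is b_1'X, column 2 is b_2'X
XB <- X %*% B
Y <- XB[, 1] / (0.5 + (1.5 + XB[, 2])^2) + 0.5 * rnorm(n)
```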

### M5

The predictors are distributed as X ~ U([0, 1]^p), where U([0, 1]^p) is the uniform distribution with independent components on the p-dimensional hypercube. The subspace dimension is k = 2 and the default sample size is n = 200. With p = 20, b_1 = (1,1,1,1,1,1,0,...,0)' / sqrt(6) and b_2 = (1,-1,1,-1,1,-1,0,...,0)' / sqrt(6), Y is given as

Y = cos(π b_1'X)(b_2'X + 1)^2 + 0.5ε

where ε is standard normal.
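A short base R sketch of M5 under the description above, with illustrative variable names; it is not the package's own code.

```r
set.seed(5)
n <- 200; p <- 20

# Independent uniform components on the hypercube [0, 1]^p
X <- matrix(runif(n * p), n, p)

b1 <- c(rep(1, 6), rep(0, p - 6)) / sqrt(6)
b2 <- c(rep(c(1, -1), 3), rep(0, p - 6)) / sqrt(6)
XB <- X %*% cbind(b1, b2)

# Y = cos(pi * b_1'X) * (b_2'X + 1)^2 + 0.5 * eps
Y <- cos(pi * XB[, 1]) * (XB[, 2] + 1)^2 + 0.5 * rnorm(n)
```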

### M6

The predictors are distributed as X ~ N_p(0, I_p). The subspace dimension is k = 3 and the default sample size is n = 200. With p = 20, b_1 = e_1, b_2 = e_2 and b_3 = e_p, where e_j is the j-th unit vector in the p-dimensional space, Y is given as

Y = (b_1'X)^2+(b_2'X)^2+(b_3'X)^2+0.5ε

where ε is standard normal.
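Because the directions in M6 are unit vectors, the reduction simply picks out coordinates 1, 2 and p. The base R sketch below illustrates this; the names are illustrative and the code is not the package's implementation.

```r
set.seed(6)
n <- 200; p <- 20

X <- matrix(rnorm(n * p), n, p)

# B = (e_1, e_2, e_p): columns 1, 2 and p of the identity matrix
B <- diag(p)[, c(1, 2, p)]

# X %*% B equals X[, c(1, 2, p)], so the signal is a sum of three squares
Y <- rowSums((X %*% B)^2) + 0.5 * rnorm(n)
```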

### M7

The predictors are distributed as X ~ t_3(I_p), where t_3(I_p) is the standard multivariate t-distribution with 3 degrees of freedom. The subspace dimension is k = 4 and the default sample size is n = 200. With p = 20, b_1 = e_1, b_2 = e_2, b_3 = e_3 and b_4 = e_p, where e_j is the j-th unit vector in the p-dimensional space, Y is given as

Y = (b_1'X)(b_2'X)^2+(b_3'X)(b_4'X)+0.5ε

where ε follows a generalized normal distribution with location 0 and shape parameter 1, and the scale parameter is chosen such that Var(ε) = 0.25.
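A hedged base R sketch of M7. The multivariate t draw uses the standard construction of a normal vector divided by sqrt(W / 3) with W chi-squared on 3 degrees of freedom. The noise takes the displayed formula and Var(ε) = 0.25 literally and assumes the generalized normal parametrization with density proportional to exp(-(|e| / alpha)^beta); how the package combines these with `sd` is not stated here. All variable names are illustrative.

```r
set.seed(7)
n <- 200; p <- 20

# Multivariate t_3(I_p): one chi-squared mixing variable per row of X
W <- rchisq(n, df = 3)
X <- matrix(rnorm(n * p), n, p) / sqrt(W / 3)

# B = (e_1, e_2, e_3, e_p)
B <- diag(p)[, c(1, 2, 3, p)]
XB <- X %*% B

# Generalized normal with shape 1 (a Laplace distribution):
# Var(e) = alpha^2 * gamma(3) / gamma(1) = 2 * alpha^2, set equal to 0.25
beta <- 1
alpha <- sqrt(0.25 * gamma(1 / beta) / gamma(3 / beta))
signs <- sample(c(-1, 1), n, replace = TRUE)
eps <- signs * alpha * rgamma(n, shape = 1 / beta)^(1 / beta)

# Literal reading of Y = (b_1'X)(b_2'X)^2 + (b_3'X)(b_4'X) + 0.5 * eps
Y <- XB[, 1] * XB[, 2]^2 + XB[, 3] * XB[, 4] + 0.5 * eps
```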

### References

Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>

[Package CVarE version 1.1 Index]