generate_intraclass {sparsediscrim} | R Documentation |
Generates data from K multivariate normal populations, where each population (class) has an intraclass covariance matrix.
Description
This function generates K multivariate normal data sets, where each class is generated with a constant mean vector and an intraclass covariance matrix. The data are returned as a single matrix x along with a vector of class labels y that indicates class membership.
Usage
generate_intraclass(n, p, rho, mu, sigma2 = rep(1, K))
Arguments
n | vector of the sample sizes of each class. The length of n determines the number of classes K. |
p | the number of features (variables) in the data |
rho | vector of the values of the off-diagonal elements for each intraclass covariance matrix. Its length must equal the length of n. |
mu | vector containing the mean for each class. Its length must equal the length of n. |
sigma2 | vector of variances for each class. Its length must equal the length of n. Defaults to rep(1, K). |
Details
For simplicity, we assume that each class mean vector is constant across features. That is, the mean vector of the kth class is c_k * j_p, where j_p is a p \times 1 vector of ones and c_k is a real scalar (the kth element of mu).
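As a small illustration (a sketch, not the package's own code), the constant class mean vectors c_k * j_p can be built in base R by scaling a length-p vector of ones by each element of mu. The helper name `class_means` is hypothetical.

```r
# Hypothetical helper: build the constant mean vector c_k * j_p for each class.
# Each element of mu plays the role of c_k; j_p is a length-p vector of ones.
class_means <- function(mu, p) {
  lapply(mu, function(c_k) c_k * rep(1, p))
}

means <- class_means(mu = c(0, 3, -2), p = 5)
means[[2]]  # 3 3 3 3 3
```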
The intraclass covariance matrix for the kth class is defined as

\sigma_k^2 * (\rho_k * J_p + (1 - \rho_k) * I_p),

where J_p is the p \times p matrix of ones and I_p is the p \times p identity matrix.
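The covariance formula translates directly into base R. The sketch below is an illustration under the definition given here, not the sparsediscrim implementation. The helper name `intraclass_cov` is hypothetical. It makes it easy to confirm that the diagonal entries equal \sigma_k^2 and the off-diagonal entries equal \sigma_k^2 \rho_k.

```r
# Hypothetical helper: intraclass covariance matrix
#   sigma2 * (rho * J_p + (1 - rho) * I_p)
# where J_p is the p x p matrix of ones and I_p is the p x p identity.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  J_p <- matrix(1, nrow = p, ncol = p)
  I_p <- diag(p)
  sigma2 * (rho * J_p + (1 - rho) * I_p)
}

Sigma <- intraclass_cov(p = 4, rho = 0.3, sigma2 = 2)
Sigma
# Diagonal entries equal sigma2 (2); off-diagonal entries equal sigma2 * rho (0.6).
```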
By default, \sigma_k^2 = 1, so the diagonal elements of the intraclass covariance matrix are all 1, while the off-diagonal elements are all \rho_k. More generally, the diagonal elements are \sigma_k^2 and the off-diagonal elements are \sigma_k^2 \rho_k.
Each value of rho must lie strictly between 1 / (1 - p) and 1 to ensure that the corresponding covariance matrix is positive definite.
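The bound on rho follows from the eigenvalues of the intraclass matrix. \sigma_k^2 (\rho_k J_p + (1 - \rho_k) I_p) has eigenvalue \sigma_k^2 (1 + (p - 1) \rho_k) once, with eigenvector j_p, and eigenvalue \sigma_k^2 (1 - \rho_k) with multiplicity p - 1. With \sigma_k^2 > 0, both are positive exactly when 1 / (1 - p) < \rho_k < 1. The sketch below checks the closed form against `eigen()`. The helper names `intraclass_eigen` and `is_valid_rho` are hypothetical, and it assumes p >= 2.

```r
# Closed-form eigenvalues of sigma2 * (rho * J_p + (1 - rho) * I_p):
#   sigma2 * (1 + (p - 1) * rho)  once (eigenvector j_p)
#   sigma2 * (1 - rho)            p - 1 times
intraclass_eigen <- function(p, rho, sigma2 = 1) {
  c(sigma2 * (1 + (p - 1) * rho), rep(sigma2 * (1 - rho), p - 1))
}

# Positive definite iff every eigenvalue is positive, i.e. 1 / (1 - p) < rho < 1.
# Assumes p >= 2 (for p = 1 the lower bound is undefined).
is_valid_rho <- function(p, rho) {
  rho > 1 / (1 - p) & rho < 1
}

p <- 5
Sigma <- 0.5 * matrix(1, p, p) + 0.5 * diag(p)   # rho = 0.5, sigma2 = 1
numeric_eig <- sort(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)
closed_eig <- sort(intraclass_eigen(p, rho = 0.5))
all.equal(numeric_eig, closed_eig)           # TRUE

# For p = 5 the lower bound is 1 / (1 - 5) = -0.25
is_valid_rho(p, c(-0.3, -0.2, 0.5, 1))       # FALSE TRUE TRUE FALSE
```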
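The number of classes K is determined as the length of n. Because default arguments in R are evaluated lazily, the default sigma2 = rep(1, K) can refer to K even though K is not itself an argument.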
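To make the generating process concrete, here is a minimal base-R sketch of how such data could be simulated. It is an assumption-laden illustration, not the sparsediscrim implementation. The function name `sketch_intraclass` is hypothetical, and it uses a Cholesky factor rather than whatever sampler the package uses internally. It mirrors the lazy default for sigma2: K is assigned inside the body before sigma2 is first used.

```r
# Hypothetical sketch: simulate K intraclass multivariate normal classes.
# Not the sparsediscrim implementation. If Sigma_k = t(R) %*% R (Cholesky),
# then rows of Z %*% R, with Z standard normal, have covariance Sigma_k.
sketch_intraclass <- function(n, p, rho, mu, sigma2 = rep(1, K)) {
  K <- length(n)  # K exists before the lazy default for sigma2 is evaluated
  stopifnot(length(rho) == K, length(mu) == K, length(sigma2) == K)
  x <- vector("list", K)
  for (k in seq_len(K)) {
    Sigma_k <- sigma2[k] * (rho[k] * matrix(1, p, p) + (1 - rho[k]) * diag(p))
    R <- chol(Sigma_k)                        # upper triangular factor
    Z <- matrix(rnorm(n[k] * p), nrow = n[k], ncol = p)
    x[[k]] <- Z %*% R + mu[k]                 # constant mean c_k on every feature
  }
  list(x = do.call(rbind, x), y = rep(seq_len(K), times = n))
}

set.seed(42)
sim <- sketch_intraclass(n = 3:5, p = 5, rho = seq(0.1, 0.9, length.out = 3),
                         mu = c(0, 3, -2))
dim(sim$x)   # 12 5
table(sim$y)
```

Here x has sum(n) rows, one block per class stacked in order, and y repeats each class label n[k] times.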
Value
A named list with elements:
- x: matrix of observations with sum(n) rows and p columns
- y: vector of class labels that indicates class membership for each observation (row) in x
Examples
library(sparsediscrim)

# Generates data from K = 3 classes.
data <- generate_intraclass(n = 3:5, p = 5, rho = seq(.1, .9, length.out = 3),
mu = c(0, 3, -2))
data$x
data$y
# Generates data from K = 4 classes. Notice that we specify a variance for each class.
data <- generate_intraclass(n = 3:6, p = 4, rho = seq(0, .9, length.out = 4),
mu = c(0, 3, -2, 6), sigma2 = 1:4)
data$x
data$y