mixture_sim {ICSClust} | R Documentation |
Simulation of a mixture of Gaussian distributions
Description
Simulation of an n \times p data frame according to a mixture of q Gaussian distributions with q < p, different location parameters \mu_1, \dots, \mu_q, and the identity matrix as the covariance matrix.
Usage
mixture_sim(pct_clusters = c(0.5, 0.5), n = 500, p = 10, delta = 10)
Arguments
pct_clusters |
a vector of marginal probabilities for each group, i.e., the mixture weights. The default is two balanced clusters. |
n |
integer. The number of observations. |
p |
integer. The number of variables. |
delta |
integer. The location shift \delta applied to the cluster location parameters. |
Details
Let X be a p-variate real random vector distributed according to a mixture of q Gaussian distributions with q < p, different location parameters \mu_1, \dots, \mu_q, and the same positive definite covariance matrix I_p:
X \sim \sum_{h=1}^{q} \epsilon_h \, {\cal N}(\mu_h, I_p),
where \epsilon_1, \dots, \epsilon_q are mixture weights with \epsilon_1 + \cdots + \epsilon_q = 1, \mu_1 = 0_p, and \mu_{h+1} = \delta e_h with h = 1, \dots, q-1, where e_h is the p-dimensional vector with one in the h-th coordinate and zero elsewhere.
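The model above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name sim_mixture_sketch and the "Group" label prefix are hypothetical, e_h is taken to be the h-th canonical basis vector, and cluster labels are drawn at random from the weights, whereas mixture_sim may allocate cluster sizes differently.

```r
# Hypothetical helper, not part of ICSClust: draws n observations from
# sum_h eps_h N(mu_h, I_p) with mu_1 = 0_p and mu_{h+1} = delta * e_h.
sim_mixture_sketch <- function(weights = c(0.5, 0.5), n = 500, p = 10, delta = 10) {
  q <- length(weights)
  stopifnot(q < p, abs(sum(weights) - 1) < 1e-8)

  # One row per cluster mean; row 1 stays at the origin.
  mu <- matrix(0, nrow = q, ncol = p)
  for (h in seq_len(q - 1)) {
    mu[h + 1, h] <- delta  # assumption: e_h is the h-th canonical basis vector
  }

  # Assumption: labels are sampled with probabilities equal to the weights.
  label <- sample.int(q, size = n, replace = TRUE, prob = weights)

  # Identity covariance: independent standard normal noise around each mean.
  X <- mu[label, , drop = FALSE] + matrix(rnorm(n * p), nrow = n, ncol = p)

  # First column holds the cluster assignment (label names are hypothetical).
  data.frame(cluster = paste0("Group", label), X)
}

set.seed(1)
sim <- sim_mixture_sketch()
dim(sim)  # 500 rows, 11 columns (cluster label plus p = 10 variables)
```

With the default delta = 10, the second cluster differs from the first only in its first coordinate, so the group structure lies in a subspace of dimension q - 1.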
Value
A data frame of n observations and p+1 variables, where the first variable is a character string indicating the cluster assignment.
Author(s)
Aurore Archimbaud
References
Alfons, A., Archimbaud, A., Nordhausen, K., & Ruiz-Gazen, A. (2022). Tandem clustering with invariant coordinate selection. arXiv preprint arXiv:2212.06108.
Examples
X <- mixture_sim()
summary(X)
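A further usage sketch with non-default arguments, assuming the ICSClust package is installed (the call is skipped otherwise). The weights, n, p, and delta values are arbitrary illustrations chosen so that q = 3 < p = 5; the object name X3 is hypothetical.

```r
# Three unbalanced clusters in five dimensions; runs only if ICSClust is available.
if (requireNamespace("ICSClust", quietly = TRUE)) {
  X3 <- ICSClust::mixture_sim(pct_clusters = c(0.25, 0.25, 0.5),
                              n = 400, p = 5, delta = 8)
  # The first variable holds the cluster assignment.
  print(table(X3[[1]]))
  # Per the Value section: n observations and p + 1 variables.
  print(dim(X3))
}
```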