mixture_sim {ICSClust}    R Documentation

Simulation of a mixture of Gaussian distributions

Description

Simulation of an n \times p data frame according to a mixture of q Gaussian distributions with q < p, different location parameters \mu_1, \dots, \mu_q, and the identity matrix as the covariance matrix.

Usage

mixture_sim(pct_clusters = c(0.5, 0.5), n = 500, p = 10, delta = 10)

Arguments

pct_clusters

a vector of marginal probabilities for each group, i.e., the mixture weights. Default is two balanced clusters.

n

integer. The number of observations.

p

integer. The number of variables.

delta

integer. The location shift \delta between the cluster centers (see Details).

Details

Let X be a p-variate real random vector distributed according to a mixture of q Gaussian distributions with q < p, different location parameters \mu_1, \dots, \mu_q, and the same positive definite covariance matrix I_p:

X \sim \sum_{h=1}^{q} \epsilon_h \, {\cal N}(\mu_h,I_p),

where \epsilon_{1}, \dots, \epsilon_{q} are mixture weights with \epsilon_1 + \cdots + \epsilon_q = 1, \mu_1 = 0_p, and \mu_{h+1} = \delta e_h for h = 1, \dots, q-1, where e_h denotes the p-dimensional vector with one in the h-th coordinate and zero elsewhere.
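
The model above can be illustrated with a short base R sketch. This is not the code used by mixture_sim; the helper name sketch_mixture, the "Group" label prefix, and the choice to draw cluster labels at random (rather than fixing the cluster sizes) are assumptions made only for illustration.

```r
# Hypothetical helper, NOT the ICSClust implementation: a base R sketch of
# the Gaussian mixture described in the Details section.
sketch_mixture <- function(weights = c(0.5, 0.5), n = 500, p = 10, delta = 10) {
  q <- length(weights)
  stopifnot(q < p, abs(sum(weights) - 1) < 1e-8)

  # Draw one cluster label per observation with probabilities epsilon_h.
  # (Assumption: the package may use fixed cluster sizes instead.)
  labels <- sample.int(q, size = n, replace = TRUE, prob = weights)

  # Cluster centers: mu_1 = 0_p and mu_{h+1} = delta * e_h.
  centers <- matrix(0, nrow = q, ncol = p)
  for (h in seq_len(q - 1)) centers[h + 1, h] <- delta

  # Standard Gaussian noise (identity covariance) shifted by each center.
  values <- matrix(rnorm(n * p), nrow = n, ncol = p) +
    centers[labels, , drop = FALSE]

  # First column holds the cluster label as a character string.
  data.frame(cluster = paste0("Group", labels), values,
             stringsAsFactors = FALSE)
}

set.seed(1)
sim <- sketch_mixture(weights = c(0.2, 0.3, 0.5), n = 300, p = 5, delta = 10)
dim(sim)            # n rows and p + 1 columns
table(sim$cluster)  # observed cluster sizes
```

With q = 3 clusters, only the first q - 1 = 2 coordinates carry a location shift; the remaining coordinates are pure noise.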

Value

A data frame of n observations and p+1 variables, with the first variable indicating the cluster assignment as a character string.

Author(s)

Aurore Archimbaud

References

Alfons, A., Archimbaud, A., Nordhausen, K., & Ruiz-Gazen, A. (2022). Tandem clustering with invariant coordinate selection. arXiv preprint arXiv:2212.06108.

Examples

X <- mixture_sim()
summary(X)
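
A further example with non-default arguments may help: three unbalanced clusters in five dimensions, using only the arguments listed under Usage. The argument values and the object name X3 are arbitrary, and the requireNamespace() guard is added here only so that the snippet is skipped when ICSClust is not installed.

```r
# Three unbalanced clusters (q = 3 < p = 5); the argument values are arbitrary.
if (requireNamespace("ICSClust", quietly = TRUE)) {
  X3 <- ICSClust::mixture_sim(pct_clusters = c(0.2, 0.3, 0.5),
                              n = 300, p = 5, delta = 8)
  # Per the Value section: n observations and p + 1 variables.
  print(dim(X3))
  # The first variable holds the cluster assignment.
  print(table(X3[[1]]))
}
```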

[Package ICSClust version 0.1.0 Index]