rGMM {MGMM} | R Documentation |
Generate Data from Gaussian Mixture Models
Description
Generates an n\times d
matrix of multivariate normal random vectors
with observations (examples) as rows. If k=1
, all observations belong to the same
cluster. If k>1
the observations are generated via two-step procedure.
First, the cluster membership is drawn from a multinomial distribution, with
mixture proportions specified by pi
. Conditional on cluster
membership, the observation is drawn from a multivariate normal distribution,
with cluster-specific mean and covariance. The cluster means are provided
using means
, and the cluster covariance matrices are provided using
covs
. If miss>0
, missingness is introduced, completely at random, by
setting that proportion of elements in the data matrix to NA
.
Usage
rGMM(n, d = 2, k = 1, pi = NULL, miss = 0, means = NULL, covs = NULL)
Arguments
n |
Observations (rows). |
d |
Observation dimension (columns). |
k |
Number of mixture components. Defaults to 1. |
pi |
Mixture proportions. If omitted, components are assumed equiprobable. |
miss |
Proportion of elements missing, |
means |
Either a prototype mean vector, or a list of mean vectors. Defaults to the zero vector. |
covs |
Either a prototype covariance matrix, or a list of covariance matrices. Defaults to the identity matrix. |
Value
Numeric matrix with observations as rows. Row numbers specify the true cluster assignments.
See Also
For estimation, see FitGMM
.
Examples
set.seed(100)
# Single component without missingness.
# Bivariate normal observations.
cov <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = cov)
# Single component with missingness.
# Trivariate normal observations.
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
cov <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = cov)
# Two components without missingness.
# Trivariate normal observations.
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
cov <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = cov)
# Four components with missingness.
# Bivariate normal observations.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
cov <- 0.5 * diag(2)
data <- rGMM(
n = 1000,
d = 2,
k = 4,
pi = c(0.35, 0.15, 0.15, 0.35),
miss = 0.1,
means = mean_list,
covs = cov)