simData {fabMix}    R Documentation

Synthetic data generator

Description

Simulate data from a multivariate normal mixture using a mixture of factor analyzers mechanism.

Usage

simData(sameSigma, sameLambda, p, q, K.true, n, loading_means, loading_sd, sINV_values)

Arguments

sameSigma

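Logical. Indicates whether all components share the same diagonal of the inverse covariance matrix of the errors (TRUE) or each component has its own (FALSE); see sINV_values.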

sameLambda

Logical. Indicates whether the factor loadings are the same for all components.

p

The dimension of the multivariate normal distribution (p > 1).

q

Number of factors. It should be strictly smaller than p.

K.true

The number of mixture components (clusters).

n

Sample size.

loading_means

A vector which contains the means of blocks of factor loadings.

Default: loading_means = c(-30,-20,-10,10, 20, 30).

loading_sd

A vector which contains the standard deviations of blocks of factor loadings.

Default: loading_sd = rep(2, length(loading_means)).

sINV_values

A vector which contains the values of the diagonal of the (common) inverse covariance matrix, if sameSigma = TRUE. A K.true\times p matrix which contains the values of the diagonal of the inverse covariance matrix per component, if sameSigma = FALSE.

Default: sINV_values = rgamma(p, shape = 1, rate = 1).
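
The two shapes described for sINV_values can be sketched in base R as follows. This is only an illustration of the layout stated above (a vector of length p, or one row per component); the sizes and object names are hypothetical and not part of the package.

```r
# Hypothetical sizes, chosen only for illustration
p <- 5        # number of variables
K.true <- 2   # number of mixture components

set.seed(1)

# Layout for sameSigma = TRUE: a single vector of length p
sINV_common <- rgamma(p, shape = 1, rate = 1)

# Layout for sameSigma = FALSE: a K.true x p matrix, one row per component
sINV_per_component <- matrix(rgamma(K.true * p, shape = 1, rate = 1),
                             nrow = K.true, ncol = p)

dim(sINV_per_component)
```

Either object would then be passed as the sINV_values argument, matching the chosen value of sameSigma.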

Value

A list with the following entries:

data

n\times p array containing the simulated data.

class

n-dimensional vector containing the class of each observation.

factorLoadings

K.true\times p\times q array containing the factor loadings \Lambda_{krj} per cluster k, feature r and factor j, where k=1,\ldots,K; r=1,\ldots,p; j=1,\ldots,q, with K = K.true.

means

K.true\times p matrix containing the marginal means \mu_{kr}, k=1,\ldots,K; r=1,\ldots,p.

variance

p\times p diagonal matrix containing the variance of errors \sigma_{rr}, r=1,\ldots,p. Note that the same variance of errors is assumed for each cluster.

factors

n\times q matrix containing the simulated factor values.

weights

K.true-dimensional vector containing the weight of each cluster.

Note

The marginal covariance matrix of cluster k is equal to \Lambda_k\Lambda_k^{T} + \Sigma, where \Sigma denotes the diagonal covariance matrix of the errors.
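
The expression in this note follows from the usual mixture of factor analyzers representation. The derivation below is a sketch under that standard assumption, not a statement about the internals of simData; the symbols x_i (observation), z_i (cluster label), y_i (latent factors) and \varepsilon_i (errors) are introduced here for illustration, with y_i and \varepsilon_i taken to be independent.

```latex
\begin{aligned}
x_i \mid (z_i = k) &= \mu_k + \Lambda_k y_i + \varepsilon_i, \\
y_i &\sim \mathcal{N}_q(0, I_q), \qquad \varepsilon_i \sim \mathcal{N}_p(0, \Sigma), \\
\operatorname{Cov}(x_i \mid z_i = k)
  &= \Lambda_k \operatorname{Cov}(y_i) \Lambda_k^{T} + \operatorname{Cov}(\varepsilon_i)
   = \Lambda_k \Lambda_k^{T} + \Sigma .
\end{aligned}
```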

Author(s)

Panagiotis Papastamoulis

Examples

library('fabMix')

n <- 8                # sample size
p <- 5                # number of variables
q <- 2                # number of factors
K <- 2                # true number of clusters

sINV_diag <- 1 / (1:p)   # diagonal of inverse variance of errors
set.seed(100)
syntheticDataset <- simData(sameLambda = TRUE, K.true = K, n = n, q = q, p = p,
                            sINV_values = sINV_diag)
summary(syntheticDataset)
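
For readers who want to see the mechanism itself, the following base R sketch draws data from a mixture of factor analyzers by hand. It is not the code used by simData: the equal weights, the distribution of the means and the standard normal loadings are assumptions made only for this illustration, and all object names are hypothetical.

```r
set.seed(100)
n <- 8; p <- 5; q <- 2; K <- 2

w <- rep(1 / K, K)                              # equal cluster weights (assumption)
z <- sample(K, n, replace = TRUE, prob = w)     # cluster label of each observation
mu <- matrix(rnorm(K * p, sd = 5), K, p)        # cluster means (assumption)
Lambda <- array(rnorm(K * p * q), dim = c(K, p, q))  # loadings (assumption)
sINV_diag <- 1 / (1:p)                          # diagonal of inverse error variance
sigma2 <- 1 / sINV_diag                         # error variances, common to all clusters

y <- matrix(rnorm(n * q), n, q)                 # latent factors, y_i ~ N(0, I_q)
x <- matrix(NA_real_, n, p)
for (i in seq_len(n)) {
  k <- z[i]
  Lk <- matrix(Lambda[k, , ], p, q)             # p x q loadings of cluster k
  eps <- rnorm(p, mean = 0, sd = sqrt(sigma2))  # errors, eps_i ~ N(0, Sigma)
  x[i, ] <- mu[k, ] + drop(Lk %*% y[i, ]) + eps
}

# Model-implied covariance of cluster 1: Lambda_1 Lambda_1^T + Sigma
L1 <- matrix(Lambda[1, , ], p, q)
Sigma_1 <- L1 %*% t(L1) + diag(sigma2)
dim(x)
```

Here x plays the role of the data entry, z of class, and y of factors in the list returned by simData.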

[Package fabMix version 5.1 Index]