mcfa {EMMIXmfa}    R Documentation

Mixture of Common Factor Analyzers

Description

Functions for fitting mixtures of common factor analyzers (MCFA). MCFA models are mixtures of factor analyzers (a class of multivariate finite mixture models) in which all components share a common matrix of factor loadings before the latent factors are transformed to white noise. They are designed specifically for displaying the observed data points in a lower (q-dimensional) space, where q is the number of factors adopted in the factor-analytic representation of the observed vector.
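As a sketch of the model structure, assume the formulation of Baek et al. (2010): within component i, an observation is written as y_j = A u_ij + e_ij, with u_ij ~ N(xi_i, omega_i) and e_ij ~ N(0, D). Component i is then normal with mean A xi_i and covariance A omega_i A' + D. The R code below evaluates the resulting mixture log-likelihood from parameters shaped like the pivec, A, xi, omega, and D components listed under Value. The helpers mcfa_component_params(), dmvnorm_log(), and mcfa_loglik() are hypothetical and not part of EMMIXmfa.

```r
# Hypothetical helpers, assuming the MCFA parameterization of Baek et al. (2010):
# component i has mean A %*% xi[, i] and covariance A %*% omega[, , i] %*% t(A) + D.
mcfa_component_params <- function(A, xi, omega, D) {
  q <- ncol(A)
  lapply(seq_len(ncol(xi)), function(i) {
    Om <- matrix(omega[, , i], q, q)               # q x q factor covariance
    list(mu = drop(A %*% xi[, i]),                  # p-vector of component means
         Sigma = A %*% Om %*% t(A) + D)             # p x p component covariance
  })
}

# Row-wise multivariate normal log-density computed via a Cholesky factor
dmvnorm_log <- function(Y, mu, Sigma) {
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(Y) - mu, transpose = TRUE)    # solves t(R) %*% Z = y - mu
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(Y) * log(2 * pi)
}

mcfa_loglik <- function(Y, pivec, A, xi, omega, D) {
  Y <- as.matrix(Y)
  comps <- mcfa_component_params(A, xi, omega, D)
  # n x g matrix with entries log(pivec[i]) + log f_i(y_j)
  L <- matrix(sapply(seq_along(comps), function(i)
         log(pivec[i]) + dmvnorm_log(Y, comps[[i]]$mu, comps[[i]]$Sigma)),
       nrow = nrow(Y))
  m <- apply(L, 1, max)                             # log-sum-exp for stability
  sum(m + log(rowSums(exp(L - m))))
}
```

For a fitted object fit, mcfa_loglik(iris[, -5], fit$pivec, fit$A, fit$xi, fit$omega, fit$D) would be expected to be close to fit$logL, assuming the package uses exactly this parameterization.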

The mcfa function fits mixtures of common factor analyzers in which the component distributions belong to the family of multivariate normal distributions. The mctfa function fits mixtures of common t-factor analyzers in which the component distributions are multivariate t distributions. Maximum likelihood estimates of the model parameters are obtained using the Expectation–Maximization (EM) algorithm.
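For mctfa, each component density is a multivariate t. A minimal R sketch of that density is shown below, assuming (as in Baek and McLachlan, 2011) that component i has location A xi_i, scale matrix A omega_i A' + D, and nu_i degrees of freedom. dmvt_log() is a hypothetical helper, not part of EMMIXmfa.

```r
# Hypothetical helper: row-wise log-density of a p-variate t distribution with
# location mu, p x p scale matrix Sigma and nu degrees of freedom.
dmvt_log <- function(Y, mu, Sigma, nu) {
  Y <- as.matrix(Y)
  p <- ncol(Y)
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(Y) - mu, transpose = TRUE)
  delta <- colSums(Z^2)                             # squared Mahalanobis distances
  lgamma((nu + p) / 2) - lgamma(nu / 2) - 0.5 * p * log(nu * pi) -
    sum(log(diag(R))) - 0.5 * (nu + p) * log1p(delta / nu)
}
```

As nu grows, this density approaches the corresponding normal density; smaller nu gives heavier tails, so outlying observations have less influence on the fit.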

Usage

mcfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
  tol = 1.e-5, init_clust = NULL, init_para = NULL,
  init_method = NULL, conv_measure = 'diff',
  warn_messages = TRUE, ...)
mctfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
  tol = 1.e-5, df_init = rep(30, g), df_update = TRUE,
  init_clust = NULL, init_para = NULL, init_method = NULL,
  conv_measure = 'diff', warn_messages = TRUE, ...)

Arguments

Y

A matrix or data frame whose rows correspond to observations and whose columns correspond to variables.

g

Number of components.

q

Number of factors.

itmax

Maximum number of EM iterations.

nkmeans

The number of times the k-means algorithm is used to partition the data into g groups. These groupings are then used to initialize the parameters for the EM algorithm.

nrandom

The number of random g-group partitions of the data to be used in initializing the EM algorithm.

tol

The EM algorithm terminates if the measure of convergence falls below this value.

init_clust

A vector or matrix of partitions of the samples to be used in initializing the EM algorithm. For a matrix of partitions, each column must correspond to an individual partition of the data. Optional.

init_para

A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional.

init_method

Determines how the initial parameter values are computed. See Details.

conv_measure

The default 'diff' stops the EM iterations if |l^{(k+1)} - l^{(k)}| < tol, where l^{(k)} is the log-likelihood at the kth EM iteration. If 'ratio', convergence of the EM steps is instead measured by |(l^{(k+1)} - l^{(k)})/l^{(k+1)}|.

df_init

Initial values of the degrees of freedom parameters for mctfa.

df_update

If df_update = TRUE (default), the degrees of freedom parameters are updated during the EM iterations. Otherwise, if df_update = FALSE, they are fixed at the initial values specified in df_init.

warn_messages

With warn_messages = TRUE (default), the output includes a description of the reasons, if any, why the model fitting function failed to provide a fit for a given set of initial parameter values.

...

Not used.
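The two stopping rules described under conv_measure can be written as a small R function. em_converged() is a hypothetical helper for illustration, not part of EMMIXmfa; it only compares two successive log-likelihood values.

```r
# Hypothetical helper: TRUE when the change between successive EM
# log-likelihoods falls below tol under the chosen convergence measure.
em_converged <- function(loglik_old, loglik_new, tol = 1e-5,
                         conv_measure = c("diff", "ratio")) {
  conv_measure <- match.arg(conv_measure)
  change <- switch(conv_measure,
    diff  = abs(loglik_new - loglik_old),                  # absolute change
    ratio = abs((loglik_new - loglik_old) / loglik_new))   # relative change
  change < tol
}
```

For example, going from l^{(k)} = -1000 to l^{(k+1)} = -999.995 with tol = 1e-5 does not satisfy the 'diff' rule (change 0.005) but does satisfy the 'ratio' rule (change about 5e-6). In general 'ratio' is the looser criterion whenever |l^{(k+1)}| > 1.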

Details

With init_method = NULL (the default), model parameters are initialized using all available methods. With init_method = "rand-A", the parameters are initialized using the procedure in Baek et al. (2010), in which the initial values for the elements of A are drawn from the N(0, 1) distribution. This method is appropriate when the columns of the data are on the same scale. With init_method = "eigen-A", the first q eigenvectors of Y are taken as the initial value of the loading matrix A. If init_method = "gmf", the data are factorized using gmf with q factors, and the resulting loading matrix is used as the initial value for A.
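A rough R sketch of the initialization ingredients discussed here: starting partitions built from nkmeans k-means runs and nrandom random partitions (one partition per column, the layout described for init_clust), and a starting loading matrix A following the "rand-A" or "eigen-A" ideas. The function names are hypothetical. Reading "the first q eigenvectors of Y" as the leading eigenvectors of the sample covariance matrix of Y is an assumption, and the package's internal procedure may differ (for example, in how empty random groups are handled).

```r
# Hypothetical helper: n x (nkmeans + nrandom) matrix of starting partitions,
# one partition per column. Random partitions may leave a group empty.
starting_partitions <- function(Y, g, nkmeans = 5, nrandom = 20) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  km <- vapply(seq_len(nkmeans),
               function(k) as.integer(kmeans(Y, centers = g)$cluster), integer(n))
  rnd <- vapply(seq_len(nrandom),
                function(k) sample.int(g, n, replace = TRUE), integer(n))
  cbind(km, rnd)
}

# Hypothetical helper: starting p x q loading matrix A.
# "eigen-A" is read here as the leading q eigenvectors of cov(Y) (an assumption).
init_A <- function(Y, q, method = c("rand-A", "eigen-A")) {
  method <- match.arg(method)
  Y <- as.matrix(Y)
  p <- ncol(Y)
  if (method == "rand-A") {
    matrix(rnorm(p * q), nrow = p, ncol = q)     # elements drawn from N(0, 1)
  } else {
    e <- eigen(cov(Y), symmetric = TRUE)         # eigenvalues in decreasing order
    e$vectors[, seq_len(q), drop = FALSE]
  }
}
```

The random N(0, 1) draws ignore the scale of each column, which is consistent with the remark above that "rand-A" suits data whose columns are on the same scale.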

If specified, the optional argument init_para must be a list or an object of class mcfa or mctfa. When fitting an mcfa model, only the model parameters q, g, pivec, A, xi, omega, and D are extracted from init_para; when fitting mctfa, the degrees of freedom parameter nu is extracted as well. All other elements of init_para are discarded.
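A minimal R sketch of a hand-built init_para list for p = 4 variables, q = 2 factors, and g = 3 components, with array shapes taken from the Value section. The starting values are arbitrary placeholders, and check_init_para() and has_dim() are hypothetical helpers that only check shapes; they are not part of EMMIXmfa.

```r
# Hypothetical shape checks for an init_para list
has_dim <- function(x, d) length(dim(x)) == length(d) && all(dim(x) == d)

check_init_para <- function(init, p, t_model = FALSE) {
  with(init, stopifnot(
    length(pivec) == g, abs(sum(pivec) - 1) < 1e-8,
    has_dim(A, c(p, q)),                 # p x q loading matrix
    has_dim(xi, c(q, g)),                # q x g factor means
    has_dim(omega, c(q, q, g)),          # q x q x g factor covariances
    has_dim(D, c(p, p)),                 # p x p error covariance
    !t_model || length(nu) == g          # degrees of freedom (mctfa only)
  ))
  invisible(TRUE)
}

# Placeholder starting values (not fitted estimates)
p <- 4; q <- 2; g <- 3
init <- list(
  g = g, q = q,
  pivec = rep(1 / g, g),
  A = matrix(rnorm(p * q), p, q),
  xi = matrix(0, q, g),
  omega = array(diag(q), dim = c(q, q, g)),
  D = diag(p)
)
check_init_para(init, p)
```

The list could then be supplied as, for example, mcfa(iris[, -5], g = 3, q = 2, init_para = init). For mctfa, an element nu (for instance rep(30, g), mirroring the df_init default) would also be needed.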

Value

An object of class c("emmix", "mcfa") or c("emmix", "mctfa") containing the fitted model parameters is returned, with the following components:

g

Number of mixture components.

q

Number of factors.

pivec

Mixing proportions of the components.

A

Loading matrix. Size p \times q.

xi

Matrix containing factor means for components in columns. Size q \times g.

omega

Array containing factor covariance matrices for components. Size q \times q \times g.

D

Error covariance matrix. Size p \times p.

Uscores

Estimated conditional expected component scores of the unobservable factors given the data and the component membership. Size n \times q \times g.

Umean

Means of the estimated conditional expected factor scores over the estimated posterior distributions, that is, the Uscores averaged over components with weights given by tau. Size n \times q.

Uclust

Alternative estimate of Umean in which the posterior probabilities for each sample are replaced by a component indicator vector, with one in the element corresponding to the highest posterior probability and zeros elsewhere. Size n \times q.

clust

Cluster labels.

tau

Posterior probabilities.

logL

Log-likelihood at convergence.

BIC

Bayesian information criterion.

warn_msg

Description of error messages, if any.
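The relationship between Uscores, tau, Umean, and Uclust described above can be sketched in R. umean_from_scores() is a hypothetical helper: with hard = FALSE it weights each component's scores by the posterior probabilities, and with hard = TRUE it first replaces each row of tau by the indicator of its largest entry. Breaking ties by taking the first maximum is an assumption.

```r
# Hypothetical helper: combine component scores (n x q x g) with posterior
# probabilities (n x g) into an n x q matrix of factor scores.
umean_from_scores <- function(Uscores, tau, hard = FALSE) {
  d <- dim(Uscores); n <- d[1]; q <- d[2]; g <- d[3]
  if (hard) {
    # replace each row of tau by an indicator of its largest posterior probability
    z <- matrix(0, n, g)
    z[cbind(seq_len(n), max.col(tau, ties.method = "first"))] <- 1
    tau <- z
  }
  U <- matrix(0, n, q)
  for (i in seq_len(g)) {
    U <- U + tau[, i] * matrix(Uscores[, , i], n, q)   # row j scaled by tau[j, i]
  }
  U
}
```

For a fitted object fit, umean_from_scores(fit$Uscores, fit$tau) and umean_from_scores(fit$Uscores, fit$tau, hard = TRUE) would be expected to match fit$Umean and fit$Uclust, respectively, assuming the package computes them exactly as described above.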

Author(s)

Suren Rathnayake, Jangsun Baek, Geoff McLachlan

References

Baek J, McLachlan GJ, and Flack LK (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 2089–2097.

Baek J and McLachlan GJ (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27, 1269–1276.

McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.

See Also

mfa, plot_factors

Examples

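library("EMMIXmfa")

# Normal MCFA: 3 components and 3 factors on the four iris measurements
mcfa_fit <- mcfa(iris[, -5], g = 3, q = 3, itmax = 25,
                  nkmeans = 5, nrandom = 5, tol = 1.e-5)

# Plot the fitted model (see plot_factors)
plot(mcfa_fit)

# t-factor version; df_update is an argument of mctfa, not mcfa
mctfa_fit <- mctfa(iris[, -5], g = 3, q = 3, itmax = 500,
                   nkmeans = 5, nrandom = 5, tol = 1.e-5, df_update = TRUE)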


[Package EMMIXmfa version 2.0.14 Index]