mfa {EMMIXmfa}    R Documentation

Mixtures of Factor Analyzers

Description

Functions for fitting mixtures of factor analyzers (MFA) and mixtures of t-factor analyzers (MtFA) to data. Maximum likelihood estimates of the model parameters are obtained using the Alternating Expectation Conditional Maximization (AECM) algorithm.

With MFA, the component distributions are multivariate normal; with MtFA, they are multivariate t distributions.

Usage

mfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20,
  tol = 1.e-5, sigma_type = 'common', D_type = 'common', init_clust = NULL,
  init_para = NULL, conv_measure = 'diff', warn_messages = TRUE, ...)
mtfa(Y, g, q, itmax = 500, nkmeans = 20, nrandom = 20,
  tol = 1.e-5, df_init = rep(30, g), df_update = TRUE,
  sigma_type = 'common', D_type = 'common', init_clust = NULL,
  init_para = NULL, conv_measure = 'diff', warn_messages = TRUE, ...)

Arguments

Y

A matrix or data frame whose rows correspond to observations and whose columns correspond to variables.

g

Number of components.

q

Number of factors.

itmax

Maximum number of EM iterations.

nkmeans

The number of times the k-means algorithm is used to partition the data into g groups. These groupings are then used to initialize the parameters for the EM algorithm.

nrandom

The number of random g-group partitions of the data used to initialize the EM algorithm.

tol

The EM algorithm terminates if the measure of convergence falls below this value.

sigma_type

Specifies whether the covariance matrices (for mfa) or the scale matrices (for mtfa) of the components are constrained to be the same (default, sigma_type = "common") or not (sigma_type = "unique").

D_type

Specifies whether the diagonal error covariance matrix is common to all the components or not. If sigma_type = "unique", then D_type can be either "common" (the default) or "unique". If sigma_type = "common", then D_type must also be "common".

init_clust

A vector or matrix giving one or more partitions of the samples to be used to initialize the EM algorithm. For a matrix of partitions, each column must correspond to an individual partition of the data. Optional.

init_para

A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional.

conv_measure

The default 'diff' stops the EM iterations if |l^{(k+1)} - l^{(k)}| < tol, where l^{(k)} is the log-likelihood at the kth EM iteration. If 'ratio', then convergence of the EM steps is measured using |(l^{(k+1)} - l^{(k)}) / l^{(k+1)}|.

df_init

Initial values of the degrees of freedom parameters for mtfa, one per component.

df_update

If df_update = TRUE (default), the degrees of freedom parameters are updated during the EM iterations. If df_update = FALSE, they are fixed at the initial values specified in df_init.

warn_messages

With warn_messages = TRUE (default), the output includes a description of the reasons, if any, why the model fitting function failed to provide a fit for a given set of initial parameter values.

...

Not used.
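
The two stopping rules described under conv_measure can be sketched in base R as follows. The helper has_converged() and the log-likelihood values are hypothetical and for illustration only; they are not part of EMMIXmfa, and the package's internal check may differ in detail.

```r
# Hypothetical helper (not part of EMMIXmfa): evaluates the two stopping
# rules described under conv_measure for successive log-likelihood values.
has_converged <- function(loglik_new, loglik_old, tol = 1e-5,
                          conv_measure = c("diff", "ratio")) {
  conv_measure <- match.arg(conv_measure)
  change <- abs(loglik_new - loglik_old)
  # 'diff' uses the absolute change; 'ratio' scales it by |l^(k+1)|.
  measure <- if (conv_measure == "diff") change else change / abs(loglik_new)
  measure < tol
}

# Made-up log-likelihood values from two successive iterations.
l_old <- -1234.56789
l_new <- -1234.56000

conv_diff  <- has_converged(l_new, l_old, tol = 1e-5, conv_measure = "diff")
conv_ratio <- has_converged(l_new, l_old, tol = 1e-5, conv_measure = "ratio")
print(c(diff = conv_diff, ratio = conv_ratio))
```

With these values the absolute change is about 7.9e-3 and the relative change about 6.4e-6, so the same tol stops the iterations under 'ratio' but not under 'diff'. When the log-likelihood is large in magnitude, 'ratio' is the looser of the two criteria for a given tol.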

Details

Cluster a given data set using the mixtures of factor analyzers approach or the mixtures of t-factor analyzers approach.
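
For orientation, in the usual formulation of these models (see McLachlan, Peel, and Bean, 2003, in the References) the covariance matrix of component i (or the scale matrix, for MtFA) has the form B_i B_i^T + D_i, where B_i is a p × q loading matrix and D_i is a diagonal matrix. The sketch below builds such matrices in base R from made-up loadings and error variances. All values and dimensions are assumptions chosen for illustration and do not come from a fitted model.

```r
# Illustrative dimensions and values only; nothing here comes from a fitted model.
set.seed(1)
p <- 4; q <- 2; g <- 3

B <- array(rnorm(p * q * g), dim = c(p, q, g))   # loadings, p x q x g
D <- array(0, dim = c(p, p, g))                  # diagonal error covariances, p x p x g
for (i in seq_len(g)) D[, , i] <- diag(runif(p, 0.1, 1))

# Implied covariance (or scale) matrix of component i: B_i B_i' + D_i
Sigma <- array(0, dim = c(p, p, g))
for (i in seq_len(g)) Sigma[, , i] <- tcrossprod(B[, , i]) + D[, , i]

print(Sigma[, , 1])
```

Each p × p matrix is described by p * q loadings plus p error variances rather than p * (p + 1) / 2 free elements, which is the reason for using a factor-analytic structure when p is large.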

Value

An object of class c("emmix", "mfa") or c("emmix", "mtfa") containing the fitted model parameters is returned. Details of the components are as follows:

g

Number of mixture components.

q

Number of factors.

pivec

Mixing proportions of the components.

mu

Matrix containing estimates of the component means, one column per mixture component. Size p × g.

B

Array containing component-dependent loading matrices. Size p × q × g.

D

Estimates of the error covariance matrices. If D_type = "common" was used, then D is a p × p matrix common to all components; if D_type = "unique", then D is a p × p × g array.

v

Degrees of freedom for each component.

logL

Log-likelihood at convergence.

BIC

Bayesian information criterion.

tau

Matrix of posterior probabilities for the data used, based on the fitted values. Size n × g.

clust

Vector of integers 1 to g indicating cluster allocations of the observations.

Uscores

Estimated conditional expected component scores of the unobservable factors given the data and the component membership. Size n × q × g.

Umean

Means of the estimated conditional expected factor scores over the estimated posterior distributions. Size n × q.

Uclust

Alternative estimate of Umean in which the posterior probabilities for each sample are replaced by component indicator vectors, which contain one in the element corresponding to the highest posterior probability and zeros elsewhere. Size n × q.

ERRMSG

Description of messages, if any.

D_type

Whether common or unique error covariance is used, as specified in model fitting.

df_update

Whether the degrees of freedom parameter (v) was fixed or estimated (only for mtfa).
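
The relationship between tau, clust, Uscores, Umean, and Uclust described above can be sketched in base R. The inputs below are made-up arrays with the documented shapes, not output from mfa or mtfa, and the code is an illustration of the descriptions rather than the package's own implementation.

```r
# Made-up inputs with the shapes described above; not output from mfa().
set.seed(2)
n <- 5; q <- 2; g <- 3

tau <- matrix(runif(n * g), n, g)
tau <- tau / rowSums(tau)                       # each row sums to 1
Uscores <- array(rnorm(n * q * g), dim = c(n, q, g))

# Hard allocation: the component with the largest posterior probability.
clust <- max.col(tau, ties.method = "first")

# Posterior-weighted average of the component-specific scores (Umean analogue).
# tau[, i] is recycled down the columns, so row j is scaled by tau[j, i].
Umean <- matrix(0, n, q)
for (i in seq_len(g)) Umean <- Umean + tau[, i] * Uscores[, , i]

# Scores from the allocated component only (Uclust analogue).
Uclust <- t(sapply(seq_len(n), function(j) Uscores[j, , clust[j]]))

print(clust)
print(Umean)
print(Uclust)
```

Uclust coincides with the Umean calculation when each row of tau is replaced by a 0/1 indicator vector for the allocated component, so the two estimates are close whenever the posterior probabilities are near 0 or 1.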

Author(s)

Suren Rathnayake, Geoffrey McLachlan

References

Ghahramani Z, and Hinton GE (1997). The EM algorithm for mixture of factor analyzers. Technical Report, CRG-TR-96-1, University of Toronto, Toronto.

McLachlan GJ, Bean RW, Ben-Tovim Jones L (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t distribution. Computational Statistics & Data Analysis, 51, 5327–5338.

McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.

McLachlan GJ, Peel D, and Bean RW (2003). Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics & Data Analysis 41, 379–388.

See Also

mcfa

Examples

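# Fit a 3-component MFA with 2 factors to the four iris measurements
model <- mfa(iris[, -5], g=3, q=2, itmax=200, nkmeans=1, nrandom=5)
summary(model)

# Fit the corresponding mixture of t-factor analyzers
model <- mtfa(iris[, -5], g=3, q=2, itmax=200, nkmeans=1, nrandom=5)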
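
As a further sketch, the cluster allocations in the clust component can be compared with the known species labels. This assumes that the EMMIXmfa package is installed and that the call returns a fitted object of class "mfa"; the seed is an arbitrary choice, and results may vary between runs because the initial partitions include random starts.

```r
# Assumes the EMMIXmfa package is installed; the block is skipped otherwise.
if (requireNamespace("EMMIXmfa", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed, intended to make the starting partitions repeatable
  fit <- EMMIXmfa::mfa(iris[, -5], g = 3, q = 2, itmax = 200,
                       nkmeans = 1, nrandom = 5)
  # Proceed only if a fitted model was returned.
  if (inherits(fit, "mfa")) {
    # Cross-tabulate true species against the estimated cluster labels.
    print(table(Species = iris$Species, Cluster = fit$clust))
    print(fit$BIC)
  }
}
```

Cluster labels are arbitrary, so agreement should be judged from the pattern of the table rather than from matching label numbers.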
  

[Package EMMIXmfa version 2.0.14 Index]