mcfa {EMMIXmfa} | R Documentation |
Mixture of Common Factor Analyzers
Description
Functions for fitting mixture of common factor analyzers (MCFA) models. MCFA models are mixtures of factor analyzers, and so belong to the class of multivariate finite mixture models. In these models the components share a common matrix of factor loadings, before the latent factors are transformed to be white noise. They are designed specifically for displaying the observed data points in a lower, q-dimensional space, where q is the number of factors adopted in the factor-analytic representation of the observed vector.
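The sketch below shows what a common loading matrix implies for each component. It assumes the MCFA formulation of Baek et al. (2010): an observation from component i is modelled as A U + e, with U ~ N(xi_i, omega_i) and e ~ N(0, D). Under that formulation, component i has mean A xi_i and covariance A omega_i A^T + D. The helper name mcfa_moments and the toy parameter values are hypothetical and are not part of EMMIXmfa.

```r
# Hypothetical helper (not an EMMIXmfa function): component means and
# covariance matrices implied by MCFA parameters, assuming
#   A     : p x q common loading matrix
#   xi    : q x g matrix of factor means (one column per component)
#   omega : q x q x g array of factor covariance matrices
#   D     : p x p (diagonal) error covariance matrix
mcfa_moments <- function(A, xi, omega, D) {
  g <- ncol(xi)
  mu <- A %*% xi                      # p x g; column i equals A %*% xi[, i]
  Sigma <- array(0, dim = c(nrow(A), nrow(A), g))
  for (i in seq_len(g)) {
    Sigma[, , i] <- A %*% omega[, , i] %*% t(A) + D
  }
  list(mu = mu, Sigma = Sigma)
}

# Toy example: p = 4 variables, q = 2 factors, g = 3 components
set.seed(1)
p <- 4; q <- 2; g <- 3
A <- matrix(rnorm(p * q), p, q)
xi <- matrix(rnorm(q * g), q, g)
omega <- array(diag(q), dim = c(q, q, g))   # identity factor covariances
D <- diag(0.5, p)
mom <- mcfa_moments(A, xi, omega, D)
dim(mom$Sigma)   # 4 4 3
```

Because A is shared, all g component means lie in the q-dimensional column space of A. This is what makes a common low-dimensional display of the data possible.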
The mcfa function fits mixtures of common factor analyzers in which the component distributions belong to the family of multivariate normal distributions. The mctfa function fits mixtures of common t-factor analyzers in which the component distributions are multivariate t distributions.
Maximum likelihood estimates of the model parameters are obtained
using the Expectation–Maximization algorithm.
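The EM algorithm alternates an E-step with an M-step. The E-step computes the posterior probabilities (tau) and the log-likelihood (logL). The following is a minimal sketch of that E-step for normal components, using base R only. It is a generic mixture computation under stated assumptions, not EMMIXmfa's internal code. The helper names log_dmvnorm and e_step are hypothetical. The component means mu (p x g) and covariances Sigma (p x p x g) are assumed to be given, for example as A xi_i and A omega_i A^T + D.

```r
# Log density of N(mu, Sigma) at each row of Y, via a Cholesky factor.
log_dmvnorm <- function(Y, mu, Sigma) {
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(Y) - mu, transpose = TRUE)    # solves t(R) z = y - mu
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(Y) * log(2 * pi)
}

# Hypothetical E-step: posterior probabilities tau (n x g) and log-likelihood.
e_step <- function(Y, pivec, mu, Sigma) {
  Y <- as.matrix(Y)
  g <- length(pivec)
  # n x g matrix of log(pi_i) + log f_i(y_j)
  logdens <- matrix(sapply(seq_len(g), function(i)
    log(pivec[i]) + log_dmvnorm(Y, mu[, i], Sigma[, , i])), nrow = nrow(Y))
  # log-sum-exp over components, for numerical stability
  m <- apply(logdens, 1, max)
  lse <- m + log(rowSums(exp(logdens - m)))
  list(tau = exp(logdens - lse), logL = sum(lse))
}

# Usage on the iris measurements, with the species labels as a starting grouping
Y <- as.matrix(iris[, -5])
grp <- rep(1:3, each = 50)          # iris rows are ordered by species
pivec <- rep(1 / 3, 3)
mu <- sapply(1:3, function(i) colMeans(Y[grp == i, ]))
Sigma <- array(0, dim = c(4, 4, 3))
for (i in 1:3) Sigma[, , i] <- cov(Y[grp == i, ])
es <- e_step(Y, pivec, mu, Sigma)
```

Working on the log scale avoids underflow when some component densities are tiny, which is common with many variables.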
Usage
mcfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
tol = 1.e-5, init_clust = NULL, init_para = NULL,
init_method = NULL, conv_measure = 'diff',
warn_messages = TRUE, ...)
mctfa(Y, g, q, itmax = 500, nkmeans = 5, nrandom = 20,
tol = 1.e-5, df_init = rep(30, g), df_update = TRUE,
init_clust = NULL, init_para = NULL, init_method = NULL,
conv_measure = 'diff', warn_messages = TRUE, ...)
Arguments
Y |
A matrix or a data frame whose rows correspond to observations and whose columns correspond to variables. |
g |
Number of components. |
q |
Number of factors. |
itmax |
Maximum number of EM iterations. |
nkmeans |
The number of times the k-means algorithm is used to partition the data into g groups; these partitions serve as starting values for the EM algorithm. |
nrandom |
The number of random g-group partitions of the data used as starting values for the EM algorithm. |
tol |
The EM algorithm terminates if the measure of convergence falls below this value. |
init_clust |
A vector or matrix consisting of partitions of the samples to be used in the EM algorithm. For a matrix of partitions, each column must correspond to an individual partition of the data. Optional. |
init_para |
A list containing model parameters to be used as initial parameter estimates for the EM algorithm. Optional. |
init_method |
Determines how the initial parameter values are computed. See Details. |
conv_measure |
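The default, 'diff', measures convergence as the absolute change in the log-likelihood between successive EM iterations. |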
df_init |
Initial values of the degrees of freedom parameters for mctfa. |
df_update |
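If TRUE, the default, the degrees of freedom parameters are updated during the EM iterations; if FALSE, they are held fixed at the values given in df_init. |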
warn_messages |
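With warn_messages = TRUE, the default, the output includes a description of any errors that occurred during fitting. |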
... |
Not used. |
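The init_clust argument accepts a matrix with one partition per column. One way to build such a matrix is from repeated k-means runs with base R's stats::kmeans. This is a sketch only: the number of starts and the seed are arbitrary illustrative choices, and the sketch does not show how mcfa combines these partitions with its own nkmeans and nrandom starts.

```r
# Build a matrix of candidate starting partitions, one partition per column,
# from repeated k-means runs (each run uses different random centres).
set.seed(123)
Y <- as.matrix(iris[, -5])
g <- 3
n_starts <- 4
init_clust <- sapply(seq_len(n_starts),
                     function(s) kmeans(Y, centers = g)$cluster)
dim(init_clust)   # 150 4
```

Such a matrix could then be supplied as mcfa(Y, g = 3, q = 2, init_clust = init_clust).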
Details
With init_method = NULL, the default, model parameters are initialized using all available methods. With init_method = "rand-A", the parameters are initialized using the procedure in Baek et al. (2010), where initial values for the elements of A are drawn from the N(0, 1) distribution. This method is appropriate when the columns of the data are on the same scale. With init_method = "eigen-A", the first q eigenvectors of Y are taken as the initial value for the loading matrix A. If init_method = "gmf", the data are factorized using gmf with q factors, and the resulting loading matrix is used as the initial value for A.
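Two of these starting strategies can be sketched for the loading matrix alone. The helpers init_A_rand and init_A_eigen are hypothetical and are not EMMIXmfa internals. The "eigen-A" sketch makes an assumption: it reads "eigenvectors of Y" as the leading eigenvectors of the sample covariance matrix of Y. The "gmf" option is not reproduced here.

```r
# "rand-A"-style start: every element of the p x q loading matrix ~ N(0, 1).
init_A_rand <- function(Y, q) {
  p <- ncol(Y)
  matrix(rnorm(p * q), nrow = p, ncol = q)
}

# "eigen-A"-style start (assumption: eigenvectors of the sample covariance
# matrix of Y; eigen() returns them ordered by decreasing eigenvalue).
init_A_eigen <- function(Y, q) {
  ev <- eigen(cov(as.matrix(Y)), symmetric = TRUE)
  ev$vectors[, seq_len(q), drop = FALSE]
}

set.seed(42)
Y <- as.matrix(iris[, -5])
A_rand <- init_A_rand(Y, q = 2)
A_eigen <- init_A_eigen(Y, q = 2)
crossprod(A_eigen)   # approximately the 2 x 2 identity: columns are orthonormal
```

The random start ignores the scale of each column. That is consistent with the note above that "rand-A" suits data whose columns are on the same scale.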
If specified, the optional argument init_para must be a list or an object of class mcfa or mctfa. When fitting an mcfa model, only the model parameters q, g, pivec, A, xi, omega, and D are extracted from init_para; when fitting mctfa, one extra parameter, nu, is also extracted. Everything else in init_para is discarded.
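A hand-built init_para list might look like the following sketch. It uses only the element names listed above. The dimensions are assumptions consistent with the model: A is p x q, xi is q x g, omega is q x q x g, and D is p x p diagonal. The form of nu (one value per component, mirroring the df_init default) is also an assumption. The numeric values are arbitrary placeholders.

```r
# Sketch of an init_para list for g components and q factors on p variables.
p <- 4; q <- 2; g <- 3
set.seed(7)
init_para <- list(
  q = q,
  g = g,
  pivec = rep(1 / g, g),                       # equal mixing proportions
  A = matrix(rnorm(p * q), p, q),              # p x q loading matrix
  xi = matrix(rnorm(q * g), q, g),             # factor means, one column per component
  omega = array(diag(q), dim = c(q, q, g)),    # identity factor covariances
  D = diag(1, p)                               # diagonal error covariance
)
# For mctfa, degrees of freedom are added (assumed: one value per component)
init_para_t <- c(init_para, list(nu = rep(30, g)))
```

The factor means differ across components on purpose. If every component started with identical parameters, the EM updates could not separate them. With four-column data such as iris[, -5], the list could be passed as mcfa(iris[, -5], g = 3, q = 2, init_para = init_para).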
Value
An object of class c("emmix", "mcfa") or c("emmix", "mctfa") containing the fitted model parameters is returned. Its components are as follows:
g |
Number of mixture components. |
q |
Number of factors. |
pivec |
Mixing proportions of the components. |
A |
Loading matrix. Size p × q, where p is the number of variables. |
xi |
Matrix containing the factor means of the components in its columns. Size q × g. |
omega |
Array containing the factor covariance matrices of the components. Size q × q × g. |
D |
Error covariance matrix. Size p × p. |
Uscores |
Estimated conditional expected component scores of the unobservable factors, given the data and the component membership. Size n × q × g, where n is the number of observations. |
Umean |
Means of the estimated conditional expected factor scores over the estimated posterior distributions. Size n × q. |
Uclust |
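Alternative estimate of Umean, based on the component with the highest estimated posterior probability for each observation. |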
clust |
Cluster labels. |
tau |
Posterior probabilities. |
logL |
Log-likelihood at convergence. |
BIC |
Bayesian information criterion. |
warn_msg |
Description of error messages, if any. |
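Umean and Uclust both summarise the component-specific scores in Uscores. The sketch below shows one way to form them. It assumes that Umean is the posterior-weighted average of the component scores, and that Uclust takes the scores of the component with the largest posterior probability. It also assumes Uscores is an n x q x g array and tau is an n x g matrix. The helper summarise_scores and the toy inputs are hypothetical.

```r
# Hypothetical helper: combine component-specific factor scores.
#   Uscores : n x q x g array of conditional expected factor scores
#   tau     : n x g matrix of posterior probabilities (rows sum to 1)
summarise_scores <- function(Uscores, tau) {
  n <- dim(Uscores)[1]; q <- dim(Uscores)[2]; g <- dim(Uscores)[3]
  Umean <- matrix(0, n, q)
  for (i in seq_len(g)) {
    # row j of component i's scores is weighted by tau[j, i]
    Umean <- Umean + tau[, i] * matrix(Uscores[, , i], n, q)
  }
  clust <- max.col(tau, ties.method = "first")   # MAP component per row
  Uclust <- matrix(0, n, q)
  for (j in seq_len(n)) Uclust[j, ] <- Uscores[j, , clust[j]]
  list(Umean = Umean, Uclust = Uclust, clust = clust)
}

# Toy inputs: n = 5 observations, q = 2 factors, g = 3 components
set.seed(3)
n <- 5; q <- 2; g <- 3
Uscores <- array(rnorm(n * q * g), dim = c(n, q, g))
tau <- matrix(runif(n * g), n, g)
tau <- tau / rowSums(tau)                        # normalise rows to sum to 1
s <- summarise_scores(Uscores, tau)
```

The two summaries agree for observations whose posterior probabilities are close to 0 or 1. They differ most for observations with ambiguous component membership.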
Author(s)
Suren Rathnayake, Jangsun Baek, Geoff McLachlan
References
Baek J, McLachlan GJ, and Flack LK (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualisation of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 2089–2097.
Baek J, and McLachlan GJ (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 27, 1269–1276.
McLachlan GJ, Baek J, and Rathnayake SI (2011). Mixtures of factor analyzers for the analysis of high-dimensional data. In Mixture Estimation and Applications, KL Mengersen, CP Robert, and DM Titterington (Eds). Hoboken, New Jersey: Wiley, pp. 171–191.
Examples
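library(EMMIXmfa)
# MCFA with normal components on the four iris measurements
mcfa_fit <- mcfa(iris[, -5], g = 3, q = 3, itmax = 25,
nkmeans = 5, nrandom = 5, tol = 1.e-5)
plot(mcfa_fit)
# Mixture of common t-factor analyzers; df_update = TRUE re-estimates the degrees of freedom
mctfa_fit <- mctfa(iris[, -5], g = 3, q = 3, itmax = 500,
nkmeans = 5, nrandom = 5, tol = 1.e-5, df_update = TRUE)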