mppca.metabol {MetabolAnalyze}

R Documentation

Fit a mixture of probabilistic principal components analysis (MPPCA) model to a metabolomic data set via the EM algorithm to perform simultaneous dimension reduction and clustering.

Description

This function fits a mixture of probabilistic principal components analysis model to metabolomic spectral data via the EM algorithm.

Usage

mppca.metabol(Y, minq = 1, maxq = 2, ming, maxg, scale = "none",
              epsilon = 0.1, plot.BIC = FALSE)

Arguments

`Y`	An N x p data matrix where each row is a spectrum.
`minq`	The minimum number of principal components to be fit. By default minq is 1.
`maxq`	The maximum number of principal components to be fit. By default maxq is 2.
`ming`	The minimum number of groups to be fit.
`maxg`	The maximum number of groups to be fit.
`scale`	Type of scaling to apply to the data. The default is "none". Other options are "pareto" and "unit" scaling. See `scaling` for further details.
`epsilon`	Value on which the convergence assessment criterion is based. Set by default to 0.1.
`plot.BIC`	Logical indicating whether or not a plot of the BIC values for the different models fitted should be provided. By default, the plot is not produced.
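The two scaling options can be illustrated with a minimal base R sketch. It assumes the usual definitions (unit scaling divides each variable by its standard deviation, Pareto scaling by the square root of its standard deviation); `scale_spectra` is a hypothetical helper written for this illustration, not the package's `scaling` function, and whether the package also mean-centers the data is not stated here.

```r
# Hypothetical helper, not part of MetabolAnalyze: column-wise scaling under
# the usual definitions (unit: divide by sd; Pareto: divide by sqrt(sd)).
scale_spectra <- function(Y, type = c("none", "pareto", "unit")) {
  type <- match.arg(type)
  s <- apply(Y, 2, sd)                      # per-variable standard deviation
  switch(type,
         none   = Y,
         pareto = sweep(Y, 2, sqrt(s), "/"),
         unit   = sweep(Y, 2, s, "/"))
}

# Simulated data: 20 "spectra" with 5 variables of very different spread.
set.seed(1)
Y <- matrix(rnorm(20 * 5, mean = 10, sd = rep(c(1, 5, 10, 20, 50), each = 20)),
            nrow = 20)

Y_unit <- scale_spectra(Y, "unit")
Y_par  <- scale_spectra(Y, "pareto")

apply(Y_unit, 2, sd)   # every variable now has standard deviation 1
apply(Y_par, 2, sd)    # spread is reduced to sqrt(sd), not fully equalised
```

Pareto scaling is a compromise: it shrinks the dominance of high-variance variables without giving low-variance (often noisy) variables the same weight as unit scaling does.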

Details

This function fits a mixture of probabilistic principal components analysis models to metabolomic spectral data via the EM algorithm. A range of models with different numbers of groups and different numbers of principal components can be fitted. The model simultaneously clusters observations into unknown groups and reduces the dimension of the data.
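To make the model structure concrete, the sketch below simulates data from a mixture of PPCA models in base R. It assumes the standard formulation, in which an observation in group g is generated as W_g u + mu_g + error, with latent scores u drawn from a q-dimensional standard normal and a common error variance. All numeric values and object names are illustrative assumptions, chosen to mirror the names of the returned components.

```r
set.seed(42)
N <- 60; p <- 10; q <- 2; G <- 2              # illustrative sizes only
Pi  <- c(0.5, 0.5)                            # mixing proportions
sig <- 0.1                                    # error variance (assumed common)
mu  <- cbind(rep(0, p), rep(3, p))            # p x G matrix of group means
W   <- array(rnorm(p * q * G), dim = c(p, q, G))  # p x q x G loadings

z <- sample(G, N, replace = TRUE, prob = Pi)  # latent group label per spectrum
U <- matrix(rnorm(N * q), N, q)               # latent scores, N(0, I_q)

# Each row of Y is one simulated spectrum from its group's PPCA model.
Y <- t(sapply(seq_len(N), function(i) {
  g <- z[i]
  as.vector(W[, , g] %*% U[i, ] + mu[, g] + rnorm(p, sd = sqrt(sig)))
}))
dim(Y)
```

Fitting the model reverses this process: the group labels and latent scores are unobserved, so clustering (recovering `z`) and dimension reduction (recovering `U` and `W`) are estimated together.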

Value

A list containing:

`q`	The number of principal components in the optimal MPPCA model, selected by the BIC.
`g`	The number of groups in the optimal MPPCA model, selected by the BIC.
`sig`	The posterior mode estimate of the variance of the error terms.
`scores`	A list of length g, each entry of which is an n_g x q matrix of estimates of the latent locations of each observation in group g in the principal subspace.
`loadings`	An array of dimension p x q x g, each sheet of which contains the maximum likelihood estimate of the p x q loadings matrix for a group.
`Pi`	A vector of length g containing the mixing proportions, i.e. the probability of belonging to each group.
`mean`	A p x g matrix, each column of which contains a group mean.
`tau`	An N x g matrix, each row of which contains the posterior group membership probabilities for an observation.
`clustering`	A vector of length N indicating the group to which each observation belongs.
`BIC`	A matrix containing the BIC values for the fitted models.
`AIC`	A matrix containing the AIC values for the fitted models.
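The relationship between some of these components can be sketched in base R with made-up numbers. Two assumptions are made here and should be checked against the package: that the BIC is on a larger-is-better scale with rows indexing q and columns indexing g, and that `clustering` assigns each observation to the group with the highest posterior probability in `tau`. Under a smaller-is-better convention, `min` would replace `max`.

```r
# Hypothetical BIC matrix: rows index q, columns index g (layout assumed).
BIC <- matrix(c(-1250, -1190, -1210,
                -1230, -1175, -1205),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("1", "2"), c("1", "2", "3")))

# Locate the optimal model, assuming larger BIC values are better.
best  <- which(BIC == max(BIC), arr.ind = TRUE)
q_opt <- as.integer(rownames(BIC)[best[1, 1]])
g_opt <- as.integer(colnames(BIC)[best[1, 2]])

# Hypothetical posterior membership probabilities for three observations.
tau <- matrix(c(0.90, 0.10,
                0.20, 0.80,
                0.55, 0.45),
              ncol = 2, byrow = TRUE)

# Hard assignment: each observation goes to its most probable group.
clustering <- apply(tau, 1, which.max)
```

Rows of `tau` close to 0.5 (such as the third row above) flag observations whose group membership is uncertain, which the hard `clustering` vector alone does not reveal.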

Author(s)

Gift Nyamundanda, Isobel Claire Gormley and Lorraine Brennan.

References

Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.

Examples

data(BrainSpectra)
## Not run: 
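## Fit a single MPPCA model with q = 7 principal components and g = 4 groups.
mdlfit <- mppca.metabol(BrainSpectra[[1]], minq = 7, maxq = 7, ming = 4, maxg = 4,
                        plot.BIC = TRUE)
## Plot the estimated scores and loadings of the fitted model.
mppca.scores.plot(mdlfit)
mppca.loadings.plot(mdlfit, BrainSpectra[[1]])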

## End(Not run)

[Package MetabolAnalyze version 1.3.1 Index]