
ppca.metabol.jack {MetabolAnalyze}

R Documentation

Fit a probabilistic principal components analysis model to a metabolomic data set, and assess uncertainty via the jackknife.

Description

Fit a probabilistic principal components analysis (PPCA) model to a metabolomic data set via the EM algorithm, and assess uncertainty in the obtained loadings estimates via the jackknife.

Usage

ppca.metabol.jack(Y, minq = 1, maxq = 2, scale = "none",
                  epsilon = 0.1, conflevel = 0.95)

Arguments

`Y`	An N x p data matrix where each row is a spectrum.
`minq`	The minimum number of principal components to be fit. By default minq is 1.
`maxq`	The maximum number of principal components to be fit. By default maxq is 2.
`scale`	The type of scaling to apply to the data. The default is "none"; the other options are "pareto" and "unit" scaling. See `scaling` for further details.
`epsilon`	Value on which the convergence assessment criterion is based. Set by default to 0.1.
`conflevel`	Level of confidence required for the loadings confidence intervals. By default 95% confidence intervals are computed.
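As a rough illustration of what the `scale` options usually mean, the sketch below applies the common definitions in base R: Pareto scaling divides each variable by the square root of its standard deviation, and unit scaling divides by the standard deviation itself. `scale_spectra` is a hypothetical helper written for this illustration, not the package's `scaling` function, and mean-centring is deliberately left out.

```r
# Hypothetical helper (not part of MetabolAnalyze) illustrating the usual
# definitions of the scaling options.
scale_spectra <- function(Y, type = c("none", "pareto", "unit")) {
  type <- match.arg(type)
  if (type == "none") return(Y)
  s <- apply(Y, 2, sd)                       # per-bin standard deviation
  divisor <- if (type == "pareto") sqrt(s) else s
  sweep(Y, 2, divisor, "/")                  # divide each column by its divisor
}

# Synthetic data: 10 spectra with 4 bins each.
set.seed(1)
Y <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 10, ncol = 4)
unit_scaled   <- scale_spectra(Y, "unit")
pareto_scaled <- scale_spectra(Y, "pareto")
```

Under these definitions every column of `unit_scaled` has standard deviation 1, while each column of `pareto_scaled` has a standard deviation equal to the square root of the original one, so large peaks are damped without being fully equalised.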

Details

A PPCA model is fitted for each number of principal components from minq to maxq, and an optimal model (i.e. an optimal number of principal components, q) is selected. Confidence intervals for the loadings are then obtained via the jackknife: a model with q principal components is fitted to the data set N times, with a different observation removed each time.
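The leave-one-out procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `ppca_fit` and `jackknife_loadings` are hypothetical names, the fit uses the closed-form maximum likelihood solution from the eigendecomposition of the sample covariance rather than the EM algorithm, and the intervals use the standard jackknife standard error with a normal quantile.

```r
# Closed-form PPCA maximum likelihood fit, used here in place of EM only to
# keep the sketch short. Requires q < ncol(Y).
ppca_fit <- function(Y, q) {
  p <- ncol(Y)
  Yc <- sweep(Y, 2, colMeans(Y))             # mean-centre each variable
  S <- crossprod(Yc) / nrow(Yc)              # p x p sample covariance
  e <- eigen(S, symmetric = TRUE)
  sig <- mean(e$values[(q + 1):p])           # ML estimate of the error variance
  W <- e$vectors[, 1:q, drop = FALSE] %*%
    diag(sqrt(pmax(e$values[1:q] - sig, 0)), q)
  list(W = W, sig = sig)
}

jackknife_loadings <- function(Y, q, conflevel = 0.95) {
  N <- nrow(Y)
  full <- ppca_fit(Y, q)$W                   # estimate from all N observations
  reps <- array(NA_real_, dim = c(ncol(Y), q, N))
  for (i in seq_len(N)) {
    Wi <- ppca_fit(Y[-i, , drop = FALSE], q)$W
    # Eigenvector signs are arbitrary, so align each column with the full fit.
    flip <- sign(colSums(Wi * full))
    flip[flip == 0] <- 1
    reps[, , i] <- sweep(Wi, 2, flip, `*`)
  }
  centre <- apply(reps, c(1, 2), mean)
  ss <- apply(sweep(reps, c(1, 2), centre)^2, c(1, 2), sum)
  se <- sqrt((N - 1) / N * ss)               # jackknife standard error
  z <- qnorm(1 - (1 - conflevel) / 2)
  lower <- full - z * se
  upper <- full + z * se
  # A loading is flagged when its interval excludes zero.
  list(W = full, Lower = lower, Upper = upper,
       significant = lower > 0 | upper < 0)
}

# Synthetic data: 20 observations, 6 variables, 2 latent dimensions.
set.seed(42)
latent <- matrix(rnorm(40), 20, 2)
Y <- latent %*% matrix(rnorm(12), 2, 6) + matrix(rnorm(120, sd = 0.3), 20, 6)
jk <- jackknife_loadings(Y, q = 2)
```

Aligning signs before pooling matters because a sign flip between replicates would otherwise inflate the standard error and hide loadings that are in fact stable.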

On convergence of the algorithm, the number of loadings significantly different from zero is printed on screen. The user may then further examine the significant loadings when prompted by selecting a cutoff value from the table printed on screen. Bar plots detailing the resulting significantly high loadings are provided.
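The kind of cutoff table described above can be sketched as follows. The values are invented for illustration, and two details are assumptions rather than documented behaviour: that the cutoff is compared with the absolute value of a loading, and that a single principal component is examined at a time.

```r
# Hypothetical inputs: loadings on one principal component, and a flag marking
# those whose confidence interval excludes zero.
loadings_pc1 <- c(0.82, -0.05, 0.40, -0.61, 0.10, 0.33)
significant  <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

cutoffs <- c(0.2, 0.4, 0.6, 0.8)

# Count the significant loadings whose magnitude exceeds each candidate cutoff.
selected <- sapply(cutoffs, function(k) sum(significant & abs(loadings_pc1) > k))
cutoff_table <- data.frame(Cutoff = cutoffs, Selected = selected)

# Choosing a cutoff of 0.6 keeps only the matching spectral bins.
chosen <- 0.6
keep <- which(significant & abs(loadings_pc1) > chosen)
```

A table of this shape lets the user trade the number of selected spectral bins against the strength of their loadings before committing to a cutoff.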

Value

A list containing:

`q`	The number of principal components in the optimal PPCA model, selected by the BIC.
`sig`	The posterior mode estimate of the variance of the error terms.
`scores`	An N x q matrix of estimates of the latent locations of each observation in the principal subspace.
`loadings`	The maximum likelihood estimate of the p x q loadings matrix.
`SignifW`	The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero.
`SignifHighW`	The maximum likelihood estimate of the loadings matrix for those loadings significantly different from zero and higher than a user selected cutoff point.
`Lower`	The lower limit of the confidence interval for those loadings significantly different from zero.
`Upper`	The upper limit of the confidence interval for those loadings significantly different from zero.
`Cutoffs`	A table detailing a range of cutoff points and the associated number of selected spectral bins.
`number`	The number of spectral bins selected by the user.
`cutoff`	The cutoff value selected by the user.
`BIC`	A vector containing the BIC values for the fitted models.
`AIC`	A vector containing the AIC values for the fitted models.
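Selecting q by information criterion can be sketched in base R. The following is an illustration under assumptions, not the package's code: `ppca_loglik`, `n_params` and `select_q` are hypothetical names, the likelihood is evaluated at the closed-form maximum likelihood estimates rather than EM output, the parameter count includes the mean vector, and both criteria are written so that larger values are better. The package may use a different sign convention or parameter count.

```r
# PPCA log-likelihood at the closed-form ML estimates, where
# C = W W^T + sig * I and S is the sample covariance. Requires q < ncol(Y).
ppca_loglik <- function(Y, q) {
  N <- nrow(Y); p <- ncol(Y)
  Yc <- sweep(Y, 2, colMeans(Y))
  S <- crossprod(Yc) / N
  e <- eigen(S, symmetric = TRUE)
  sig <- mean(e$values[(q + 1):p])
  W <- e$vectors[, 1:q, drop = FALSE] %*%
    diag(sqrt(pmax(e$values[1:q] - sig, 0)), q)
  C <- tcrossprod(W) + sig * diag(p)
  logdetC <- as.numeric(determinant(C, logarithm = TRUE)$modulus)
  -N / 2 * (p * log(2 * pi) + logdetC + sum(diag(solve(C, S))))
}

# Free parameters: loadings (less rotational redundancy), mean, error variance.
n_params <- function(p, q) p * q - q * (q - 1) / 2 + p + 1

select_q <- function(Y, minq, maxq) {
  qs <- minq:maxq
  ll <- sapply(qs, function(q) ppca_loglik(Y, q))
  k <- n_params(ncol(Y), qs)
  bic <- 2 * ll - k * log(nrow(Y))           # assumed convention: larger is better
  aic <- 2 * ll - 2 * k
  list(q = qs[which.max(bic)], BIC = bic, AIC = aic)
}

# Synthetic data: 100 observations, 8 variables, 2 latent dimensions.
set.seed(7)
latent <- matrix(rnorm(200), 100, 2)
Y <- latent %*% matrix(rnorm(16), 2, 8) + matrix(rnorm(800, sd = 0.5), 100, 8)
fit <- select_q(Y, minq = 1, maxq = 3)
```

Returning the full `BIC` and `AIC` vectors alongside the chosen `q` makes it possible to check how clearly the selected model is preferred over its neighbours.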

Author(s)

Gift Nyamundanda, Isobel Claire Gormley and Lorraine Brennan.

References

Nyamundanda, G., Gormley, I.C. and Brennan, L. (2010) Probabilistic principal components analysis for metabolomic data. Technical report, University College Dublin.

Examples

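data(UrineSpectra)
## Not run: 
# Fit a two-component PPCA model and jackknife the loadings
mdlfit <- ppca.metabol.jack(UrineSpectra[[1]], minq = 2, maxq = 2, scale = "none")
# Plot the jackknife results for the loadings
loadings.jack.plot(mdlfit)
# Plot the scores, grouped by the first column of UrineSpectra[[2]]
ppca.scores.plot(mdlfit, group = UrineSpectra[[2]][, 1])
## End(Not run)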

[Package MetabolAnalyze version 1.3.1 Index]