R: Fitting Gaussian Mixture Model

mModel {bgmm}

R Documentation

Fitting Gaussian Mixture Model

Description

These functions fit different variants of Gaussian mixture models. The variants differ in how much knowledge about the labeled observations is used in the fitting procedure.

Usage

belief(X, knowns, B = NULL, k = ifelse(!is.null(B), ncol(B), 
    ifelse(!is.null(P), ncol(P), length(unique(class)))), P = NULL, 
    class = map(B), init.params = init.model.params(X, knowns, 
        B = B, P = P, class = class, k = k), model.structure = getModelStructure(), 
    stop.likelihood.change = 10^-5, stop.max.nsteps = 100, trace = FALSE, 
    b.min = 0.025, 
    all.possible.permutations=FALSE, pca.dim.reduction = NA)
    
soft(X, knowns, P = NULL, k = ifelse(!is.null(P), ncol(P), 
    ifelse(!is.null(B), ncol(B), length(unique(class)))), B = NULL, 
    class = NULL, init.params = init.model.params(X, knowns, 
        class = class, B = P, k = k), 
    model.structure = getModelStructure(), stop.likelihood.change = 10^-5, 
    stop.max.nsteps = 100, trace = FALSE, b.min = 0.025, 
    all.possible.permutations=FALSE, pca.dim.reduction = NA, ...)
    
semisupervised(X, knowns, class = NULL, k = ifelse(!is.null(class), 
    length(unique(class)), ifelse(!is.null(B), ncol(B), ncol(P))), 
    B = NULL, P = NULL, ..., init.params = NULL,
    all.possible.permutations=FALSE, pca.dim.reduction = NA)
    
supervised(knowns, class = NULL, k = length(unique(class)), B = NULL, P = NULL, 
    model.structure = getModelStructure(), ...)

unsupervised(X, k, init.params=init.model.params(X, knowns=NULL, k=k), 
      model.structure=getModelStructure(), stop.likelihood.change=10^-5, 
      stop.max.nsteps=100, trace=FALSE, ...)

Arguments

`X`	a data.frame with the unlabeled observations. The rows correspond to the observations while the columns to variables/dimensions of the data.
`knowns`	a data.frame with the labeled observations. The rows correspond to the observations while the columns to variables/dimensions of the data.
`B`	a beliefs matrix which specifies the distribution of beliefs for the labeled observations. The number of rows in `B` should equal the number of rows in the data.frame `knowns`. It is assumed that the observations in `B` and in `knowns` are given in the same order. Columns correspond to the model components. If matrix `B` is provided, its number of columns has to be less than or equal to `k`. Internally, the matrix `B` is completed to `k` columns.
`P`	a matrix of plausibilities, i.e., weights of the prior probabilities for the labeled observations. If matrix `P` is provided, its number of columns has to be less than or equal to `k`. The same conditions as for `B` apply.
`class`	a vector of classes/labels for the labeled observations. The number of its unique values has to be less than or equal to `k`.
`k`	the number of model components. By default it is derived from the number of columns of `B` or `P`, or from the number of unique values in `class`, in the order shown in Usage for each function. It has no default in `unsupervised()`.
`init.params`	initial values for the estimates of the model parameters (means, variances and mixing proportions), by default derived with the use of the `init.model.params` function.
`stop.likelihood.change`, `stop.max.nsteps`	the stop criteria for the EM algorithm, i.e., the minimum required improvement of the log-likelihood and the maximum number of steps.
`trace`	if `trace=TRUE`, the log-likelihood at every step of the EM algorithm is printed out.
`model.structure`	an object returned by the `getModelStructure` function, which specifies constraints for the parameters of the model to be fitted.
`b.min`	this argument is passed to the `init.model.params` function.
`...`	these arguments will be passed to the `init.model.params` function.
`all.possible.permutations`	if `TRUE`, all possible permutations of the components in the initial parameters are considered. Since there are k! permutations (where k is the number of components), model fitting is repeated k! times. Only the model with the highest likelihood is returned.
`pca.dim.reduction`	because fitting in a high-dimensional space is numerically problematic, a PCA-based dimension reduction is attempted if `pca.dim.reduction != FALSE`. If equal to `NA`, the target dimension is data driven; if it is a number, that number is used as the target dimension.
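
To give a feel for the cost of `all.possible.permutations=TRUE`, the following base-R sketch enumerates the permutations of component indices. The helper `all_permutations()` is hypothetical and written only for this illustration; it is not part of the package and does not show how the package enumerates permutations internally.

```r
# Hypothetical helper: all permutations of 1..k, one per row (base R only).
all_permutations <- function(k) {
  if (k == 1) return(matrix(1, nrow = 1))
  prev <- all_permutations(k - 1)
  idx <- seq_len(k - 1)
  out <- NULL
  for (pos in seq_len(k)) {
    # insert element k at position pos of every permutation of 1..(k-1)
    out <- rbind(out, cbind(prev[, idx < pos, drop = FALSE],
                            k,
                            prev[, idx >= pos, drop = FALSE]))
  }
  unname(out)
}

perms <- all_permutations(3)
nrow(perms)    # number of refits for k = 3, equals factorial(3)
factorial(5)   # number of refits for k = 5
```

Because the number of refits grows factorially, this option is likely to be practical only for a small number of components.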

Details

In the belief() function, if the argument B is not provided, it is by default initialized from the argument P. If the argument P is not provided, B is derived from the class argument with the use of the function get.simple.beliefs() which assigns 1-(k-1)*b.min to the component given by class and b.min to all remaining components.
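
The rule described above can be sketched in base R as follows. The helper `simple_beliefs()` is a hypothetical stand-in written for this illustration and is not the package's get.simple.beliefs() implementation; it only reproduces the stated assignment of 1-(k-1)*b.min and b.min.

```r
# Hypothetical helper: build a beliefs matrix from class labels.
# The labeled component gets 1 - (k - 1) * b.min, all others get b.min.
simple_beliefs <- function(class, k = length(unique(class)), b.min = 0.025) {
  labels <- as.integer(factor(class))   # map labels to 1..(number of classes)
  B <- matrix(b.min, nrow = length(labels), ncol = k)
  B[cbind(seq_along(labels), labels)] <- 1 - (k - 1) * b.min
  B
}

B <- simple_beliefs(c("a", "b", "c", "a"), k = 3, b.min = 0.025)
B[1, ]       # belief 0.95 for the labeled component, 0.025 for the others
rowSums(B)   # each row sums to 1
```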

In the soft() function, if the argument P is not provided, it is by default initialized from the argument B. If the argument B is not provided, P is derived from the class argument as in the belief() function.

In the supervised() function, if the argument class is not provided, it is by default initialized from argument B or P, taking the label of each observation as its most believed or plausible component (by the MAP rule).
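
As a minimal base-R sketch of the MAP rule, each observation can be assigned the index of the column with the largest belief or plausibility. The matrix below is made up for illustration, and the code is not the package's own implementation.

```r
# Illustrative beliefs matrix: 3 labeled observations, 3 components.
B <- rbind(c(0.95, 0.025, 0.025),
           c(0.10, 0.70,  0.20),
           c(0.05, 0.15,  0.80))

# MAP rule: pick, per row, the component with the highest value.
map_labels <- apply(B, 1, which.max)
map_labels
```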

The number of columns in the beliefs matrix B or in the matrix of plausibilities P may be smaller than the number of model components defined by the argument k. This corresponds to the scenario in which the user has no labeled examples for some component. In other words, this component is not used as a label for any observation, and thus can be omitted from the beliefs matrix. Equivalently, one can include a column for this component and fill it with beliefs/plausibilities equal to 0.
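
The equivalence mentioned above can be illustrated in base R by appending zero-filled columns. The helper `pad_beliefs()` is hypothetical; how the package completes the matrix internally is not shown here, so this is only a sketch of the zero-column form.

```r
# Hypothetical helper: extend a beliefs/plausibilities matrix to k columns
# by appending columns filled with 0 for the components without examples.
pad_beliefs <- function(B, k) {
  stopifnot(ncol(B) <= k)
  cbind(B, matrix(0, nrow = nrow(B), ncol = k - ncol(B)))
}

B2 <- rbind(c(0.9, 0.1),
            c(0.2, 0.8))
B3 <- pad_beliefs(B2, k = 3)
B3   # third column holds zeros for the component with no labeled examples
```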

Slots in the returned object are listed in section Value. The returned object differs slightly depending on the function used. Namely, the belief() function returns an object with the slot B. The function soft() returns an object with a slot P, while the functions supervised() and semisupervised() return objects with a slot class instead.

The object returned by the function supervised() does not have the slot X.

Value

An object of the class mModel, with the following slots:

`pi`	a vector with the fitted mixing proportions
`mu`	a matrix with the mean vectors, fitted for all components
`cvar`	a three-dimensional array with the covariance matrices, fitted for all components
`X`	the unlabeled observations
`knowns`	the labeled observations
`B`	the beliefs matrix
`n`	the number of all observations
`m`	the number of the unlabeled observations
`k`	the number of fitted model components
`d`	the data dimension
`likelihood`	the log-likelihood of the fitted model
`n.steps`	the number of steps performed by the EM algorithm
`model.structure`	the set of constraints kept during the fitting process.

Author(s)

Przemyslaw Biecek

References

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

Examples

library(bgmm)
data(genotypes)

modelSupervised = supervised(knowns=genotypes$knowns, 
            class=genotypes$labels)
plot(modelSupervised)

modelSemiSupervised = semisupervised(X=genotypes$X, 
            knowns=genotypes$knowns, class = genotypes$labels)
plot(modelSemiSupervised)

modelBelief = belief(X=genotypes$X, 
            knowns=genotypes$knowns, B=genotypes$B)
plot(modelBelief)

modelSoft = soft(X=genotypes$X, 
            knowns=genotypes$knowns, P=genotypes$B)
plot(modelSoft)

modelUnSupervised = unsupervised(X=genotypes$X, k=3)
plot(modelUnSupervised)

[Package bgmm version 1.8.5 Index]