R: Semi-supervised clustering with must-link constraints

constrEM {MBCbook}

R Documentation

Semi-supervised clustering with must-link constraints

Description

Semi-supervised clustering with must-link constraints allows to cluster data for which must-link constraints are available. This function implements the method described in Shental et al. (2003, ISBN:9781615679119).

Usage

constrEM(X, K, C, maxit = 30)

Arguments

`X`	a data frame of observations, assuming the rows are the observations and the columns the variables. Note that NAs are not allowed.
`K`	the number of desired groups.
`C`	a vector encoding the must-link constraints through chuncklets. This vector has to be of the length of the number of observations. Two observations that have to be in the same group must be in the same chuncklet. For instance, the chuncklet vector (1,2,3,4,3,5) indicate that 3rd and the 5th observations have a must-link constraint. If there is no must-link constraints, this vector should be simply 1:nrow(X).
`maxit`	the maximum number of iterations.

Value

A list is returned with the following fields:

`cls`	a vector containg the group memberships of the observations.
`T`	the posterior probabilities that the observations belong to the K groups.
`prop`	the estimated mixture proportions.
`mu`	the estimated mixture means.
`S`	the estimated mixture covariance matrices.
`ll`	the log-likelihood value at convergence.

Author(s)

C. Bouveyron

References

This function implements the method described in Shental, N., Bar-Hillel, A., Hertz, T., and Weinshall, D., Computing Gaussian mixture models with EM using equivalence constraints, Proceedings of the 16th International Conference on Neural Information Processing Systems, pages 465–472, 2003 (ISBN:9781615679119).

Examples

# Simulation of some data
set.seed(123)
n = 200
m1 = c(0,0); m2 = 4*c(1,1); m3 = 4*c(1,1)
S1 = diag(2); S2 = rbind(c(1,0),c(0,0.05))
S3 = rbind(c(0.05,0),c(0,1))
X = rbind(mvrnorm(n,m1,S1),mvrnorm(n,m2,S2),mvrnorm(n,m3,S3))
cls = rep(1:3,c(n,n,n))

# Encoding the constraints through chunklets
# Observations 397 and 408 are in the same chunklet
a = 398
b = 430
C = c(1:(b-1),a,b:(nrow(X)-1))

# Clustering with constrEM
res = constrEM(X,K=3,C,maxit=20)

[Package MBCbook version 0.1.2 Index]