R: Subspace clustering based local classification using ORCLUS.

predict.orclass {orclus}

R Documentation

Subspace clustering based local classification using ORCLUS.

Description

Computes cluster assignments, cluster distances, and predicted classes for new data according to the intrinsic subspace clusters of an `orclass` classification model.

Usage

## S3 method for class 'orclass'
predict(object, newdata, type = "nearest", ...)

Arguments

`object`	Model resulting from a call of `orclass`.
`newdata`	A matrix or data frame to be clustered by the given model.
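`type`	Prediction type. The default `"nearest"` uses the relative class frequencies of the nearest cluster as class posterior probabilities; `"fuzzywts"` weights the class distributions of all clusters by fuzzy cluster memberships (see Details).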
`...`	Currently not used.

Details

For prediction with `type = "nearest"`, the class distribution of the nearest cluster is used. If `type = "fuzzywts"`, cluster memberships (see e.g. Bezdek, 1981) are computed from the cluster distances returned by the cluster assignment of `predict.orclus`, and the class distributions of the clusters are weighted by the cluster memberships of each observation.
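The two prediction types can be illustrated with a small base R sketch. It is not the package's implementation: the toy matrices `distances` and `class_freq`, the helper `fuzzy_memberships`, and the fuzzifier `m = 2` are assumptions made for illustration, and the membership formula is the standard fuzzy c-means form, which may differ in detail from what `predict.orclass` computes internally.

```r
# Toy inputs (all values are made up for illustration).
# Rows = new observations, columns = clusters:
# subspace distances of each observation to each cluster center.
distances <- matrix(c(0.2, 1.0, 2.0,
                      1.5, 0.5, 1.5),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, paste0("cluster", 1:3)))

# Rows = clusters, columns = classes:
# relative class frequencies within each cluster.
class_freq <- matrix(c(0.9, 0.1,
                       0.8, 0.2,
                       0.1, 0.9),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("cluster", 1:3), c("1", "2")))

# type = "nearest": use the class distribution of the closest cluster.
nearest <- apply(distances, 1, which.min)
post_nearest <- class_freq[nearest, , drop = FALSE]

# type = "fuzzywts" (assumed form): fuzzy c-means memberships with fuzzifier m,
# u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
# Assumes strictly positive distances.
fuzzy_memberships <- function(d, m = 2) {
  w <- (1 / d)^(2 / (m - 1))  # inverse-distance weights
  w / rowSums(w)              # normalise so that each row sums to 1
}
u <- fuzzy_memberships(distances)

# Weight the cluster-wise class distributions by the memberships.
post_fuzzy <- u %*% class_freq

# Predicted class = class with the largest posterior probability.
pred_class <- colnames(post_fuzzy)[max.col(post_fuzzy, ties.method = "first")]
```

With `"nearest"`, an observation inherits the class distribution of a single cluster. With fuzzy weights, every cluster contributes in proportion to its membership, which smooths the posterior for observations lying between clusters.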

Value

`class`	Vector of predicted class levels.
`posterior`	Matrix whose columns contain the class posterior probabilities.
`distances`	Matrix whose columns contain the distances of the new data to all cluster centers in the corresponding subspaces.
`cluster`	The resulting cluster labels for the new data.

Author(s)

Gero Szepannek

References

Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.

Bezdek, J. (1981): Pattern recognition with fuzzy objective function algorithms, Kluwer Academic, Norwell, MA.

Examples

# definition of a function for parameterized data simulation
sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, 
                       sd.cl = 0.05, sd.rest = 1, locshift = 1){
  ### input parameters for data generation
  # k           number of clusters
  # nk          observations per cluster
  # d           original dimension of the data
  # l           subspace dimension where the clusters are concentrated
  # sd.cl       (within cluster subspace) standard deviations for data generation 
  # sd.rest     standard deviations in the remaining space 
  # locshift    parameter of a uniform distribution to sample different cluster means  

  x <- NULL
  for(i in 1:k){
    # cluster centers
    apts <- locshift * matrix(runif(l * k), ncol = l)
    # sample points in original space
    xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol = l) +
                           matrix(rep(apts[i, ], nk), ncol = l, byrow = TRUE),
                         matrix(rnorm(nk * (d - l), sd = sd.rest), ncol = (d - l)))
    # subspace generation
    sym.mat <- matrix(nrow = d, ncol = d)
    for(m in 1:d){
      for(n in 1:m){
        sym.mat[m, n] <- sym.mat[n, m] <- runif(1)
      }
    }
    subspace <- eigen(sym.mat)$vectors
    # transformation
    xi.transformed <- xi.original %*% subspace
    x <- rbind(x, xi.transformed)
  }
  clids <- rep(1:k, each = nk)
  result <- list(x = x, cluster = clids)
  return(result)
}

# simulate data of 2 classes where class 1 consists of 2 subclasses
simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, 
                      sd.cl = 0.05, sd.rest = 1, locshift = 1)

x <- simdata$x
y <- c(rep(1,400), rep(2,200))

res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75)
prediction <- predict(res, x)

# compare results
table(prediction$class, y)

[Package orclus version 0.2-6 Index]