orclass {orclus}

R Documentation

Subspace clustering based local classification using ORCLUS.

Description

Performs local classification for data whose subclasses are concentrated in different subspaces of the data space.
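Whether an observation is close to such a subspace cluster depends on which directions are measured. The following is a minimal sketch, assuming the projected-distance definition of Aggarwal and Yu (2000). A cluster's subspace is spanned by the `l` eigenvectors of its covariance matrix with the smallest eigenvalues, i.e. the directions in which the cluster is tight, and the distance to the centroid is measured only within that subspace. `projected_distance` is a hypothetical helper, not an orclus function, and the package's internal code may differ.

```r
# Hypothetical helper (not part of orclus): ORCLUS-style projected distance
# of the rows of x to a cluster given by the rows of `members`.
projected_distance <- function(x, members, l) {
  centroid <- colMeans(members)
  ev <- eigen(cov(members), symmetric = TRUE)       # eigenvalues in decreasing order
  d <- ncol(members)
  basis <- ev$vectors[, (d - l + 1):d, drop = FALSE] # l directions of smallest variance
  diffs <- sweep(as.matrix(x), 2, centroid)          # subtract centroid from each row
  sqrt(rowSums((diffs %*% basis)^2))                 # length of the projected difference
}

# a cluster that is tight in coordinates 1 and 2 but spread out along coordinate 3
set.seed(1)
members <- cbind(rnorm(200, sd = 0.01), rnorm(200, sd = 0.01), rnorm(200, sd = 5))
pd <- projected_distance(rbind(c(0, 0, 10), c(1, 1, 0)), members, l = 2)
pd
```

The first point is far from the centroid in Euclidean terms but lies in the cluster's subspace, so its projected distance is small; the second point is closer in Euclidean terms yet has a larger projected distance.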

Usage

orclass(x, ...)
## Default S3 method:
orclass(x, grouping, k, l, k0, a = 0.5, prior = NULL, inner.loops = 1, 
                          predict.train = "nearest", verbose = TRUE, ...)
## S3 method for class 'formula'
orclass(formula, data = NULL, ...)

Arguments

`x`	A matrix or data frame containing the explanatory variables. The method is restricted to numerical data.
`grouping`	A factor specifying the class for each observation.
`formula`	A formula of the form `grouping ~ x1 + x2 + ...`. That is, the response is the grouping factor and the right-hand side specifies the (non-factor) discriminators.
`data`	Data frame from which variables specified in formula are to be taken.
`k`	Prespecifies the final number of clusters.
`l`	Prespecifies the dimension of the final cluster-specific subspaces (equal for all clusters).
`k0`	Initial number of clusters (computed in the full data space). Must be greater than `k`. The number of clusters is reduced by the factor `a` in each iteration until the final number `k` is reached.
`a`	Factor by which the number of clusters is reduced in each iteration step of the algorithm (a value between 0 and 1).
`prior`	Optional vector of class prior probabilities, to be specified if they differ from the relative class frequencies. If `NULL` (the default), the relative class frequencies in the training data are used.
`inner.loops`	Number of repeated iterations (i.e. recomputations of the clustering and the cluster-specific subspaces) while the number of clusters and the subspace dimension are kept constant.
`predict.train`	Character string specifying whether the training data should be predicted. If `"nearest"`, each observation is classified according to the class distribution of the `orclus` cluster it is assigned to.
`verbose`	Logical indicating whether the iteration process should be displayed.
`...`	Currently not used.
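The interplay of `k0`, `k`, `l` and `a` can be sketched as a reduction schedule. This follows the description in Aggarwal and Yu (2000): the subspace dimension starts at the full dimension `d` and shrinks by a factor beta chosen so that the cluster count and the dimension reach `k` and `l` at roughly the same iteration. The helper name `orclus_schedule` and the floor rounding are assumptions; the package's compiled code may differ in detail.

```r
# Hypothetical helper (not part of orclus): cluster-count and subspace-dimension
# schedule as described by Aggarwal and Yu (2000).
orclus_schedule <- function(k0, k, d, l, a = 0.5) {
  stopifnot(k0 > k, l <= d, a > 0, a < 1)
  # beta makes a^n * k0 = k and beta^n * d = l hold for the same n
  beta <- exp(-log(d / l) * log(1 / a) / log(k0 / k))
  kc <- k0
  lc <- d
  steps <- data.frame(clusters = kc, dimension = lc)
  while (kc > k) {
    kc <- max(k, floor(a * kc))      # assumed rounding: floor
    lc <- max(l, floor(beta * lc))
    steps <- rbind(steps, data.frame(clusters = kc, dimension = lc))
  }
  steps
}

s <- orclus_schedule(k0 = 15, k = 3, d = 15, l = 4, a = 0.75)
s
```

With the settings used in the Examples (k0 = 15, k = 3, a = 0.75, d = 15, l = 4) the sketch gives cluster counts 15, 11, 8, 6, 4, 3 and subspace dimensions 15, 11, 8, 6, 4, 4.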

Details

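The explanatory variables `x` are first clustered with `orclus`; the class labels in `grouping` are not used for this step. For each resulting cluster, the class distribution is then computed (returned as `cluster.posteriors`) and used to classify the observations assigned to that cluster.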
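A minimal sketch of how clusterwise class posteriors and prior-weighted cluster frequencies can be obtained from hard cluster labels via Bayes' rule. This is an illustration under that assumption, not the orclus source; `cluster_class_posteriors` is a hypothetical helper.

```r
# Hypothetical helper (not the orclus source): P(class | cluster) and P(cluster)
# from hard cluster labels, reweighting observed class shares by class priors.
cluster_class_posteriors <- function(cluster, grouping, prior = NULL) {
  grouping <- as.factor(grouping)
  counts <- unclass(table(cluster, grouping))   # n_jc: clusters x classes
  class_sizes <- colSums(counts)                # n_c
  # default: relative class frequencies; a user prior must follow levels(grouping)
  if (is.null(prior)) prior <- class_sizes / sum(class_sizes)
  p_cluster_given_class <- sweep(counts, 2, class_sizes, "/")  # P(cluster j | class c)
  joint <- sweep(p_cluster_given_class, 2, prior, "*")        # P(cluster j, class c)
  cluster_priors <- rowSums(joint)                            # P(cluster j)
  posteriors <- sweep(joint, 1, cluster_priors, "/")          # P(class c | cluster j)
  list(posteriors = posteriors, cluster.priors = cluster_priors)
}

cl <- c(1, 1, 1, 2, 2, 3)
g <- c("a", "a", "b", "b", "b", "b")
r1 <- cluster_class_posteriors(cl, g)
r2 <- cluster_class_posteriors(cl, g, prior = c(a = 0.5, b = 0.5))
r1$posteriors
r2$posteriors
```

With the default prior the posteriors reduce to the class shares within each cluster (2/3 and 1/3 for cluster 1). With equal priors the under-represented class `a` gains weight, giving 0.8 and 0.2 for cluster 1.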

Value

An object of class `orclass`, i.e. a list with the following components:

`orclus.res`	Object of class `orclus` containing the resulting clusters.
`cluster.posteriors`	Matrix of clusterwise class posterior probabilities, where clusters are rows and classes are columns.
`cluster.priors`	Vector of relative cluster frequencies weighted by class priors.
`purity`	Statistics indicating the discriminability of the identified clusters.
`prior`	Vector of class prior probabilities.
`predict.train`	Predictions for the training data, if requested via the argument `predict.train`.
`orclass.call`	(Matched) function call.
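The `"nearest"` rule for predicting the training data can be sketched as follows, assuming each observation's cluster assignment is already known (e.g. from the `orclus` step) and that the predicted class is the one with the largest posterior in that cluster. `classify_by_cluster` is a hypothetical helper; the package's own prediction code may compute the assignment or break ties differently.

```r
# Hypothetical helper (not part of orclus): classify observations by the
# class distribution of the cluster each one is assigned to.
classify_by_cluster <- function(cluster, posteriors) {
  # one row of cluster posteriors per observation (row names are cluster ids)
  post <- posteriors[as.character(cluster), , drop = FALSE]
  # predicted class: largest posterior in the assigned cluster
  cls <- colnames(post)[max.col(post, ties.method = "first")]
  list(class = factor(cls, levels = colnames(posteriors)), posterior = post)
}

cluster.post <- rbind("1" = c(a = 0.8, b = 0.2), "2" = c(a = 0.1, b = 0.9))
demo <- classify_by_cluster(c(2, 1, 1), cluster.post)
demo$class
```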

Author(s)

Gero Szepannek

References

Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.

Examples

# definition of a function for parameterized data simulation
sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4, 
                       sd.cl = 0.05, sd.rest = 1, locshift = 1){
  ### input parameters for data generation
  # k           number of clusters
  # nk          observations per cluster
  # d           original dimension of the data
  # l           subspace dimension where the clusters are concentrated
  # sd.cl       (within cluster subspace) standard deviations for data generation 
  # sd.rest     standard deviations in the remaining space 
  # locshift    parameter of a uniform distribution to sample different cluster means  

  x <- NULL
  # cluster centres: one row of l coordinates per cluster
  apts <- locshift * matrix(runif(l * k), ncol = l)
  for(i in 1:k){
    # sample points in the original space: concentrated around centre i in
    # the first l dimensions, spread out in the remaining d - l dimensions
    xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol = l) +
                           matrix(rep(apts[i, ], nk), ncol = l, byrow = TRUE),
                         matrix(rnorm(nk * (d - l), sd = sd.rest), ncol = d - l))
    # subspace generation: the eigenvectors of a random symmetric matrix
    # form a random orthonormal basis
    sym.mat <- matrix(nrow = d, ncol = d)
    for(m in 1:d){
      for(n in 1:m){
        sym.mat[m, n] <- sym.mat[n, m] <- runif(1)
      }
    }
    subspace <- eigen(sym.mat)$vectors
    # transformation: map the cluster into its own (oblique) subspace
    xi.transformed <- xi.original %*% subspace
    x <- rbind(x, xi.transformed)
  }
  clids <- rep(1:k, each = nk)
  result <- list(x = x, cluster = clids)
  return(result)
}

# simulate data of 2 classes: clusters 1 and 2 (400 observations) form
# class 1, cluster 3 (200 observations) forms class 2
simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4, 
                      sd.cl = 0.05, sd.rest = 1, locshift = 1)

x <- simdata$x
y <- factor(c(rep(1, 400), rep(2, 200)))

res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75)
res

# compare results
table(res$predict.train$class, y)
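In `sim.orclus`, each cluster is generated axis-aligned (tight in the first `l` coordinates) and then multiplied by the eigenvectors of a random symmetric matrix. The short check below, using illustrative variable names, confirms the property the simulation relies on: these eigenvectors are orthonormal, so the transformation preserves distances and each cluster stays concentrated in an `l`-dimensional subspace, just one that is no longer aligned with the coordinate axes.

```r
# Illustrative check, separate from sim.orclus: eigenvectors of a random
# symmetric matrix form an orthonormal basis, so they preserve distances.
set.seed(42)
d <- 15
A <- matrix(runif(d * d), nrow = d)
sym.mat <- (A + t(A)) / 2   # symmetric (sim.orclus fills it entry by entry instead)
subspace <- eigen(sym.mat, symmetric = TRUE)$vectors
# t(V) %*% V should equal the identity matrix up to rounding error
max(abs(crossprod(subspace) - diag(d)))
# pairwise distances before and after the transformation agree
pts <- matrix(rnorm(3 * d), nrow = 3)
cbind(before = as.vector(dist(pts)), after = as.vector(dist(pts %*% subspace)))
```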

[Package orclus version 0.2-6 Index]