| predict.orclass {orclus} | R Documentation |
Subspace clustering based local classification using ORCLUS.
Description
Assigns clusters and distances and classes for new data according to the intrinsic subspace clusters of an orclass classification model.
Usage
## S3 method for class 'orclass'
predict(object, newdata, type = "nearest", ...)
Arguments
object |
Model resulting from a call of |
newdata |
A matrix or data frame to be clustered by the given model. |
type |
Default |
... |
Currently not used. |
Details
For prediction the class distribution of the "nearest"" cluster is used.
If type = "fuzzywts" cluster memberships (see e.g. Bezdek, 1981) are computed based on the cluster distances of cluster assignment by predict.orclus. For orclass prediction the class distributions of the clusters are weigthed using the cluster memberships of an observation.
Value
class |
Vector of predicted class levels. |
posterior |
Matrix where coloumns contain class posterior probabilities. |
distances |
A matrix where coloumns are the distances to all cluster centers in the corresponding subspaces for the new data. |
cluster |
The resulting cluster labels for the new data. |
Author(s)
Gero Szepannek
References
Aggarwal, C. and Yu, P. (2000): Finding generalized projected clusters in high dimensional spaces, Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 70-81.
Bezdek, J. (1981): Pattern recognition with fuzzy objective function algorithms, Kluwer Academic, Norwell, MA.
See Also
orclass, orclus, predict.orclus
Examples
# definition of a function for parameterized data simulation
sim.orclus <- function(k = 3, nk = 100, d = 10, l = 4,
sd.cl = 0.05, sd.rest = 1, locshift = 1){
### input parameters for data generation
# k number of clusters
# nk observations per cluster
# d original dimension of the data
# l subspace dimension where the clusters are concentrated
# sd.cl (within cluster subspace) standard deviations for data generation
# sd.rest standard deviations in the remaining space
# locshift parameter of a uniform distribution to sample different cluster means
x <- NULL
for(i in 1:k){
# cluster centers
apts <- locshift*matrix(runif(l*k), ncol = l)
# sample points in original space
xi.original <- cbind(matrix(rnorm(nk * l, sd = sd.cl), ncol=l) + matrix(rep(apts[i,], nk),
ncol = l, byrow = TRUE),
matrix(rnorm(nk * (d-l), sd = sd.rest), ncol = (d-l)))
# subspace generation
sym.mat <- matrix(nrow=d, ncol=d)
for(m in 1:d){
for(n in 1:m){
sym.mat[m,n] <- sym.mat[n,m] <- runif(1)
}
}
subspace <- eigen(sym.mat)$vectors
# transformation
xi.transformed <- xi.original %*% subspace
x <- rbind(x, xi.transformed)
}
clids <- rep(1:k, each = nk)
result <- list(x = x, cluster = clids)
return(result)
}
# simulate data of 2 classes where class 1 consists of 2 subclasses
simdata <- sim.orclus(k = 3, nk = 200, d = 15, l = 4,
sd.cl = 0.05, sd.rest = 1, locshift = 1)
x <- simdata$x
y <- c(rep(1,400), rep(2,200))
res <- orclass(x, y, k = 3, l = 4, k0 = 15, a = 0.75)
prediction <- predict(res, x)
# compare results
table(prediction$class, y)