riem.nmshift {Riemann} | R Documentation
Nonlinear Mean Shift
Description
Given N observations X_1, X_2, \ldots, X_N \in \mathcal{M}, perform clustering of the data based on the nonlinear mean shift algorithm.
A Gaussian kernel with bandwidth h is used,
G(x_i, x_j) \propto \exp \left( - \frac{\rho^2 (x_i,x_j)}{h^2} \right),
where \rho(x,y) is the geodesic distance between two points x,y\in\mathcal{M}.
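To make the kernel and the shift step concrete, the following is a minimal sketch for the unit sphere S^2 in base R. It assumes the usual nonlinear mean shift update (a kernel-weighted average of logarithm maps, followed by the exponential map). All helper names (sphere_dist, gauss_weight, sphere_log, sphere_exp, mean_shift_point) are hypothetical and are not part of the Riemann package, whose internal implementation may differ.

```r
# Illustrative sketch on the unit sphere; helper names are hypothetical
# and this is not the Riemann package's internal code.

# Geodesic distance rho(x, y) = arccos(<x, y>), clamped for numerical safety.
sphere_dist <- function(x, y) acos(max(-1, min(1, sum(x * y))))

# Gaussian kernel weight exp(-rho(x, y)^2 / h^2), up to a constant factor.
gauss_weight <- function(x, y, h) exp(-sphere_dist(x, y)^2 / h^2)

# Logarithm map: tangent vector at y pointing toward x with length rho(x, y).
# (Antipodal pairs, where sin(theta) = 0, are not handled in this sketch.)
sphere_log <- function(y, x) {
  theta <- sphere_dist(x, y)
  if (theta < 1e-12) return(rep(0, length(y)))
  (theta / sin(theta)) * (x - cos(theta) * y)
}

# Exponential map: move from y along the tangent vector v.
sphere_exp <- function(y, v) {
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(y)
  cos(nv) * y + sin(nv) * v / nv
}

# Repeat the kernel-weighted shift until the step is shorter than eps
# or maxiter iterations have been run.
mean_shift_point <- function(y, X, h = 1, maxiter = 50, eps = 1e-5) {
  for (it in seq_len(maxiter)) {
    w     <- vapply(X, function(x) gauss_weight(y, x, h), numeric(1))
    logs  <- lapply(X, function(x) sphere_log(y, x))
    shift <- Reduce(`+`, Map(`*`, w, logs)) / sum(w)
    y     <- sphere_exp(y, shift)
    if (sqrt(sum(shift^2)) < eps) break
  }
  y
}

# Usage: shift one starting point toward the mode of a small sample.
pts <- list(c(1, 0.1, 0), c(1, -0.1, 0), c(1, 0, 0.1))
pts <- lapply(pts, function(p) p / sqrt(sum(p^2)))
mode_hat <- mean_shift_point(pts[[1]], pts, h = 1)
```

In this sketch, running mean_shift_point from every observation gives one limiting point per observation; observations whose limiting points nearly coincide would be treated as members of the same cluster.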
Numerically, limiting points that should collapse into the same cluster do not coincide exactly.
For this reason, the maxk parameter is required: the optimal number of clusters, up to maxk, is chosen using the k-medoids clustering algorithm in conjunction with the silhouette criterion.
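The selection step can be sketched as follows, assuming a symmetric matrix D of pairwise distances between limiting points is available. The helper select_k_by_silhouette is hypothetical and uses pam() from the recommended 'cluster' package; it is not necessarily how the Riemann package implements this step. Because the silhouette is undefined for a single cluster, this sketch searches from k = 2.

```r
# Hypothetical helper: pick the number of clusters by the average silhouette
# width of k-medoids (PAM) fits on a distance matrix D.
select_k_by_silhouette <- function(D, maxk = 5) {
  d  <- stats::as.dist(D)
  n  <- attr(d, "Size")
  ks <- 2:min(maxk, n - 1)   # silhouette needs at least 2 and fewer than n clusters
  widths <- vapply(ks, function(k) {
    cluster::pam(d, k = k, diss = TRUE)$silinfo$avg.width
  }, numeric(1))
  best <- ks[which.max(widths)]
  list(k = best,
       cluster = cluster::pam(d, k = best, diss = TRUE)$clustering)
}

# Usage on toy one-dimensional "modes" that form three tight groups.
toy_modes <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2)
toy_fit   <- select_k_by_silhouette(as.matrix(stats::dist(toy_modes)), maxk = 5)
```

Choosing k by the largest average silhouette width favors partitions whose clusters are tight relative to their separation, which suits limiting points that are nearly, but not exactly, identical.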
Usage
riem.nmshift(riemobj, h = 1, maxk = 5, maxiter = 50, eps = 1e-05)
Arguments
riemobj | a S3 |
h | bandwidth parameter. The larger the |
maxk | maximum number of clusters to consider when determining the optimal number of clusters. |
maxiter | maximum number of iterations to be run. |
eps | tolerance level for the stopping criterion. |
Value
a named list containing
- distance: an (N\times N) matrix of distances between the modes corresponding to each data point.
- cluster: a length-N vector of class labels.
References
Subbarao R, Meer P (2009). “Nonlinear Mean Shift over Riemannian Manifolds.” International Journal of Computer Vision, 84(1), 1–20. ISSN 0920-5691, 1573-1405.
Examples
#-------------------------------------------------------------------
# Example on Sphere : a dataset with three types
#
# class 1 : 10 perturbed data points near (1,0,0) on S^2 in R^3
# class 2 : 10 perturbed data points near (0,1,0) on S^2 in R^3
# class 3 : 10 perturbed data points near (0,0,1) on S^2 in R^3
#-------------------------------------------------------------------
## GENERATE DATA
set.seed(496)
ndata = 10
mydata = list()
for (i in 1:ndata){
  tgt = c(1, stats::rnorm(2, sd=0.1))
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
for (i in (ndata+1):(2*ndata)){
  tgt = c(stats::rnorm(1, sd=0.1), 1, stats::rnorm(1, sd=0.1))
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
for (i in ((2*ndata)+1):(3*ndata)){
  tgt = c(stats::rnorm(2, sd=0.1), 1)
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
myriem = wrap.sphere(mydata)
mylabs = rep(c(1,2,3), each=ndata)
## RUN NONLINEAR MEANSHIFT FOR DIFFERENT 'h' VALUES
run1 = riem.nmshift(myriem, maxk=10, h=0.1)
run2 = riem.nmshift(myriem, maxk=10, h=1)
run3 = riem.nmshift(myriem, maxk=10, h=10)
## MDS FOR VISUALIZATION
mds2d = riem.mds(myriem, ndim=2)$embed
## VISUALIZE
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3), pty="s")
plot(mds2d, pch=19, main="label : h=0.1", col=run1$cluster)
plot(mds2d, pch=19, main="label : h=1", col=run2$cluster)
plot(mds2d, pch=19, main="label : h=10", col=run3$cluster)
image(run1$distance[,30:1], axes=FALSE, main="distance : h=0.1")
image(run2$distance[,30:1], axes=FALSE, main="distance : h=1")
image(run3$distance[,30:1], axes=FALSE, main="distance : h=10")
par(opar)