coverage {LPCM}    R Documentation
Coverage and self-coverage plots.
Description
These functions compute coverages and self-coverages, and produce corresponding plots, for any principal curve object. Coverages may be used as goodness-of-fit measures, and self-coverages for bandwidth selection.
Usage
coverage.raw(X, vec, tau, weights=1, plot.type="p", print=FALSE,
label=NULL,...)
coverage(X, vec, taumin=0.02, taumax, gridsize=25, weights=1,
plot.type="o", print=FALSE,...)
lpc.coverage(object, taumin=0.02, taumax, gridsize=25, quick=TRUE,
plot.type="o", print=FALSE, ...)
lpc.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25, x0=1,
way = "two", scaled=1, weights=1, pen=2, depth=1,
control=lpc.control(boundary=0, cross=FALSE), quick=TRUE,
plot.type="o", print=FALSE, ... )
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
thr=0.001, scaled=1, cluster=FALSE, plot.type="o",
print=FALSE, ...)
select.self.coverage(self, smin, plot.type="o", plot.segments=NULL)
Arguments
X: A data matrix.
object: An object of type
vec: A matrix with
tau: Tube size.
taumin: Minimal tube size.
taumax: Maximal tube size.
weights: An optional vector of weights. If weights are specified, the coverage is the weighted mean of the indicators of the data points falling within the tube. The function
label: Experimental option; do not use.
gridsize: The number of different tube sizes to consider.
quick: If TRUE, an approximate coverage curve is computed from the distances between the data points and the curve through the closest local centers of mass. If FALSE, the distances of the points projected orthogonally onto the spline representation of the local principal curve are used instead; this takes considerably more computing time. The resulting coverage curves are generally very similar, but the quick version may occasionally produce small spurious peaks.
thr: Adjacent mean shift clusters are merged if their relative distance falls below this threshold.
cluster: If
self: An object of class self.
smin: Minimum coverage for bandwidth selection. Default: 1/3 for clustering, 2/3 for principal curves.
plot.type: If set to 0, no plot is produced. Otherwise, an appropriate plot is drawn using the specified plotting type.
plot.segments: A list with default
print: If TRUE, coverage values are printed on the screen as soon as they are computed. This is especially helpful if
x0, way, scaled, pen, depth, control: Auxiliary parameters as outlined in
...: Optional graphical parameters passed to the corresponding plotting functions.
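With quick=TRUE, distances are measured to the curve through the local centers of mass rather than to the spline. A minimal sketch of such a point-to-curve distance follows, assuming the centers of a single branch are stored row-wise in curve order and that the curve is the polygonal line joining them. The function name dist.to.polyline is hypothetical and not part of LPCM, which may compute this distance differently.

```r
# Hypothetical helper (not part of LPCM): Euclidean distance from a point x
# to the polygonal line through the rows of `centers`, taken in row order.
dist.to.polyline <- function(x, centers) {
  centers <- as.matrix(centers)
  if (nrow(centers) == 1) return(sqrt(sum((x - centers[1, ])^2)))
  d <- Inf
  for (k in seq_len(nrow(centers) - 1)) {
    a <- centers[k, ]
    b <- centers[k + 1, ]
    ab <- b - a
    len2 <- sum(ab^2)
    # Position of the orthogonal projection of x on the segment, clamped to [0, 1]
    t <- if (len2 > 0) min(max(sum((x - a) * ab) / len2, 0), 1) else 0
    proj <- a + t * ab
    d <- min(d, sqrt(sum((x - proj)^2)))
  }
  d
}

# Usage: a point lying one unit above the second segment of a straight curve
centers <- rbind(c(0, 0), c(1, 0), c(2, 0))
dist.to.polyline(c(1.5, 1), centers)  # 1
```

Clamping the projection to the segment ends means points beyond the last center are measured to that endpoint, which is one reasonable convention among several.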
Details
The function coverage.raw computes the coverage, i.e. the proportion of data points lying inside a circle or band with radius $\tau$, for a fixed value of $\tau$. The whole coverage curve $C(\tau)$ is constructed through the function coverage.
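Written out, and assuming data points $x_1, \dots, x_n$ with weights $w_i$ (all equal to 1 by default) and $d(x_i)$ denoting the distance of $x_i$ to the fitted object (curve or set of centers), the coverage described above is the weighted proportion of points within distance $\tau$. Whether the boundary $d(x_i) = \tau$ counts as inside is an assumption here.

```latex
C(\tau) = \frac{\sum_{i=1}^{n} w_i \, I\{ d(x_i) \le \tau \}}{\sum_{i=1}^{n} w_i}
```

With unit weights this reduces to the plain proportion $\frac{1}{n}\sum_{i=1}^{n} I\{ d(x_i) \le \tau \}$.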
Functions coverage.raw and coverage can be used for any object fitted by an unsupervised learning technique (for instance, Hastie and Stuetzle (HS) principal curves, or even clustering algorithms), while the functions prefixed with lpc. and ms. can only be used for the corresponding objects. The functions lpc.coverage and ms.coverage are wrappers around coverage which operate directly on a fitted object, rather than on a data matrix.
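For an object represented by a finite set of points, such as the cluster centers passed to coverage in the examples below, the coverage curve can be sketched in base R. This is a hypothetical illustration, not LPCM's code: the name coverage.sketch, the equally spaced tube-size grid, and the default taumax (the largest point-to-center distance) are all assumptions.

```r
# Hypothetical sketch (not LPCM's implementation): coverage curve of the
# points in `vec` (one per row) with respect to the data `X`.
coverage.sketch <- function(X, vec, taumin = 0.02, taumax, gridsize = 25,
                            weights = 1) {
  X <- as.matrix(X)
  vec <- as.matrix(vec)
  n <- nrow(X)
  w <- rep_len(weights, n)
  # Distance from each data point to its nearest point in vec
  d <- apply(X, 1, function(x) min(sqrt(colSums((t(vec) - x)^2))))
  # Assumption: by default, extend the grid until every point is covered
  if (missing(taumax)) taumax <- max(d)
  tau <- seq(taumin, taumax, length.out = gridsize)
  # Weighted proportion of data points within distance tau, for each tau
  covg <- sapply(tau, function(s) sum(w * (d <= s)) / sum(w))
  cbind(tau = tau, coverage = covg)
}

# Usage: three points, one center at the origin (distances 0, 1 and 3)
X <- rbind(c(0, 0), c(0, 1), c(3, 0))
coverage.sketch(X, rbind(c(0, 0)), taumin = 0, taumax = 3, gridsize = 4)
```

A curve that rises quickly towards 1 indicates that most data points lie close to the fitted object.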
Function select.self.coverage extracts suitable bandwidths from the self-coverage curve and produces a plot. It is called from within lpc.self.coverage and ms.self.coverage, but can also be called directly by the user (for instance, to reproduce the graphical output, or to modify the minimum coverage smin). Its component $select contains the selected candidate bandwidths, ordered by the strength of evidence provided by the self-coverage criterion (the best bandwidth comes first, and so on). The plot produced as a by-product marks the best bandwidth by a thick solid line, the second-best by a dashed line, and the third-best by a dotted line. It is recommended to run the self-coverage functions with fixed starting points, as in the examples below, and to scale by the range only.
See Einbeck (2011) for details. Note that the original publication by Einbeck, Tutz, and Evers (2005) uses ‘quick’ coverage curves.
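The self-coverage curve can be stated compactly. As an assumption drawn from Einbeck (2011) rather than from this page: writing $C_h(\tau)$ for the coverage curve of the object fitted with bandwidth $h$, the self-coverage evaluates each fit at a tube size equal to its own bandwidth,

```latex
S(h) = C_h(h), \qquad \tau_{\min} \le h \le \tau_{\max},
```

so that, under this reading, taumin, taumax and gridsize of lpc.self.coverage and ms.self.coverage define the grid of candidate bandwidths.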
Value
A list of items, and a plot (unless plot.type=0).
The functions lpc.self.coverage and ms.self.coverage produce an object of class self. Its component $select recommends suitable bandwidths for use in lpc (or, for ms.self.coverage, in ms), ordered by strength of evidence. These correspond to points of strong negative curvature (detected via second differences) of the self-coverage curve.
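The selection rule can be illustrated with a minimal sketch, assuming the self-coverage values S have already been computed on a bandwidth grid h. The name select.sketch is hypothetical, and two details are assumptions rather than documented LPCM behaviour: only local minima of the second differences are kept, and bandwidths with self-coverage below smin are discarded. The default smin = 2/3 mirrors the documented default for principal curves.

```r
# Hypothetical sketch (not LPCM's select.self.coverage): rank candidate
# bandwidths by strong negative curvature of a self-coverage curve S(h).
select.sketch <- function(h, S, smin = 2/3) {
  stopifnot(length(h) == length(S), length(S) >= 3)
  n <- length(S)
  # d2[i] = S[i-1] - 2*S[i] + S[i+1] at interior points; NA at both ends
  d2 <- c(NA, diff(S, differences = 2), NA)
  # Local minima of the second differences (ties count as minima)
  is.min <- vapply(seq_len(n), function(i) {
    if (is.na(d2[i])) return(FALSE)
    all(d2[i] <= d2[c(i - 1, i + 1)], na.rm = TRUE)
  }, logical(1))
  # Assumption: only concave points (d2 < 0) with S(h) >= smin qualify
  cand <- which(is.min & d2 < 0 & S >= smin)
  # Most negative second difference (strongest evidence) first
  h[cand[order(d2[cand])]]
}

# Usage: a self-coverage curve with two "knees"
h <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35)
S <- c(0.2, 0.5, 0.75, 0.8, 0.82, 0.95, 0.96)
select.sketch(h, S)  # 0.15 0.30
```

Ranking by the size of the negative second difference puts the sharpest bend of the curve first, matching the ordering by strength of evidence described above.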
Author(s)
J. Einbeck
References
Einbeck, J., Tutz, G., & Evers, L. (2005). Local principal curves. Statistics and Computing 15, 301-313.
Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.
Examples
library(LPCM)

## Mean-shift clustering of the Old Faithful data
data(faithful)
mfit <- ms(faithful)
# Coverage curve of the fitted cluster centers
coverage(mfit$data, mfit$cluster.center, gridsize=16)
# Self-coverage curve for bandwidth selection
f.self <- ms.self.coverage(faithful, gridsize=50, taumin=0.1, taumax=0.5, plot.type="o")
h <- select.self.coverage(f.self)$select
mfit2 <- ms(faithful, h=h[2]) # using the second-best suggested bandwidth

## Local principal curve for the gvessel data, with a fixed starting point
data(gvessel)
g.self <- lpc.self.coverage(gvessel[,c(2,4,5)], x0=c(35, 1870, 6.3), print=FALSE, plot.type=0)
h <- select.self.coverage(g.self)$select
g.lfit <- lpc(gvessel[,c(2,4,5)], h=h[1], x0=c(35, 1870, 6.3))
lpc.coverage(g.lfit, gridsize=10, print=FALSE)