Rc {LPCM} | R Documentation |
Measuring goodness-of-fit for principal objects.
Description
These functions compute the ‘coverage coefficient’ R_C
for local principal curves, local principal points
(i.e., local modes of a kernel density estimate, obtained through iterated mean shift), and other principal objects.
Usage
Rc(x,...)
## S3 method for class 'lpc'
Rc(x,...)
## S3 method for class 'lpc.spline'
Rc(x,...)
## S3 method for class 'ms'
Rc(x,...)
base.Rc(data, closest.coords, type="curve")
Arguments
x | An object used to select a method. |
... | Further arguments passed to or from other methods (not needed yet). |
data | A data matrix. |
closest.coords | A matrix of coordinates of the projected data. |
type | For principal curves, leave at the default "curve". For principal points, set to "points". |
Details
Rc computes the coverage coefficient R_C, a quantity which
estimates the goodness-of-fit of a fitted principal object. This
quantity can be interpreted similarly to the coefficient of determination in
regression analysis: values close to 1 indicate a good fit, while values
close to 0 indicate a ‘bad’ fit (corresponding to linear PCA).
For objects of type lpc, lpc.spline, and ms, S3 methods are available
which use the generic function Rc. This, in turn, calls the base
function base.Rc, which can also be used manually if the fitted object
is of another class.
In principle, function base.Rc can be used for assessing
goodness-of-fit of any principal object provided that
the coordinates (closest.coords) of the projected data are
available. For instance, for HS principal curves fitted via
princurve, this information is contained in component $s,
and for a k-means object, say fitk, this information can be
obtained via fitk$centers[fitk$cluster,]. Set type="points" in
the latter case.
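As a minimal sketch of the k-means case, the following builds closest.coords from a fit obtained with kmeans from the stats package. The simulated data and the object name x are illustrative assumptions, and the call to base.Rc is only evaluated if LPCM is installed.

```r
set.seed(1)
# Simulated data: two well-separated groups in the plane (illustrative only).
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

fitk <- kmeans(x, centers = 2, nstart = 5)

# Replace every observation by the center of the cluster it was assigned to.
closest.coords <- fitk$centers[fitk$cluster, ]
dim(closest.coords)   # same dimensions as x

# Only evaluated when the LPCM package is installed.
if (requireNamespace("LPCM", quietly = TRUE)) {
  print(LPCM::base.Rc(x, closest.coords, type = "points"))
}
```

Indexing the matrix of centers by the cluster labels yields one row per observation, which is the shape base.Rc expects for closest.coords.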
The function Rc attempts to compute all missing information, so
the less information the given object x contains, the longer the
computation takes. Note also that Rc looks up the option scaled
in the fitted object and accounts for the scaling automatically.
Important: If the data were scaled, then do NOT unscale the results by
hand in order to feed the unscaled version into base.Rc; this will
give a wrong result.
In terms of methodology, these functions compute R_C directly through the mean
reduction of absolute residual length, rather than through the
area above the coverage curve.
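Written out, the mean reduction of absolute residual length can be sketched as the ratio below. This is a reading of the description on this page rather than a formula quoted from the package. The symbols \hat{x}_i (row i of closest.coords) and \tilde{x}_i (the reference fit for observation i) are notation introduced here. The reference fit is assumed to be the projection onto the linear PCA fit for type="curve" and the overall mean for type="points".

```latex
R_C \;=\; 1 - \frac{\frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert}
                   {\frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \tilde{x}_i \rVert}
```

Under this reading, R_C = 0 when the fitted object leaves residuals as long as those of the reference fit, and R_C approaches 1 as the residuals to the fitted object shrink towards zero.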
These functions currently do not account for observation
weights, i.e. R_C is computed through the unweighted mean
reduction in absolute residual length (even if weights have been used for
the curve fitting).
In the clustering context, a value of R_C=0.8 means that,
after the clustering, the mean absolute residual length has been
reduced by 80%
(compared to the distances to the overall mean).
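To make this interpretation concrete, here is a small hand computation in base R that reproduces a value of 0.8. The four data points, the cluster labels, and the names row_length and rc_sketch are made up for illustration. The calculation follows the verbal description above and is not the code used inside base.Rc.

```r
# Four points forming two tight groups on the x-axis (made-up data).
x <- rbind(c(-1, 0), c(1, 0), c(9, 0), c(11, 0))
cluster <- c(1, 1, 2, 2)                 # hand-assigned cluster labels
centers <- rbind(c(0, 0), c(10, 0))      # the two cluster centers
closest.coords <- centers[cluster, ]     # one center per observation

# Euclidean length of each row of a matrix.
row_length <- function(m) sqrt(rowSums(m^2))

# Residual lengths to the cluster centers, and to the overall mean.
res_cluster <- row_length(x - closest.coords)      # 1, 1, 1, 1
res_mean    <- row_length(sweep(x, 2, colMeans(x)))  # 6, 4, 4, 6

# Relative reduction of the mean residual length.
rc_sketch <- 1 - mean(res_cluster) / mean(res_mean)
rc_sketch   # 0.8
```

The mean distance to the overall mean is 5 and the mean distance to the assigned centers is 1, so the clustering removes 80% of the mean residual length.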
Author(s)
J. Einbeck.
References
Einbeck, Tutz, and Evers (2005). Local principal curves. Statistics and Computing 15, 301-313.
Einbeck (2011). Bandwidth selection for nonparametric unsupervised learning techniques – a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.
See Also
lpc.spline, ms, coverage.
Examples
data(calspeedflow)
# Local principal curve with a spline representation; project=TRUE projects the data onto it
lpc1 <- lpc.spline(lpc(calspeedflow[,3:4]), project=TRUE)
Rc(lpc1)
# is the same as:
base.Rc(lpc1$lpcobject$data, lpc1$closest.coords)
# Mean shift clustering (local principal points)
ms1 <- ms(calspeedflow[,3:4], plot=FALSE)
Rc(ms1)
# is the same as:
base.Rc(ms1$data, ms1$cluster.center[ms1$closest.label,], type="points")