naes {prospectr} | R Documentation |
k-means sampling
Description
Perform a k-means sampling on a matrix for multivariate calibration
Usage
naes(X, k, pc, iter.max = 10, method = 0, .center = TRUE, .scale = FALSE)
Arguments
X |
a numeric matrix (optionally a data frame that can be coerced to a numerical matrix). |
k |
either the number of calibration samples to select or a set of cluster centres to initiate the k-means clustering. |
pc |
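optional. If not specified, k-means is run directly on the variable
(Euclidean) space.
Alternatively, a PCA is performed before k-means and pc is the number of principal components kept. If pc < 1, the number of principal components kept corresponds to the number of components explaining at least (pc * 100) percent of the total variance. |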
iter.max |
maximum number of iterations allowed for the k-means
clustering. Default is 10. |
method |
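the method used for selecting calibration samples within each
cluster: either samples closest to the cluster centers (method = 0, default),
samples farthest away from the centre of the data (method = 1) or random selection (method = 2). |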
.center |
logical value indicating whether the input matrix must be
centered before Principal Component Analysis. Default set to TRUE. |
.scale |
logical value indicating whether the input matrix must be
scaled before Principal Component Analysis. Default set to FALSE. |
Details
K-means sampling is a simple procedure based on cluster analysis to select calibration samples from large multivariate datasets. The method can be described in three steps (Naes et al., 2002):
1. Perform a PCA and decide how many principal components to keep.
2. Carry out a k-means clustering on the principal component scores, with the number of clusters equal to the number of desired calibration samples.
3. Select one sample from each cluster.
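The three steps can be sketched with base R alone. This is an illustration on simulated data and not the package's implementation: the 99% variance threshold, the object names (X, scores, model_idx, test_idx) and the choice of the sample closest to each cluster centre are assumptions made for the example, and the scores are not rescaled here.

```r
# Illustrative sketch only: base R (stats), simulated data, hypothetical names.
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)  # 200 samples, 10 variables
k <- 5                                     # desired number of calibration samples

# Step 1: PCA, keeping the first components that explain at least 99% of
# the variance (threshold assumed for this example)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
expl <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pc <- which(expl >= 0.99)[1]
scores <- pca$x[, seq_len(n_pc), drop = FALSE]

# Step 2: k-means on the scores, with as many clusters as samples wanted
km <- kmeans(scores, centers = k, iter.max = 10)

# Step 3: one sample per cluster, here the member closest to its centre
model_idx <- vapply(seq_len(k), function(j) {
  members <- which(km$cluster == j)
  d <- rowSums(sweep(scores[members, , drop = FALSE], 2, km$centers[j, ])^2)
  members[which.min(d)]
}, integer(1))

# Remaining rows form the test set
test_idx <- setdiff(seq_len(nrow(X)), model_idx)
```

Picking the member nearest to each centre is only one possible selection rule; any rule that returns one row per cluster fits step 3.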
Value
a list with components:
'model': numeric vector giving the row indices of the input data selected for calibration
'test': numeric vector giving the row indices of the remaining observations
'pc': if the pc argument is specified, a numeric matrix of the scaled pc scores
'cluster': integer vector indicating the cluster to which each point was assigned
'centers': a matrix of cluster centres
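As a sketch of how the returned row indices might be used, the snippet below splits a data matrix into calibration and test sets. The sel list is a hand-built stand-in that only mimics the 'model' and 'test' component names; it is not actual naes output, and X is simulated.

```r
# Stand-in data and result (hypothetical values, not produced by naes())
set.seed(42)
X <- matrix(rnorm(20 * 4), nrow = 20)
sel <- list(
  model = c(3L, 8L, 15L),
  test  = setdiff(1:20, c(3L, 8L, 15L))
)

# Rows selected for calibration and the remaining rows
X_cal  <- X[sel$model, , drop = FALSE]
X_test <- X[sel$test, , drop = FALSE]

# The two index sets are disjoint and together cover every row of X
stopifnot(length(intersect(sel$model, sel$test)) == 0)
stopifnot(setequal(c(sel$model, sel$test), seq_len(nrow(X))))
```

Using drop = FALSE keeps both subsets as matrices even if only a single row is selected.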
Author(s)
Antoine Stevens & Leonardo Ramirez-Lopez
References
Naes, T., 1987. The design of calibration in near infra-red reflectance analysis by clustering. Journal of Chemometrics 1, 121-134.
Naes, T., Isaksson, T., Fearn, T., and Davies, T., 2002. A user friendly guide to multivariate calibration and classification. NIR Publications, Chichester, United Kingdom.
See Also
kenStone, honigs, duplex, shenkWest
Examples
library(prospectr)
data(NIRsoil)
sel <- naes(NIRsoil$spc, k = 5, pc = .99, method = 0)
# clusters
plot(sel$pc[, 1:2], col = sel$cluster + 2)
# points selected for calibration with method = 0
points(sel$pc[sel$model, 1:2],
  col = 2,
  pch = 19,
  cex = 1
)
# pre-defined centers can also be provided
sel2 <- naes(NIRsoil$spc,
  k = sel$centers,
  pc = .99, method = 1
)
# points selected for calibration with method = 1
points(sel$pc[sel2$model, 1:2],
  col = 1,
  pch = 15,
  cex = 1
)