duplex {prospectr} | R Documentation |
DUPLEX algorithm for calibration sampling
Description
Select calibration samples from a large multivariate dataset using the DUPLEX algorithm
Usage
duplex(X,
k,
metric = c("mahal", "euclid"),
pc,
group,
.center = TRUE,
.scale = FALSE)
Arguments
X |
a numeric matrix. |
k |
the number of calibration/validation samples. |
metric |
the distance metric to be used: 'euclid' (Euclidean distance) or 'mahal' (Mahalanobis distance, default). |
pc |
optional. The number of Principal Components to be used to select
the samples. If not specified, distances are computed in the Euclidean space.
Alternatively, distances are computed in the principal component space and
|
group |
An optional |
.center |
logical value indicating whether the input matrix must be
centered before projecting |
.scale |
logical value indicating whether the input matrix must be
scaled before |
Details
The DUPLEX algorithm is similar to the Kennard-Stone algorithm (see
kenStone) but allows the selection of both calibration and validation
points that are independent. Like the Kennard-Stone algorithm,
it starts by selecting the pair of points that are the farthest apart. They
are assigned to the calibration set and removed from the list of points.
Then, the pair of remaining points that are farthest apart is assigned to the
validation set and removed from the list. In a third step, the procedure
assigns the remaining points alternately to the calibration
and validation sets, based on their distance to the points already selected.
As with the Kennard-Stone algorithm, either the Euclidean or the Mahalanobis
distance can be used. The metric argument selects between them, with 'mahal'
as the default. If the pc argument is specified, distances are computed in
the principal component space (see kenStone).
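The three steps described above can be sketched in base R. This is a minimal illustration, not the prospectr implementation: the names duplex_sketch, farthest_pair and next_point are hypothetical, distances are plain Euclidean distances on the raw variables, and the rule used in the third step (pick the remaining point whose nearest already-selected point is farthest away, as in Kennard-Stone) is an assumption about what "based on the distance to the points already selected" means.

```r
# Sketch of the DUPLEX selection steps (hypothetical helper, base R only)
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  stopifnot(k >= 2, 2 * k <= nrow(X))
  D <- as.matrix(dist(X))               # pairwise Euclidean distances
  remaining <- seq_len(nrow(X))

  # Two indices in idx that are farthest apart
  farthest_pair <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    pos <- arrayInd(which.max(sub), dim(sub))
    idx[as.vector(pos)]
  }
  # Index in idx whose nearest point in 'selected' is farthest away (assumed rule)
  next_point <- function(idx, selected) {
    min_d <- apply(D[idx, selected, drop = FALSE], 1, min)
    idx[which.max(min_d)]
  }

  # Step 1: farthest pair goes to the calibration set
  model <- farthest_pair(remaining)
  remaining <- setdiff(remaining, model)
  # Step 2: farthest pair among the remaining points goes to the validation set
  test <- farthest_pair(remaining)
  remaining <- setdiff(remaining, test)

  # Step 3: alternate between the two sets until each holds k points
  while (length(model) < k) {
    p <- next_point(remaining, model)
    model <- c(model, p)
    remaining <- setdiff(remaining, p)
    p <- next_point(remaining, test)
    test <- c(test, p)
    remaining <- setdiff(remaining, p)
  }
  list(model = model, test = test)
}

set.seed(42)
X_demo <- matrix(rnorm(100), ncol = 2)  # 50 artificial samples, 2 variables
res <- duplex_sketch(X_demo, k = 10)
res$model                               # row indices chosen for calibration
res$test                                # row indices chosen for validation
```

Because each selected point is removed from the candidate list before the next pick, the two index vectors never overlap.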
Value
a list with components:
'model': numeric vector giving the row indices of the input data selected for calibration
'test': numeric vector giving the row indices of the input data selected for validation
'pc': if the pc argument is specified, a numeric matrix of the scaled pc scores
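The link between scaled pc scores and the Mahalanobis distance can be checked with a short base R sketch. It assumes that "scaled" means each score column is divided by its standard deviation, which this page does not confirm. Under that assumption, and when all components are kept, the squared Euclidean distance between two rows of the scaled scores equals their squared Mahalanobis distance in the original variables. The toy data and object names are made up for illustration.

```r
set.seed(1)
# Toy data: 50 samples, 4 variables, with some correlation between columns
X_toy <- matrix(rnorm(200), ncol = 4)
X_toy[, 2] <- X_toy[, 1] + 0.5 * X_toy[, 2]

pca <- prcomp(X_toy, center = TRUE, scale. = FALSE)
# Divide each score column by its standard deviation (assumed meaning of "scaled")
scaled_scores <- sweep(pca$x, 2, pca$sdev, "/")

# Squared Euclidean distance between samples 1 and 2 in the scaled score space
d_scores <- sum((scaled_scores[1, ] - scaled_scores[2, ])^2)
# Squared Mahalanobis distance between the same samples in the original space
d_mahal <- mahalanobis(X_toy[1, ], center = X_toy[2, ], cov = cov(X_toy))

all.equal(d_scores, d_mahal)
```

If fewer components are retained, the distance on the scores only approximates the full Mahalanobis distance.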
Author(s)
Antoine Stevens & Leonardo Ramirez-Lopez
References
Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148.
Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics 19, 415-428.
See Also
kenStone, honigs, shenkWest, naes
Examples
data(NIRsoil)
sel <- duplex(NIRsoil$spc, k = 30, metric = "mahal", pc = .99)
plot(sel$pc[, 1:2], xlab = "PC1", ylab = "PC2")
points(sel$pc[sel$model, 1:2], pch = 19, col = 2) # points selected for calibration
points(sel$pc[sel$test, 1:2], pch = 18, col = 3) # points selected for validation
# Test on artificial data
X <- expand.grid(1:20, 1:20) + rnorm(800, 0, .1) # 400 grid points x 2 variables, plus small Gaussian noise
plot(X[, 1], X[, 2], xlab = "VAR1", ylab = "VAR2")
sel <- duplex(X, k = 25, metric = "mahal")
points(X[sel$model, ], pch = 19, col = 2) # points selected for calibration
points(X[sel$test, ], pch = 15, col = 3) # points selected for validation