
sampks {rchemo}

R Documentation

Kennard-Stone sampling

Description

The function divides the data `X` into two sets, "train" and "test", using the Kennard-Stone (KS) algorithm (Kennard & Stone, 1969). Because KS selects the training observations so that they cover the space of `X` as uniformly as possible, the two sets correspond to two different underlying probability distributions: set "train" has a higher dispersion than set "test".
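A common formulation of KS seeds the training set with the two most distant observations. It then repeatedly adds the candidate whose distance to its nearest already-selected observation is largest (a max-min rule). The sketch below illustrates that formulation with Euclidean distance only. `ks_sketch` is a hypothetical helper, not the rchemo implementation, whose internal details (starting rule, tie handling) may differ.

```r
## Minimal Kennard-Stone sketch (Euclidean distance only).
## Hypothetical helper for illustration; not the rchemo implementation.
ks_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, k <= n)
  D <- as.matrix(dist(X))               # n x n Euclidean distance matrix
  ## Seed the training set with the two most distant observations
  train <- unname(which(D == max(D), arr.ind = TRUE)[1, ])
  while (length(train) < k) {
    cand <- setdiff(seq_len(n), train)
    ## Distance from each candidate to its nearest selected observation
    dmin <- apply(D[cand, train, drop = FALSE], 1, min)
    ## Add the candidate farthest from the current selection (max-min rule)
    train <- c(train, cand[which.max(dmin)])
  }
  ## Non-selected observations form the test set
  list(train = train, test = setdiff(seq_len(n), train))
}

set.seed(1)
X <- matrix(rnorm(30), ncol = 3)
s <- ks_sketch(X, k = 7)
s$train
```

Recomputing the full distance matrix keeps the sketch short. For large `n`, an implementation would more likely update the vector of nearest-selected distances incrementally after each addition.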

Usage


sampks(X, k, diss = c("eucl", "mahal"))

Arguments

`X`	X-data (`n, p`) to be sampled.
`k`	An integer defining the number of training observations to select.
`diss`	The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
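The Mahalanobis distance between two observations equals the Euclidean distance between them after the data have been whitened by the covariance matrix. Running KS with `diss = "mahal"` can therefore be understood as running Euclidean KS on whitened data. The sketch below shows this equivalence. `mahal_dist_sketch` is a hypothetical helper, and it assumes the covariance is estimated from all of `X`. That choice may not match what `sampks` does internally.

```r
## Hypothetical helper: pairwise Mahalanobis distances between the rows of X,
## computed as Euclidean distances on whitened data.
## Assumption: the covariance matrix is estimated from all rows of X.
mahal_dist_sketch <- function(X) {
  X <- as.matrix(X)
  S <- cov(X)              # p x p covariance matrix (must be positive definite)
  R <- chol(S)             # upper triangular factor, S = t(R) %*% R
  Z <- X %*% solve(R)      # whitened data: cov(Z) is the identity matrix
  as.matrix(dist(Z))       # Euclidean distance on Z = Mahalanobis distance on X
}

set.seed(2)
X <- matrix(rnorm(40), ncol = 2)
Dm <- mahal_dist_sketch(X)
Dm[1:3, 1:3]
```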

Value

`train`	Indexes (i.e. row numbers in `X`) of the `k` selected observations, forming the training set.
`test`	Indexes (i.e. row numbers in `X`) of the remaining `n - k` (non-selected) observations, forming the test set.
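The returned indexes are usually used to subset `X` (and any response variables) into training and test blocks. The sketch below shows this pattern. The helper `split_rows` is hypothetical, and the index list `idx` is typed by hand as a stand-in for the output of `sampks(X, k = 3)` so that the snippet runs without rchemo installed. The helper also checks that the two index vectors form a partition of the rows.

```r
## Hypothetical helper: split X by a list(train = , test = ) of row indexes
## and check that the two index vectors partition the rows of X.
split_rows <- function(X, idx) {
  n <- nrow(X)
  all_idx <- c(idx$train, idx$test)
  stopifnot(anyDuplicated(all_idx) == 0, setequal(all_idx, seq_len(n)))
  list(Xtrain = X[idx$train, , drop = FALSE],
       Xtest  = X[idx$test, , drop = FALSE])
}

X <- matrix(1:20, ncol = 2)
## Stand-in for the output of sampks(X, k = 3); values chosen by hand
idx <- list(train = c(1, 4, 7), test = setdiff(1:10, c(1, 4, 7)))
parts <- split_rows(X, idx)
parts$Xtrain
```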

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Examples


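library(rchemo)

## Random data: select k = 7 training observations out of n = 10
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)
k <- 7
sampks(X, k = k)

## Noisy regular 10 x 10 grid: select k = 25 training observations
n <- 10 ; k <- 25
X <- expand.grid(1:n, 1:n)
X <- X + rnorm(nrow(X) * ncol(X), 0, .1)
s <- sampks(X, k)$train
## Plot all observations, highlighting the training set in red
plot(X)
points(X[s, ], pch = 19, col = 2, cex = 1.5)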

[Package rchemo version 0.1-2 Index]