sampks {rchemo}    R Documentation

Kennard-Stone sampling

Description

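The function divides the data X into two sets, "train" and "test", using the Kennard-Stone (KS) algorithm (Kennard & Stone, 1969). At each step, the algorithm selects the observation that is farthest from those already selected, so the training observations are spread out over the X-space. As a result, the two sets correspond to two different underlying probability distributions: set "train" has higher dispersion than set "test".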
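The selection rule can be sketched in a few lines of base R. This is a minimal illustration only, not the rchemo implementation: the helper name ks_sketch is hypothetical, it handles Euclidean distances only, and it assumes the classical KS initialization (starting from the two most distant observations).

```r
# Hypothetical helper, NOT the rchemo code: a minimal Kennard-Stone
# selection based on Euclidean distances.
ks_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, k <= n)
  D <- as.matrix(dist(X))  # n x n matrix of Euclidean distances
  # Assumed initialization: the two observations that are farthest apart
  s <- as.vector(which(D == max(D), arr.ind = TRUE)[1, ])
  while (length(s) < k) {
    cand <- setdiff(seq_len(n), s)
    # For each candidate, distance to its nearest selected observation
    dmin <- apply(D[cand, s, drop = FALSE], 1, min)
    # Add the candidate that is farthest from the selected set
    s <- c(s, cand[which.max(dmin)])
  }
  list(train = s, test = setdiff(seq_len(n), s))
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- ks_sketch(X, k = 7)
res$train
res$test
```

The sketch stores the full n x n distance matrix, which is simple but may become costly for large n.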

Usage


sampks(X, k, diss = c("eucl", "mahal"))

Arguments

X

X-data (n observations in rows, p variables in columns) to be sampled.

k

An integer defining the number of training observations to select.

diss

The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
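To illustrate how the two dissimilarities differ, the base R sketch below computes both for a small simulated matrix. It is not taken from rchemo: in particular, it assumes that the Mahalanobis distance is based on the sample covariance of X, which this page does not state.

```r
set.seed(1)
X <- matrix(rnorm(20 * 3), ncol = 3)
X[, 3] <- X[, 1] + 0.1 * X[, 3]  # make two columns strongly correlated

# Euclidean distances between all pairs of rows
D_eucl <- as.matrix(dist(X))

# Mahalanobis distances: whiten X with the Cholesky factor of its
# covariance (ASSUMED here to be the sample covariance), then take
# Euclidean distances on the whitened data
S <- cov(X)
Z <- X %*% solve(chol(S))
D_mahal <- as.matrix(dist(Z))

# Cross-check against stats::mahalanobis(), which returns SQUARED
# distances from each row of X to the given center
d2_ref <- mahalanobis(X, center = X[1, ], cov = S)
all.equal(unname(D_mahal[, 1])^2, d2_ref)
```

Unlike the Euclidean distance, the Mahalanobis distance accounts for the scales of the variables and the correlations between them, so the two options may select different observations when the columns of X are correlated or on different scales.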

Value

train

Indexes (i.e. row numbers in X) of the selected observations, for the training set.

test

Indexes (i.e. row numbers in X) of the remaining (non-selected) observations, for the test set.
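Both components are row indexes, so they can be used directly to subset X. The sketch below uses a hand-written stand-in for the returned list so that it runs without the package. The index values are made up for illustration and are not real sampks output.

```r
set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)

# Stand-in for the output of sampks(X, k = 7): a list with the same
# two components (index values are hypothetical)
res <- list(train = c(2, 9, 4, 7, 1, 10, 5), test = c(3, 6, 8))

# drop = FALSE keeps the result a matrix even if one row is selected
Xtrain <- X[res$train, , drop = FALSE]
Xtest  <- X[res$test, , drop = FALSE]
dim(Xtrain)
dim(Xtest)
```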

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Examples


# Select k = 7 training observations among n = 10
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

k <- 7
sampks(X, k = k)

# Select k = 25 observations on a noisy 10 x 10 grid and plot them
n <- 10 ; k <- 25
X <- expand.grid(1:n, 1:n)
X <- X + rnorm(nrow(X) * ncol(X), 0, .1)
s <- sampks(X, k)$train
plot(X)
points(X[s, ], pch = 19, col = 2, cex = 1.5)


[Package rchemo version 0.1-1 Index]