sampdp {rchemo}    R Documentation

Duplex sampling

Description

The function divides the data X into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size (k observations each). If needed, the user can afterwards add any remaining observations (those in neither "train" nor "test") to "train".
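The selection procedure can be illustrated with a minimal base R sketch. It follows the usual description of Duplex: the two most distant observations start "train", the two most distant of those left start "test", and then each set in turn receives the remaining observation that is farthest from its current members. The sketch assumes Euclidean distances and uses a hypothetical helper name, duplex_sketch. It is not the rchemo implementation and may differ from sampdp in details such as tie handling.

```r
# Hypothetical helper (not part of rchemo): plain Duplex sketch, Euclidean only
duplex_sketch <- function(X, k) {
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))   # pairwise Euclidean distances between rows
  remain <- seq_len(n)

  # Two most distant observations among the candidates 'idx'
  farthest_pair <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    pos <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    idx[pos]
  }
  # Candidate whose nearest already-selected observation is farthest away
  next_point <- function(selected, idx) {
    dmin <- apply(D[idx, selected, drop = FALSE], 1, min)
    idx[which.max(dmin)]
  }

  train <- farthest_pair(remain); remain <- setdiff(remain, train)
  test  <- farthest_pair(remain); remain <- setdiff(remain, test)
  # Alternate between the two sets until each holds k observations
  while (length(train) < k) {
    new <- next_point(train, remain)
    train <- c(train, new); remain <- setdiff(remain, new)
    new <- next_point(test, remain)
    test <- c(test, new); remain <- setdiff(remain, new)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)
```

Because the two sets are filled alternately with the same criterion, they tend to cover the data space in a similar way, which is the motivation for using Duplex rather than a purely random split.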

Usage


sampdp(X, k, diss = c("eucl", "mahal"))

Arguments

X

X-data (n, p) to be sampled.

k

An integer defining the number of training observations to select. Must be <= n / 2.

diss

The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).
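The difference between the two dissimilarities can be shown with base R alone. The sketch below assumes that the Mahalanobis distance is based on the covariance matrix of X; how sampdp computes it internally is not stated here, so this is only an illustration of the two distance types, not of the package code.

```r
set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)

# Euclidean distances between all pairs of rows
D_eucl <- as.matrix(dist(X))

# Mahalanobis distances: Euclidean distances after whitening X
# with the Cholesky factor of its covariance matrix
U <- chol(cov(X))        # cov(X) = t(U) %*% U
Z <- X %*% solve(U)      # whitened data, cov(Z) is the identity
D_mahal <- as.matrix(dist(Z))

# Cross-check one pair with stats::mahalanobis (returns squared distances)
d12 <- sqrt(mahalanobis(X[1, ], center = X[2, ], cov = cov(X)))
same <- isTRUE(all.equal(unname(D_mahal[1, 2]), d12))
```

Unlike the Euclidean distance, the Mahalanobis distance is insensitive to the scale of and the correlations between the columns of X, so the two options can select different observations.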

Value

train

Indexes (i.e. row numbers in X) of the selected observations, for the training set.

test

Indexes (i.e. row numbers in X) of the selected observations, for the test set.

remain

Indexes (i.e. row numbers in X) of the remaining observations (in neither the training nor the test set).
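The returned indexes are typically used to subset X. The sketch below assumes the result is a list with components train, test and remain, and uses a hand-written stand-in (arbitrary illustrative indexes, not an actual Duplex result) in place of a call to sampdp so that it runs without the package.

```r
set.seed(1)
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

# Stand-in for the output of sampdp(X, k = 4): arbitrary illustrative
# indexes, NOT an actual Duplex selection
res <- list(train = c(1, 4, 7, 9), test = c(2, 5, 8, 10), remain = c(3, 6))

# Extract the two sets
Xtrain <- X[res$train, , drop = FALSE]
Xtest  <- X[res$test, , drop = FALSE]

# Optionally add the remaining observations to the training set
train_all  <- c(res$train, res$remain)
Xtrain_all <- X[train_all, , drop = FALSE]
```

Using drop = FALSE keeps the result a matrix even when a single row is selected.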

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19, 415-428. https://doi.org/10.1080/00401706.1977.10489581

Examples


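n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)   # random (n, p) data

k <- 4   # size of each set; must be <= n / 2
sampdp(X, k = k)                   # Euclidean distance (default)
sampdp(X, k = k, diss = "mahal")   # Mahalanobis distance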


[Package rchemo version 0.1-1 Index]