sampdp {rchemo}

R Documentation

Duplex sampling

Description

The function divides the data X into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size. If needed, the user can afterwards add any remaining observations (those in neither "train" nor "test") to "train".
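As an illustration of the selection principle, the following is a minimal base-R sketch of a Duplex-style split, not the code used by `sampdp`. It assumes Euclidean distances only, and `duplex_sketch` and its internal helpers are hypothetical names introduced here. The two most distant observations seed "train", the two most distant remaining ones seed "test", and each set then alternately receives the remaining observation that lies farthest from its own nearest member, until both hold `k` observations.

```r
# Hypothetical helper: base-R sketch of a Duplex-style split (Euclidean only).
# This is an illustration, not the rchemo implementation.
duplex_sketch <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))            # pairwise Euclidean distances
  remain <- seq_len(n)

  # Return the two most distant observations among the candidates.
  farthest_pair <- function(cand) {
    Dc <- D[cand, cand, drop = FALSE]
    ij <- arrayInd(which.max(Dc), dim(Dc))
    cand[c(ij[1], ij[2])]
  }
  # Return the candidate whose nearest already-selected point is farthest away.
  maximin <- function(cand, sel) {
    dmin <- apply(D[cand, sel, drop = FALSE], 1, min)
    cand[which.max(dmin)]
  }

  train <- farthest_pair(remain); remain <- setdiff(remain, train)
  test  <- farthest_pair(remain); remain <- setdiff(remain, test)
  while (length(train) < k) {
    new <- maximin(remain, train)
    train <- c(train, new); remain <- setdiff(remain, new)
    new <- maximin(remain, test)
    test <- c(test, new); remain <- setdiff(remain, new)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)                          # arbitrary seed, for reproducibility only
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)
```

With `n = 10` and `k = 4`, the sketch assigns four rows to each set and leaves two rows in `remain`.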

Usage


sampdp(X, k, diss = c("eucl", "mahal"))

Arguments

`X`	X-data (`n, p`) to be sampled.
`k`	An integer defining the number of training observations to select; the test set receives the same number. Must be <= `n / 2`.
`diss`	The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance).

Value

`train`	Indexes (i.e. row numbers in `X`) of the selected observations, for the training set.
`test`	Indexes (i.e. row numbers in `X`) of the selected observations, for the test set.
`remain`	Indexes (i.e. row numbers in `X`) of the remaining observations, assigned to neither set.

References

Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.

Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19, 415-428. https://doi.org/10.1080/00401706.1977.10489581

Examples


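library(rchemo)

n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)   # random (n, p) data

k <- 4                                # size of each set; must be <= n / 2
sampdp(X, k = k)                      # Euclidean distance (default)
sampdp(X, k = k, diss = "mahal")      # Mahalanobis distance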
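The returned indexes can be used to subset `X` directly. The sketch below assumes the result is a list whose elements are named `train`, `test` and `remain`, as listed under Value. It also shows the optional step of moving the leftover observations into the training set. The object names `res`, `Xtrain` and `Xtest` are illustrative only.

```r
library(rchemo)

set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 10 ; p <- 3
X <- matrix(rnorm(n * p), ncol = p)

res <- sampdp(X, k = 4)

# Optionally move the leftover observations into the training set.
train <- c(res$train, res$remain)

# Subset the data by row index.
Xtrain <- X[train, , drop = FALSE]
Xtest  <- X[res$test, , drop = FALSE]
```

Here `n - 2 * k = 2` observations are left over, so `Xtrain` ends up with six rows and `Xtest` with four.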

[Package rchemo version 0.1-2]