sampdp {rchemo} | R Documentation |
Duplex sampling
Description
The function divides the data into two sets, "train" and "test", using the Duplex algorithm (Snee, 1977). The two sets are of equal size. If needed, the user can add
any remaining observations (those in neither "train" nor "test") to "train".
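To make the selection logic concrete, here is a minimal base-R sketch of a Duplex-style split. It is not the rchemo implementation: the function name `duplex_sketch` and its internal helpers are hypothetical, it only covers the Euclidean case, and it assumes `k >= 2`, `2 * k <= nrow(X)` and that the rows of `X` are not all identical. The two farthest observations seed "train", the next two farthest seed "test", and the sets then grow in turn by a maximin rule.

```r
# Hypothetical helper for illustration only (not rchemo's code).
duplex_sketch <- function(X, k) {
  n <- nrow(X)
  stopifnot(k >= 2, 2 * k <= n)
  D <- as.matrix(dist(X))  # Euclidean distances between all rows

  # Two indices (taken from idx) that are farthest apart
  farthest_pair <- function(idx) {
    sub <- D[idx, idx, drop = FALSE]
    pos <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    unname(idx[pos])
  }

  # Candidate whose nearest already-selected point is farthest away (maximin)
  next_point <- function(selected, candidates) {
    dmin <- apply(D[candidates, selected, drop = FALSE], 1, min)
    candidates[which.max(dmin)]
  }

  remain <- seq_len(n)
  train <- farthest_pair(remain)
  remain <- setdiff(remain, train)
  test <- farthest_pair(remain)
  remain <- setdiff(remain, test)

  # Alternate between the two sets until each holds k observations
  while (length(train) < k) {
    s <- next_point(train, remain)
    train <- c(train, s)
    remain <- setdiff(remain, s)
    s <- next_point(test, remain)
    test <- c(test, s)
    remain <- setdiff(remain, s)
  }
  list(train = train, test = test, remain = remain)
}

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- duplex_sketch(X, k = 4)
res
```

Alternating the picks is what keeps the two sets comparable in how they cover the data, which is the point of Duplex compared with filling one set first.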
Usage
sampdp(X, k, diss = c("eucl", "mahal"))
Arguments
X |
X-data ( |
k |
An integer defining the number of training observations to select. Must be <= |
diss |
The type of dissimilarity used for selecting the observations in the algorithm. Possible values are "eucl" (default; Euclidean distance) or "mahal" (Mahalanobis distance). |
Value
train |
Indexes (i.e. row numbers in |
test |
Indexes (i.e. row numbers in |
remain |
Indexes (i.e., row numbers in |
References
Kennard, R.W., Stone, L.A., 1969. Computer aided design of experiments. Technometrics, 11(1), 137-148.
Snee, R.D., 1977. Validation of regression models: methods and examples. Technometrics, 19(4), 415-428. https://doi.org/10.1080/00401706.1977.10489581
Examples
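n <- 10 ; p <- 3                      # 10 observations, 3 variables
X <- matrix(rnorm(n * p), ncol = p)   # simulated X-data
k <- 4                                # number of training observations to select
sampdp(X, k = k)                      # Euclidean distance (default)
sampdp(X, k = k, diss = "mahal")      # Mahalanobis distance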
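The Description notes that observations left in neither set can be added to "train". The sketch below shows one way to do that and to build the two data subsets. It assumes that the returned object is a list whose components `train`, `test` and `remain` can be accessed with `$`, as the Value section suggests; the names `train_idx`, `Xtrain` and `Xtest` are illustrative only.

```r
library(rchemo)  # provides sampdp()

set.seed(1)
X <- matrix(rnorm(10 * 3), ncol = 3)
res <- sampdp(X, k = 4)

# Assumption: 'res' is a list with components train, test and remain
train_idx <- c(res$train, res$remain)    # fold the leftover rows into "train"
Xtrain <- X[train_idx, , drop = FALSE]   # training subset
Xtest  <- X[res$test, , drop = FALSE]    # test subset

dim(Xtrain)
dim(Xtest)
```

Using `drop = FALSE` keeps both subsets as matrices even if a set ends up with a single row.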