CVd {bestglm} | R Documentation |
The delete-d method for cross-validation uses a random sample of d observations as the validation sample. This is repeated many times.
CVd(X, y, d = ceiling(n * (1 - 1/(log(n) - 1))), REP = 100, family = gaussian, ...)
X | training inputs |
y | training output |
d | size of validation sample |
REP | number of replications |
family | glm family |
... | optional arguments passed to |
Shao (1993, 1997) suggested the delete-d algorithm implemented in this function.
In this algorithm, a random sample of d observations is taken as the validation
sample. This random sampling is repeated REP times.
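The procedure described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by CVd: the helper name delete_d_cv is hypothetical, the model is a Gaussian linear model fitted with lm(), and the reported spread is the plain standard deviation of the per-replication MSEs, which may differ from the summary CVd returns.

```r
# Hypothetical helper (not part of bestglm): delete-d cross-validation
# for a Gaussian linear model, using base R only.
delete_d_cv <- function(X, y, d, REP = 100) {
  dat <- data.frame(X, y = y)
  n <- nrow(dat)
  stopifnot(d >= 1, d < n)
  mse <- numeric(REP)
  for (r in seq_len(REP)) {
    val <- sample.int(n, d)                    # indices of the validation sample
    fit <- lm(y ~ ., data = dat[-val, , drop = FALSE])   # fit on the other n - d rows
    pred <- predict(fit, newdata = dat[val, , drop = FALSE])
    mse[r] <- mean((dat$y[val] - pred)^2)      # MSE on this validation sample
  }
  # Mean of the REP validation MSEs and their standard deviation
  c(CV = mean(mse), sdCV = sd(mse))
}

# Simulated data, for illustration only
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
y <- 1 + X[, 1] - 0.5 * X[, 2] + rnorm(100)
res <- delete_d_cv(X, y, d = 73, REP = 50)
res
```

With n = 100 and d = 73, each replication fits the model on only 27 observations, which is the regime the delete-d method is designed for.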
Shao (1997, p.234, eqn. 4.5 and p.236) suggests d = n(1 - 1/(log n - 1)).
This is obtained by taking λ_n = log n on page 236 (Shao, 1997).
The default value of d rounds this quantity up with ceiling().
As shown in the table, Shao's recommended choice of the d parameter corresponds
to validation samples that are typically much larger than those used in 10-fold
or 5-fold cross-validation. LOOCV corresponds to d=1 only!
n | d | K=10 | K=5 |
50 | 33 | 5 | 10 |
100 | 73 | 10 | 20 |
200 | 154 | 20 | 40 |
500 | 405 | 50 | 100 |
1000 | 831 | 100 | 200 |
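The d column of the table can be reproduced directly from the default expression for d in the usage line. The sketch below uses only base R; the column names K10 and K5 are illustrative labels for the validation sample sizes n/10 and n/5 under 10-fold and 5-fold cross-validation.

```r
# Sample sizes from the table
n <- c(50, 100, 200, 500, 1000)

# Default d from the CVd usage: ceiling(n * (1 - 1/(log(n) - 1)))
d <- ceiling(n * (1 - 1 / (log(n) - 1)))

# Validation sample sizes for 10-fold and 5-fold CV, for comparison
tab <- data.frame(n = n, d = d, K10 = n / 10, K5 = n / 5)
tab
```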
Vector of two components comprising the cross-validation MSE and its sd based on the MSE in each validation sample.
A.I. McLeod and C. Xu
Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.
Shao, Jun (1997). An Asymptotic Theory for Linear Model Selection. Statistica Sinica 7, 221-264.
# Example 1. delete-d method
# For the training set, n=67. So 10-fold CV is like using delete-d
# with d=7, approximately.
library(bestglm)
data(zprostate)
# Keep the training rows (column 10 is the train indicator), then drop that column
train <- (zprostate[zprostate[, 10], ])[, -10]
X <- train[, 1:2]
y <- train[, 9]
set.seed(123321123)
# REP should be set to 1000 in practice; 10 is used here to save time.
CVd(X, y, d = 7, REP = 10)