Shao {bestglm} | R Documentation |
Data from a simulation study reported by Shao (1993, Table 1).
Shao (1993, Table 2) reported four simulation experiments based on the linear regression model below, each using a different set of regression coefficients:
y = 2 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + e,
where e is an independent normal error with unit variance.
The regression coefficients used in the four experiments are shown in the table below:
Experiment | \beta_2 | \beta_3 | \beta_4 | \beta_5 |
1 | 0 | 0 | 4 | 0 |
2 | 0 | 0 | 4 | 8 |
3 | 9 | 0 | 4 | 8 |
4 | 9 | 6 | 4 | 8 |
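The four designs can be written down directly in R. The sketch below is a hypothetical helper, not part of bestglm. It assumes the inputs are passed as a numeric matrix or data frame whose four columns are x2, x3, x4, x5 in that order, and it draws y from the model above with intercept 2 and N(0, 1) errors.

```r
# Hypothetical helpers (not part of bestglm): coefficient table for the four
# experiments and a simulator for y = 2 + b2*x2 + b3*x3 + b4*x4 + b5*x5 + e
shao_beta <- rbind(
  "1" = c(0, 0, 4, 0),
  "2" = c(0, 0, 4, 8),
  "3" = c(9, 0, 4, 8),
  "4" = c(9, 6, 4, 8)
)
colnames(shao_beta) <- c("beta2", "beta3", "beta4", "beta5")

# X: 4 columns (x2, x3, x4, x5); experiment: row of shao_beta (1 to 4)
simulate_shao <- function(X, experiment, intercept = 2, sd = 1) {
  X <- as.matrix(X)
  beta <- shao_beta[experiment, ]
  mu <- intercept + drop(X %*% beta)         # mean response
  y <- mu + rnorm(nrow(X), mean = 0, sd = sd) # add normal errors
  data.frame(X, y = y)                        # response in the last column
}
```

For example, after `library(bestglm)` and `data(Shao)`, `Xy <- simulate_shao(Shao, experiment = 2)` followed by `bestglm(Xy)` generates one response from Experiment 2 and runs the default best-subset search, since bestglm takes the last column of its data frame as the response.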
The table below summarizes the probability of selecting the correct model in each experiment, as reported by Shao (1993, Table 2). Three model selection methods are compared: LOOCV (leave-one-out CV); CV(d=25), the delete-d method with d=25; and APCV, a computationally very efficient CV method that is specialized to linear regression.
Experiment | LOOCV | CV(d=25) | APCV |
1 | 0.484 | 0.934 | 0.501 |
2 | 0.641 | 0.947 | 0.651 |
3 | 0.801 | 0.965 | 0.818 |
4 | 0.985 | 0.948 | 0.999 |
CV(d=25) outperforms both LOOCV and APCV by a large margin in Experiments 1, 2 and 3. In Experiment 4, however, both LOOCV (0.985) and APCV (0.999) do slightly better than CV(d=25) (0.948).
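The methods differ only in how they estimate prediction error for each candidate subset. The base-R sketch below is a simplified illustration, not the implementation used by Shao (1993) or by bestglm, and it does not reproduce APCV. It scores all 16 subsets of x2, x3, x4, x5 (the intercept is always kept) either by exact LOOCV, computed from the leverages, or by a Monte Carlo version of delete-d CV that averages over random validation sets of size d. All function names are hypothetical, and `prob_correct` estimates the probability of correct selection by repeated simulation, assuming the model and unit error variance given above.

```r
# Hypothetical sketch, not the bestglm or Shao (1993) implementation:
# compare subsets of the inputs by LOOCV or by Monte Carlo delete-d CV.

# All 2^p subsets of the p inputs, one logical row per subset
all_subsets <- function(p) {
  as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))
}

# Exact LOOCV error for least squares via leverages:
# the i-th deleted residual equals r_i / (1 - h_ii)
loocv_error <- function(Z, y) {
  q <- qr(Z)
  h <- rowSums(qr.Q(q)^2)   # diagonal of the hat matrix (Z assumed full rank)
  r <- qr.resid(q, y)       # ordinary least-squares residuals
  mean((r / (1 - h))^2)
}

# Monte Carlo delete-d CV: fit on n - d rows, predict the d held-out rows,
# and average the squared prediction error over `reps` random splits
delete_d_error <- function(Z, y, d, reps = 200) {
  n <- nrow(Z)
  errs <- replicate(reps, {
    test <- sample.int(n, d)
    fit <- lm.fit(Z[-test, , drop = FALSE], y[-test])
    pred <- Z[test, , drop = FALSE] %*% fit$coefficients
    mean((y[test] - pred)^2)
  })
  mean(errs)
}

# Fit every subset (intercept always included) and return the names of
# the inputs in the subset with the smallest estimated prediction error
select_subset <- function(X, y, score) {
  S <- all_subsets(ncol(X))
  errs <- apply(S, 1, function(keep) score(cbind(1, X[, keep, drop = FALSE]), y))
  colnames(X)[S[which.min(errs), ]]
}

# Fraction of simulated data sets, drawn from y = 2 + X beta + e with
# e ~ N(0, 1), in which `score` selects exactly the inputs with nonzero beta
prob_correct <- function(X, beta, score, nsim = 100) {
  truth <- colnames(X)[beta != 0]
  hits <- replicate(nsim, {
    y <- 2 + drop(X %*% beta) + rnorm(nrow(X))
    identical(select_subset(X, y, score), truth)
  })
  mean(hits)
}
```

For instance, with `X <- as.matrix(Shao)`, the calls `prob_correct(X, c(0, 0, 4, 0), loocv_error)` and `prob_correct(X, c(0, 0, 4, 0), function(Z, y) delete_d_error(Z, y, d = 25))` give Monte Carlo estimates for Experiment 1. These estimates vary with the seed, `nsim` and `reps`, and need not match the values in the table exactly.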
data(Shao)
A data frame with 40 observations on the following 4 inputs.
x2
a numeric vector
x3
a numeric vector
x4
a numeric vector
x5
a numeric vector
Shao, Jun (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association 88, 486-494.
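# In this example BICq (q = 0.25) selects the correct model but BIC does not
library(bestglm)
data(Shao)
X <- as.matrix(Shao)
# Experiment 1 slopes: only x4 enters the model (the intercept 2 is omitted here)
b <- c(0, 0, 4, 0)
set.seed(123321123)
y <- X %*% b + rnorm(40)
# bestglm() treats the last column of Xy as the response
Xy <- data.frame(Shao, y = y)
bestglm(Xy)               # default criterion, IC = "BIC"
bestglm(Xy, IC = "BICq")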