| zprostate {bestglm} | R Documentation |
Prostate cancer data. Standardized.
Description
Data with 8 inputs and one output, used to illustrate the prediction problem and regression in the textbook of Hastie, Tibshirani and Friedman (2009).
Usage
data(zprostate)
Format
A data frame with 97 observations on 10 variables: 8 inputs, 1 output, and a TRUE/FALSE training-set indicator. All input variables have been standardized.
lcavol    log cancer volume
lweight   log prostate weight
age       age in years
lbph      log benign prostatic hyperplasia
svi       seminal vesicle invasion
lcp       log of capsular penetration
gleason   Gleason score
pgg45     percent of Gleason scores 4 or 5
lpsa      Outcome. Log of PSA
train     TRUE or FALSE; TRUE marks the training-set rows
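The standardization and the train/test split described above can be inspected directly. The following is a minimal sketch, assuming the bestglm package is installed and that the column names match the list above; the vector name inputs is introduced here for illustration only. If the inputs were standardized over all 97 rows, the means should be near 0 and the standard deviations near 1, but that is an assumption to check rather than a documented guarantee.

```r
library(bestglm)   # assumption: bestglm is installed
data(zprostate)

# Names of the 8 inputs, taken from the Format list (illustrative helper vector)
inputs <- c("lcavol", "lweight", "age", "lbph",
            "svi", "lcp", "gleason", "pgg45")

# Column means and standard deviations of the inputs over all rows
colMeans(zprostate[, inputs])
sapply(zprostate[, inputs], sd)

# Number of training (TRUE) and test (FALSE) rows
table(zprostate$train)
```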
Details
A study of 97 men with prostate cancer examined the correlation between PSA (prostate specific antigen) and a number of clinical measurements: lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45.
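As a rough illustration of the correlations mentioned above, the sketch below computes the Pearson correlation of each input with lpsa and sorts the inputs by absolute correlation. It assumes the bestglm package is installed; the names inputs and r are introduced here for illustration and are not part of the package.

```r
library(bestglm)   # assumption: bestglm is installed
data(zprostate)

# Names of the 8 inputs, taken from the Format list (illustrative helper vector)
inputs <- c("lcavol", "lweight", "age", "lbph",
            "svi", "lcp", "gleason", "pgg45")

# Pearson correlation of each input with the outcome lpsa (8 x 1 matrix)
r <- cor(zprostate[, inputs], zprostate$lpsa)

# Sort by absolute correlation, largest first
round(r[order(-abs(r[, 1])), , drop = FALSE], 2)
```

These are marginal correlations over all 97 rows; they do not account for correlation among the inputs, which is what the regression in the Examples section addresses.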
References
Hastie, Tibshirani & Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Ed. Springer.
Examples
#Prostate data. Table 3.3 HTF.
data(zprostate)
#full model: split into training and test sets, then fit least squares on all inputs
trainQ<-zprostate[,10]
train <-zprostate[trainQ,-10]
test <-zprostate[!trainQ,-10]
ans<-lm(lpsa~., data=train)
sig<-summary(ans)$sigma
yHat<-predict(ans, newdata=test)
yTest<-zprostate$lpsa[!trainQ]
TE<-mean((yTest-yHat)^2)
#best subset model, selected on the training set with the BICq criterion
ansSub<-bestglm(train, IC="BICq")$BestModel
sigSub<-summary(ansSub)$sigma
yHatSub<-predict(ansSub, newdata=test)
TESub<-mean((yTest-yHatSub)^2)
m<-matrix(c(TE,sig,TESub,sigSub), ncol=2)
dimnames(m)<-list(c("TestErr","Sd"),c("LS","Best"))
m