| multiclass routines {regtools} | R Documentation |
Classification with More Than 2 Classes
Description
Tools for multiclass classification, parametric and nonparametric.
Usage
avalogtrn(trnxy,yname)
ovaknntrn(trnxy,yname,k,xval=FALSE)
avalogpred()
classadjust(econdprobs,wrongprob1,trueprob1)
boundaryplot(y01,x,regests,pairs=combn(ncol(x),2),pchvals=2+y01,cex=0.5,band=0.10)
Arguments
pchvals |
Plotting symbols (pch values) in base-R graphics. |
trnxy |
Data matrix, Y last. |
xval |
If TRUE, use the leave-one-out method. |
y01 |
Y vector (1s and 0s). |
regests |
Estimated regression function values. |
x |
X data frame or matrix. |
pairs |
Two-row matrix, column i of which is a pair of predictor variables to graph. |
cex |
Symbol size for plotting. |
band |
If |
yname |
Name of the Y column. |
k |
Number of nearest neighbors. |
econdprobs |
Estimated conditional class probabilities, given the predictors. |
wrongprob1 |
Incorrect, data-provenanced, unconditional P(Y = 1). |
trueprob1 |
Correct unconditional P(Y = 1). |
Details
These functions aid classification in the multiclass setting.
The function boundaryplot is a visualization tool for the two-class
setting. It draws the boundary between predicted Y = 1 and predicted
Y = 0 data points in 2-dimensional feature space, as determined by the
argument regests. It is used to assess goodness of fit visually,
typically by running the function twice, say once for glm and once for
kNN. If the two plots differ substantially and the analyst still wishes
to use glm(), adding polynomial terms may help.
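As a rough sketch of the polynomial-terms step, the code below fits a linear logistic model and one with a squared term, then passes each set of fitted values to boundaryplot as regests, following the signature shown under Usage. The mtcars data, the choice of wt and hp as predictors, and all object names are illustrative assumptions, not taken from the package.

```r
# Sketch only: mtcars, the predictors wt and hp, and the object names
# are illustrative assumptions, not part of the regtools documentation.
y01 <- mtcars$am                  # 0/1 response
x <- mtcars[, c("wt", "hp")]      # two predictors, so a single pair to plot

# Linear logistic fit, then a fit with an added squared term.
# These small fits may emit convergence warnings; the fitted values
# are still estimated probabilities.
fit_lin <- glm(am ~ wt + hp, data = mtcars, family = binomial)
fit_quad <- glm(am ~ wt + I(wt^2) + hp, data = mtcars, family = binomial)
regests_lin <- fitted(fit_lin)    # estimated P(Y = 1 | X), linear model
regests_quad <- fitted(fit_quad)  # estimated P(Y = 1 | X), quadratic model

# Plot each boundary, only if regtools is installed.
if (requireNamespace("regtools", quietly = TRUE)) {
  regtools::boundaryplot(y01, x, regests_lin)
  regtools::boundaryplot(y01, x, regests_quad)
}
```

Comparing the two plots against a kNN-based plot would show whether the squared term brings the parametric boundary closer to the nonparametric one.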
The functions not listed above are largely deprecated, e.g. in favor of
qeLogit and the other qe-series functions.
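The arguments of classadjust (econdprobs, wrongprob1, trueprob1) correspond to the standard correction for a mis-specified class prior. The sketch below applies that correction with Bayes' rule. The helper name adjust_for_prior is hypothetical, and the code is not claimed to match the package's internal implementation.

```r
# Hypothetical helper, not part of regtools: Bayes-rule prior correction.
# Assumes all econdprobs lie strictly between 0 and 1.
adjust_for_prior <- function(econdprobs, wrongprob1, trueprob1) {
  # Conditional odds of Y = 1 under the incorrect prior
  wrong_odds <- econdprobs / (1 - econdprobs)
  # Divide out the incorrect prior odds to recover the likelihood ratio
  lik_ratio <- wrong_odds * (1 - wrongprob1) / wrongprob1
  # Multiply by the correct prior odds
  true_odds <- lik_ratio * trueprob1 / (1 - trueprob1)
  # Convert odds back to probabilities
  true_odds / (1 + true_odds)
}

# Data sampled with equal class sizes (prior 0.5), true prior 0.1
adj <- adjust_for_prior(c(0.5, 0.8), wrongprob1 = 0.5, trueprob1 = 0.1)
adj  # 0.1 and 4/13
```

When wrongprob1 equals trueprob1, the function returns econdprobs unchanged, which is a quick sanity check on the algebra.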
Author(s)
Norm Matloff
Examples
## Not run:
data(oliveoils)
oo <- oliveoils[,-1]
# toy example
set.seed(9999)
x <- runif(25)
y <- sample(0:2,25,replace=TRUE)
xd <- preprocessx(x,2,xval=FALSE)
kout <- ovaknntrn(y,xd,m=3,k=2)
kout$regest # row 2: 0.0,0.5,0.5
predict(kout,predpts=matrix(c(0.81,0.55,0.15),ncol=1)) # 0,2,0or2
yd <- factorToDummies(as.factor(y),'y',FALSE)
kNN(x,yd,c(0.81,0.55,0.15),2) # predicts 0, 1or2, 2
data(peDumms) # prog/engr data
ped <- peDumms[,-33]
ped <- as.matrix(ped)
x <- ped[,-(23:28)]
y <- ped[,23:28]
knnout <- kNN(x,y,x,25,leave1out=TRUE)
truey <- apply(y,1,which.max) - 1
mean(knnout$ypreds == truey) # about 0.37
xd <- preprocessx(x,25,xval=TRUE)
kout <- knnest(y,xd,25)
preds <- predict(kout,predpts=x)
yhats <- apply(preds,1,which.max) - 1
mean(yhats == truey) # about 0.37
data(peFactors)
# discard the lower educ-level cases, which are rare
edu <- peFactors$educ
numedu <- as.numeric(edu)
idxs <- numedu >= 12
pef <- peFactors[idxs,]
numedu <- numedu[idxs]
pef$educ <- as.factor(numedu)
pef1 <- pef[,c(1,3,5,7:9)]
# ovalog
ovaout <- ovalogtrn(pef1,"occ")
preds <- predict(ovaout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ)) # about 0.39
# avalog
avaout <- avalogtrn(pef1,"occ")
preds <- predict(avaout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ)) # about 0.39
# knn
knnout <- ovaknntrn(pef1,"occ",25)
preds <- predict(knnout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ)) # about 0.43
data(oliveoils)
oo <- oliveoils
oo <- oo[,-1]
knnout <- ovaknntrn(oo,'Region',10)
# predict a new case that is like oo[1,] but with palmitic = 950
newx <- oo[1,2:9,drop=FALSE]
newx[,1] <- 950
predict(knnout,predpts=newx) # predicts class 2, South
## End(Not run)