multiclass routines {regtools}    R Documentation

Classification with More Than 2 Classes

Description

Tools for multiclass classification, parametric and nonparametric.

Usage

avalogtrn(trnxy,yname)
ovaknntrn(trnxy,yname,k,xval=FALSE)
avalogpred()
classadjust(econdprobs,wrongprob1,trueprob1) 
boundaryplot(y01,x,regests,pairs=combn(ncol(x),2),pchvals=2+y01,cex=0.5,band=0.10)

Arguments

pchvals

Plotting symbols (pch values) in base-R graphics. The default, 2+y01, gives each of the two classes its own symbol.

trnxy

Data matrix, Y last.

xval

If TRUE, use the leave-one-out method.

y01

Y vector (1s and 0s).

regests

Estimated regression function values.

x

X data frame or matrix.

pairs

Two-row matrix, column i of which is a pair of predictor variables to graph.

cex

Symbol size for plotting.

band

If band is non-NULL, only points whose estimated P(Y = 1) lies within band, say 0.1, of the boundary value are displayed, for a contour-like effect.

yname

Name of the Y column.

k

Number of nearest neighbors.

econdprobs

Estimated conditional class probabilities, given the predictors.

wrongprob1

Incorrect unconditional P(Y = 1), as implied by the data.

trueprob1

Correct unconditional P(Y = 1).
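
The roles of econdprobs, wrongprob1 and trueprob1 can be illustrated with the standard Bayes-rule correction for a sample whose class balance differs from that of the population. The code below is a minimal sketch of that textbook adjustment, not necessarily what classadjust does internally; the helper name adjust_probs is hypothetical.

```r
# Hypothetical helper (an assumption, not the package's classadjust()):
# re-weights estimated P(Y = 1 | X) obtained from data whose
# unconditional P(Y = 1) was wrongprob1 instead of trueprob1.
adjust_probs <- function(econdprobs, wrongprob1, trueprob1) {
  # Density ratio f0(x)/f1(x) implied by the estimates under the wrong prior
  fratio <- (1 / econdprobs - 1) * wrongprob1 / (1 - wrongprob1)
  # Recombine that ratio with the correct prior odds
  1 / (1 + fratio * (1 - trueprob1) / trueprob1)
}

est <- c(0.2, 0.5, 0.8)
adj <- adjust_probs(est, wrongprob1 = 0.5, trueprob1 = 0.1)
print(round(adj, 3))  # 0.027 0.100 0.308
```

An estimate equal to the wrong prior (0.5 here) maps to the correct prior (0.1), and if the two priors agree the estimates are returned unchanged.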

Details

These functions aid classification in the multiclass setting.
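
As an illustration of the one-vs-all (OVA) idea behind the "ova" functions, the following base-R sketch fits one logistic model per class and predicts the class with the largest estimated probability. It uses simulated data and plain glm(); it is an assumption-laden sketch of the general technique, not the package's implementation, and all variable names are made up for the example.

```r
set.seed(1)
n <- 300
# Simulated 3-class data with labels 0, 1, 2 (illustrative only)
y <- sample(0:2, n, replace = TRUE)
d <- data.frame(x1 = rnorm(n, mean = y),
                x2 = rnorm(n, mean = (y - 1)^2))

classes <- sort(unique(y))
# One logistic model per class: "this class" vs. "all the others"
fits <- lapply(classes, function(cl) {
  dcl <- cbind(d, z = as.integer(y == cl))
  glm(z ~ x1 + x2, data = dcl, family = binomial)
})
# Matrix of estimated P(Y = class | X), one column per class
probs <- sapply(fits, function(f) predict(f, newdata = d, type = "response"))
# Predict the class with the largest estimated probability
preds <- classes[apply(probs, 1, which.max)]
print(mean(preds == y))  # in-sample proportion correctly classified
```

The all-vs-all (AVA) variant would instead fit one model per pair of classes and let the pairwise winners vote.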

The function boundaryplot serves as a visualization technique for the two-class setting. It draws the boundary between predicted Y = 1 and predicted Y = 0 data points in 2-dimensional feature space, as determined by the argument regests. It is used to assess goodness of fit visually, typically by running the function twice, say once with regests from glm() and once with regests from kNN. If the two plots differ substantially and the analyst still wishes to use glm(), adding polynomial terms may help.
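
The band idea can be sketched in base R as follows. This is a rough approximation for illustration, not boundaryplot itself: it assumes the band is centered on an estimated probability of 0.5, and the data and variable names u and v are simulated and hypothetical.

```r
set.seed(2)
n <- 500
# Two simulated predictors and a 0/1 response (illustrative only)
x <- cbind(u = rnorm(n), v = rnorm(n))
y01 <- rbinom(n, 1, plogis(x[, "u"] + x[, "v"]))

# Estimated regression function values from a logistic model
fit <- glm(y01 ~ x, family = binomial)
regests <- fitted(fit)

band <- 0.10
# Assumption: keep only points whose estimate is within band of 0.5
near <- abs(regests - 0.5) < band
# Symbols 2 and 3 distinguish the classes, mirroring pchvals = 2 + y01
plot(x[near, "u"], x[near, "v"], pch = 2 + y01[near], cex = 0.5,
     xlab = "u", ylab = "v")
```

Repeating the plot with regests taken from a nonparametric fit would give the second picture to compare against.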

The functions not listed above are largely deprecated, e.g. in favor of qeLogit and the other qe-series functions.

Author(s)

Norm Matloff

Examples


## Not run: 


data(oliveoils) 
oo <- oliveoils[,-1] 

# toy example
set.seed(9999)
x <- runif(25)
y <- sample(0:2,25,replace=TRUE)
xd <- preprocessx(x,2,xval=FALSE)
kout <- ovaknntrn(y,xd,m=3,k=2)
kout$regest  # row 2:  0.0,0.5,0.5
predict(kout,predpts=matrix(c(0.81,0.55,0.15),ncol=1))  # 0,2,0or2
yd <- factorToDummies(as.factor(y),'y',FALSE)
kNN(x,yd,c(0.81,0.55,0.15),2)  # predicts 0, 1or2, 2

data(peDumms)  # prog/engr data 
ped <- peDumms[,-33] 
ped <- as.matrix(ped)
x <- ped[,-(23:28)]
y <- ped[,23:28]
knnout <- kNN(x,y,x,25,leave1out=TRUE) 
truey <- apply(y,1,which.max) - 1
mean(knnout$ypreds == truey)  # about 0.37
xd <- preprocessx(x,25,xval=TRUE)
kout <- knnest(y,xd,25)
preds <- predict(kout,predpts=x)
yhats <- apply(preds,1,which.max) - 1
mean(yhats == truey)  # about 0.37

data(peFactors)
# discard the lower educ-level cases, which are rare
edu <- peFactors$educ 
numedu <- as.numeric(edu) 
idxs <- numedu >= 12 
pef <- peFactors[idxs,]
numedu <- numedu[idxs]
pef$educ <- as.factor(numedu)
pef1 <- pef[,c(1,3,5,7:9)]

# ovalog
ovaout <- ovalogtrn(pef1,"occ")
preds <- predict(ovaout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ))  # about 0.39

# avalog

avaout <- avalogtrn(pef1,"occ")  
preds <- predict(avaout,predpts=pef1[,-3]) 
mean(preds == factorTo012etc(pef1$occ))  # about 0.39 

# knn

knnout <- ovaknntrn(pef1,"occ",25)
preds <- predict(knnout,predpts=pef1[,-3])
mean(preds == factorTo012etc(pef1$occ))  # about 0.43

data(oliveoils)
oo <- oliveoils
oo <- oo[,-1]
knnout <- ovaknntrn(oo,'Region',10)
# predict a new case that is like oo[1,] but with palmitic = 950
newx <- oo[1,2:9,drop=FALSE]
newx[,1] <- 950
predict(knnout,predpts=newx)  # predicts class 2, South


## End(Not run)


[Package regtools version 1.7.0 Index]