cvpvs.logreg {pvclass}    R Documentation

Cross-Validated P-Values (Penalized Multicategory Logistic Regression)

Description

Computes cross-validated nonparametric p-values for the potential class memberships of the training data. The p-values are based on 'penalized logistic regression'.

Usage

cvpvs.logreg(X, Y, tau.o=10, find.tau=FALSE, delta=2, tau.max=80, tau.min=1,
             pen.method = c("vectors", "simple", "none"), progress = TRUE)

Arguments

X

matrix containing training observations, where each observation is a row vector.

Y

vector indicating the classes which the training observations belong to.

tau.o

the penalty parameter (see section 'Details' below).

find.tau

logical. If TRUE the program searches for the best tau. For more information see section 'Details'.

delta

multiplicative factor by which the penalty parameter is increased or decreased during the search. Should be greater than 1. Only needed if find.tau == TRUE.

tau.max

maximal penalty parameter considered. Only needed if find.tau == TRUE.

tau.min

minimal penalty parameter considered. Only needed if find.tau == TRUE.

pen.method

the method of penalization (see section 'Details' below).

progress

optional parameter for reporting the status of the computations (default TRUE).

Details

Computes cross-validated nonparametric p-values for the potential class memberships of the training data. Precisely, for each feature vector X[i,] and each class b the number PV[i,b] is a p-value for the null hypothesis that Y[i] equals b, based on the remaining training observations.
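To illustrate what a nonparametric p-value of this kind can look like, the following is a minimal sketch of a generic rank-based p-value. It is not the internal code of pvclass: the function name rank_pvalue, the arguments stat_i and stat_b, and the convention that large statistics speak against class b are all assumptions made for this illustration.

```r
# Generic rank-based p-value (illustrative sketch, not pvclass internals).
# stat_i: test statistic of the observation under test (hypothetical input)
# stat_b: test statistics of all observations currently labelled b,
#         including the observation under test (hypothetical input)
# Assumption: large values are evidence against membership in class b.
rank_pvalue <- function(stat_i, stat_b) {
  # Proportion of class-b statistics at least as extreme as stat_i
  mean(stat_b >= stat_i)
}

stat_b <- c(0.2, 0.5, 0.9, 1.4, 3.0)
rank_pvalue(3.0, stat_b)  # most extreme value in the group: small p-value
rank_pvalue(0.2, stat_b)  # least extreme value in the group: p-value 1
```

A small p-value for class b then indicates that the observation looks atypical compared with the observations labelled b.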
Each p-value is based on a permutation test applied to an estimated Bayesian likelihood ratio, using 'penalized logistic regression'. That is, the conditional probability of Y = y, given X = x, is assumed to be proportional to exp(a_y + b_y^T x). The parameters a_y, b_y are estimated via penalized maximum log-likelihood. The penalty is either a weighted sum of the Euclidean norms of the vectors (b_1[j], b_2[j], ..., b_L[j]) (pen.method == 'vectors') or a weighted sum of all moduli |b_y[j]| (pen.method == 'simple'). The weights are given by tau.o times the sample standard deviation (within groups) of the j-th components of the feature vectors. With pen.method == 'none', no penalization is used, but this option may be unstable.
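The model and the two penalties can be written out in a few lines of base R. This is a sketch for illustration only, not the fitting code of pvclass: the helper names class_probs, penalty_vectors, penalty_simple and within_sd are hypothetical, B is assumed to be a d x L coefficient matrix with one column per class, and the pooled within-group standard deviation with divisor n - L is one plausible reading of "sample standard deviation (within groups)".

```r
# Class probabilities under the model
# P(Y = y | X = x) proportional to exp(a[y] + sum(B[, y] * x)).
class_probs <- function(x, a, B) {
  eta <- a + as.vector(crossprod(B, x))  # one linear predictor per class
  eta <- eta - max(eta)                  # guard against overflow in exp()
  exp(eta) / sum(exp(eta))
}

# pen.method == 'vectors': weighted sum of the Euclidean norms of the rows
# (b_1[j], ..., b_L[j]) of B.
penalty_vectors <- function(B, w) sum(w * sqrt(rowSums(B^2)))

# pen.method == 'simple': weighted sum of all moduli |b_y[j]|.
# w has length nrow(B), so recycling applies w[j] to row j.
penalty_simple <- function(B, w) sum(w * abs(B))

# Assumed form of the within-group standard deviation of each feature.
within_sd <- function(X, Y) {
  Y <- factor(Y)
  centered <- apply(X, 2, function(col) col - ave(col, Y))
  sqrt(colSums(centered^2) / (nrow(X) - nlevels(Y)))
}

X <- as.matrix(iris[, 1:4])
Y <- iris[, 5]
w <- 10 * within_sd(X, Y)           # weights for tau.o = 10

B <- matrix(c(3, 0, 4, 0), 2, 2)    # toy coefficients: d = 2, L = 2
penalty_vectors(B, c(1, 2))         # norm of (3, 4) is 5, second row is 0
penalty_simple(B, c(1, 2))          # 1 * (3 + 4) + 2 * 0
class_probs(c(1, 1), a = c(0, 0), B = B)
```

The 'vectors' penalty acts on a feature's coefficients across all classes as a group, whereas the 'simple' penalty treats every coefficient separately.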
If find.tau == TRUE, the program searches for the best penalty parameter. To determine the best parameter tau for the p-value PV[i,b], the class label of the training observation X[i,] is temporarily set to b, and then, for all training observations with Y[j] != b, the estimated probability that X[j,] belongs to class b is computed. The tau that minimizes the sum of these probabilities is chosen. The search proceeds as follows: first, tau.o is compared with tau.o*delta. If tau.o*delta is better, it is compared with tau.o*delta^2, and so on, up to the maximal parameter tau.max. If tau.o is better than tau.o*delta, it is compared with tau.o*delta^(-1), and so on, down to the minimal parameter tau.min.
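The search over tau can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the package's own code: search_tau and its score argument are hypothetical names, score(tau) stands in for the sum of estimated class-b probabilities described above (smaller is better), and ties are treated as "not better".

```r
# Geometric search for tau along the grid tau.o * delta^k (sketch).
search_tau <- function(score, tau.o = 10, delta = 2, tau.max = 80, tau.min = 1) {
  stopifnot(delta > 1)
  best <- tau.o
  best_score <- score(best)

  # Step upwards while the larger tau is strictly better and within tau.max.
  moved_up <- FALSE
  cand <- best * delta
  while (cand <= tau.max) {
    s <- score(cand)
    if (s >= best_score) break
    best <- cand; best_score <- s; moved_up <- TRUE
    cand <- best * delta
  }

  # If the first upward step did not help, step downwards instead.
  if (!moved_up) {
    cand <- best / delta
    while (cand >= tau.min) {
      s <- score(cand)
      if (s >= best_score) break
      best <- cand; best_score <- s
      cand <- best / delta
    }
  }
  best
}

# Toy criteria with known minimisers (stand-ins for the real score).
toy_up   <- function(tau) (log2(tau) - 5)^2  # minimised near tau = 32
toy_down <- function(tau) (log2(tau) - 1)^2  # minimised near tau = 2
search_tau(toy_up)    # walks 10 -> 20 -> 40 and stops before 80
search_tau(toy_down)  # walks 10 -> 5 -> 2.5 and stops before 1.25
```

Because only the grid points tau.o * delta^k between tau.min and tau.max are visited, the result is the best value on that grid reached by the walk, not necessarily the global minimiser.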

Value

PV is a matrix containing the cross-validated p-values. Precisely, for each feature vector X[i,] and each class b the number PV[i,b] is a p-value for the null hypothesis that Y[i] = b, based on the remaining training observations.
If find.tau == TRUE, PV has an attribute "tau.opt". This is a matrix in which tau.opt[i,b] is the best tau for observation X[i,] and class b (see section 'Details'); it is the value used to compute the p-value for observation X[i,] and class b.
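The sketch below shows one way to work with a result of this shape. It uses a hand-made matrix rather than real output: the p-values, the class names "a", "b", "c" and the constant tau.opt values are invented for illustration. Thresholding the p-values at a level alpha to list the classes that are not rejected is a common use of such p-values and is an assumption here, not something this page prescribes.

```r
# Mock result with the same structure as the return value (values invented).
PV <- matrix(c(0.80, 0.02, 0.01,
               0.30, 0.12, 0.03,
               0.01, 0.04, 0.60),
             nrow = 3, byrow = TRUE,
             dimnames = list(NULL, c("a", "b", "c")))
attr(PV, "tau.opt") <- matrix(10, nrow = 3, ncol = 3)

# Penalty parameters used for each observation and class
tau.opt <- attr(PV, "tau.opt")
tau.opt[2, 1]

# Classes whose null hypothesis Y[i] = b is not rejected at level alpha
alpha <- 0.05
regions <- lapply(seq_len(nrow(PV)),
                  function(i) colnames(PV)[PV[i, ] > alpha])
regions
```

For the second mock observation both "a" and "b" remain plausible, which is the kind of ambiguity a single predicted label would hide.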

Author(s)

Niki Zumbrunnen niki.zumbrunnen@gmail.com
Lutz Dümbgen lutz.duembgen@stat.unibe.ch
www.imsv.unibe.ch/duembgen/index_ger.html

References

Zumbrunnen N. and Dümbgen L. (2017) pvclass: An R Package for p Values for Classification. Journal of Statistical Software 78(4), 1–19. doi:10.18637/jss.v078.i04

Dümbgen L., Igl B.-W. and Munk A. (2008) P-Values for Classification. Electronic Journal of Statistics 2, 468–493, available at http://dx.doi.org/10.1214/08-EJS245.

Zumbrunnen N. (2014) P-Values for Classification – Computational Aspects and Asymptotics. Ph.D. thesis, University of Bern, available at http://boris.unibe.ch/id/eprint/53585.

See Also

cvpvs, cvpvs.gaussian, cvpvs.knn, cvpvs.wnn

Examples

## Not run: 
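# Four numeric features and the species label of the iris data
X <- iris[, 1:4]
Y <- iris[, 5]

# Cross-validated p-values with a fixed penalty parameter
cvpvs.logreg(X, Y, tau.o = 1, pen.method = "vectors", progress = TRUE)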

## End(Not run)

# A bigger data example: Buerk's hospital data.
## Not run: 
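data(buerk)
# Columns 1 to 21 hold the features, column 22 the 0/1 class label
X.raw <- as.matrix(buerk[, 1:21])
Y.raw <- buerk[, 22]
n0.raw <- sum(1 - Y.raw)  # number of observations with label 0
n1 <- sum(Y.raw)          # number of observations with label 1
n0 <- 3 * n1              # subsample size for label 0

X0 <- X.raw[Y.raw == 0, ]
X1 <- X.raw[Y.raw == 1, ]

# Random subsample of label 0; all observations with label 1 are kept
tmpi0 <- sample(1:n0.raw, size = n0, replace = FALSE)
tmpi1 <- sample(1:n1, size = n1, replace = FALSE)

# Combined training data with class labels recoded to 1 and 2
X <- rbind(X0[tmpi0, ], X1)
Y <- c(rep(1, n0), rep(2, n1))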

str(X)
str(Y)

PV <- cvpvs.logreg(X, Y,
                   tau.o = 5, pen.method = "vectors", progress = TRUE)

analyze.pvs(Y=Y,pv=PV,pvplot=FALSE)

## End(Not run)


[Package pvclass version 1.4 Index]