cvpvs.wnn {pvclass} | R Documentation |
Cross-Validated P-Values (Weighted Nearest Neighbors)
Description
Computes cross-validated nonparametric p-values for the potential class memberships of the training data. The p-values are based on 'weighted nearest-neighbors'.
Usage
cvpvs.wnn(X, Y, wtype = c('linear', 'exponential'), W = NULL,
          tau = 0.3, distance = c('euclidean', 'ddeuclidean',
          'mahalanobis'), cova = c('standard', 'M', 'sym'))
Arguments
X
matrix containing training observations, where each observation is a row vector.
Y
vector indicating the classes to which the training observations belong.
wtype
type of the weight function (see section 'Details' below).
W
vector of the (decreasing) weights (see section 'Details' below).
tau
parameter of the weight function (see section 'Details' below).
distance
the distance measure: one of 'euclidean', 'ddeuclidean' or 'mahalanobis'.
cova
estimator for the covariance matrix: one of 'standard', 'M' or 'sym'.
Details
Computes cross-validated nonparametric p-values for the potential class memberships of the training data. Precisely, for each feature vector X[i,] and each class b, the number PV[i,b] is a p-value for the null hypothesis that Y[i] equals b.
This p-value is based on a permutation test applied to an estimated Bayesian likelihood ratio, using 'weighted nearest neighbors' with estimated prior probabilities N(b)/n. Here N(b) is the number of observations of class b and n is the total number of observations.
The (decreasing) weights for the observations can either be supplied as an n-dimensional vector W or, if W = NULL, be generated by one of the following weight functions:
linear:
W_i = \max\left(1 - \frac{i/n}{\tau},\, 0\right),
exponential:
W_i = \left(1 - \frac{i}{n}\right)^{\tau}.
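The two weight functions above can be sketched in base R as follows. This is only an illustration of the formulas, not the package's internal code; the helper name `wnn_weights` is hypothetical and is not exported by pvclass.

```r
# Hypothetical helper (not part of pvclass): evaluate the weight formulas
# from the Details section for i = 1, ..., n.
wnn_weights <- function(n, tau, wtype = c("linear", "exponential")) {
  wtype <- match.arg(wtype)
  i <- seq_len(n)
  if (wtype == "linear") {
    # W_i = max(1 - (i/n)/tau, 0)
    pmax(1 - (i / n) / tau, 0)
  } else {
    # W_i = (1 - i/n)^tau
    (1 - i / n)^tau
  }
}

w_lin <- wnn_weights(10, tau = 0.3, wtype = "linear")
w_exp <- wnn_weights(10, tau = 5, wtype = "exponential")
```

Both vectors are non-increasing in i. With linear weights, W_i is zero once i/n reaches tau, so tau acts roughly as the fraction of nearest observations that receive positive weight. A vector of this form, with length n, is the kind of input the W argument is described as accepting.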
If tau is a vector, the program searches for the best tau. To determine the best tau for the p-value PV[i,b], the class label of the training observation X[i,] is temporarily set to b; then, for every training observation with Y[j] != b, the sum of the weights of the observations belonging to class b is computed. The tau that minimizes the sum of these values is chosen.
If W = NULL and tau = NULL, tau is set to seq(0.1,0.9,0.1) if wtype = "l" and to c(1,5,10,20) if wtype = "e".
Value
PV is a matrix containing the cross-validated p-values. Precisely, for each feature vector X[i,] and each class b, the number PV[i,b] is a p-value for the null hypothesis that Y[i] = b.
If tau is a vector or NULL (and W = NULL), PV has an attribute "opt.tau", a matrix in which opt.tau[i,b] is the best tau for observation X[i,] and class b (see section 'Details'). The values in "opt.tau" are the ones used to compute the p-values.
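As a rough sketch of how the returned matrix might be read, the code below lists, for each observation, the classes b whose null hypothesis Y[i] = b is not rejected at a level alpha, that is, those with PV[i,b] > alpha. The helper `pv_regions` and the level 0.05 are illustrative assumptions, not part of pvclass. The call to cvpvs.wnn runs only if the package is installed.

```r
# Hypothetical helper (not part of pvclass): for each row of a p-value
# matrix, return the class labels whose p-value exceeds alpha.
pv_regions <- function(PV, alpha = 0.05) {
  labels <- colnames(PV)
  # Fall back to column indices if the matrix carries no column names.
  if (is.null(labels)) labels <- as.character(seq_len(ncol(PV)))
  lapply(seq_len(nrow(PV)), function(i) labels[PV[i, ] > alpha])
}

# Guarded usage: executed only when pvclass is available.
if (requireNamespace("pvclass", quietly = TRUE)) {
  X <- as.matrix(iris[, 1:4])
  Y <- iris[, 5]
  # A vector-valued tau triggers the search described in 'Details'.
  PV <- pvclass::cvpvs.wnn(X, Y, wtype = "linear", tau = c(0.3, 0.5, 0.7))
  opt_tau <- attr(PV, "opt.tau")  # selected tau for each observation and class
  regions <- pv_regions(PV, alpha = 0.05)
  print(dim(opt_tau))
  print(head(regions, 3))
}
```

An observation whose list contains several classes is one that the p-values cannot assign unambiguously at the chosen level.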
Author(s)
Niki Zumbrunnen niki.zumbrunnen@gmail.com
Lutz Dümbgen lutz.duembgen@stat.unibe.ch
www.imsv.unibe.ch/duembgen/index_ger.html
References
Zumbrunnen N. and Dümbgen L. (2017) pvclass: An R Package for p Values for Classification. Journal of Statistical Software 78(4), 1–19. doi:10.18637/jss.v078.i04
Dümbgen L., Igl B.-W. and Munk A. (2008) P-Values for Classification. Electronic Journal of Statistics 2, 468–493, available at http://dx.doi.org/10.1214/08-EJS245.
Zumbrunnen N. (2014) P-Values for Classification – Computational Aspects and Asymptotics. Ph.D. thesis, University of Bern, available at http://boris.unibe.ch/id/eprint/53585.
See Also
cvpvs, cvpvs.gaussian, cvpvs.knn, cvpvs.logreg
Examples
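library(pvclass)
X <- iris[, 1:4]  # the four numeric measurements
Y <- iris[, 5]    # species labels
# cross-validated p-values with linear weights and a fixed tau
cvpvs.wnn(X, Y, wtype = 'l', tau = 0.5)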