npc {nproc} | R Documentation |
Construct a Neyman-Pearson Classifier from a sample of class 0 and class 1.
Description
Given a type I error upper bound alpha and a violation rate upper bound delta, npc
constructs a Neyman-Pearson classifier
whose type I error is at most alpha with probability at least 1 - delta.
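The probabilistic guarantee above can be illustrated with a short base R sketch. It assumes the order-statistic rule described in the cited reference: if the threshold is the k-th smallest of n left-out class 0 scores, the chance that the type I error exceeds alpha is bounded by a binomial tail probability. The helper name np_rank is hypothetical and is not part of nproc; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of nproc): smallest rank k such that using the
# k-th smallest of n class 0 scores as the threshold keeps
# P(type I error > alpha) at or below delta.
np_rank <- function(n, alpha = 0.05, delta = 0.05) {
  k <- seq_len(n)
  # Violation bound for rank k: P(Binomial(n, 1 - alpha) >= k)
  violation <- pbinom(k - 1, size = n, prob = 1 - alpha, lower.tail = FALSE)
  ok <- which(violation <= delta)
  # NA means the left-out class 0 sample is too small for this alpha and delta
  if (length(ok) == 0) NA_integer_ else min(ok)
}

np_rank(58)   # too few class 0 scores for alpha = delta = 0.05
np_rank(59)   # smallest sample size for which a valid rank exists
np_rank(500)
```

Under this rule, the bound can only be met when (1 - alpha)^n <= delta, which is why a minimum number of left-out class 0 observations is needed.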
Usage
npc(x = NULL, y, method = c("logistic", "penlog", "svm", "randomforest",
"lda", "slda", "nb", "nnb", "ada", "tree"), alpha = 0.05, delta = 0.05,
split = 1, split.ratio = 0.5, n.cores = 1, band = FALSE,
nfolds = 10, randSeed = 0, warning = TRUE, ...)
Arguments
x |
n * p observation matrix. n observations, p covariates. |
y |
n 0/1 observations (class labels). |
method |
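base classification method. One of the options listed in Usage: "logistic", "penlog", "svm", "randomforest", "lda", "slda", "nb", "nnb", "ada", "tree".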
|
alpha |
the desirable upper bound on type I error. Default = 0.05. |
delta |
the violation rate of the type I error. Default = 0.05. |
split |
the number of random splits of the class 0 sample. Default = 1. For the ensemble version, choose split > 1. |
split.ratio |
the proportion of the class 0 sample used to train the
base classifier. The rest is used to estimate the threshold. Default = 0.5. Can also be set to "adaptive", in which case the ratio is determined using a data-driven method implemented in |
n.cores |
number of cores used for parallel computing. Default = 1. WARNING: Windows machines are not supported. |
band |
whether to generate both lower and upper bounds of type II error. Default = FALSE. |
nfolds |
number of folds for performing adaptive split ratio selection. Default = 10. |
randSeed |
the random seed used in the algorithm. |
warning |
whether to show various warnings in the program. Default = TRUE. |
... |
additional arguments. |
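To make the split and split.ratio arguments more concrete, the following base R sketch shows one random split of the class 0 sample. It is an illustration under stated assumptions, not the package's code: the variable names are hypothetical, and it assumes that all class 1 observations go to training, as in the cited reference.

```r
set.seed(1)
y <- rbinom(200, 1, 0.5)                # toy 0/1 labels
split_ratio <- 0.5                      # plays the role of split.ratio

ind0 <- which(y == 0)                   # indices of class 0 observations
n_train0 <- round(length(ind0) * split_ratio)

train0 <- sample(ind0, n_train0)        # class 0 part used to train the base classifier
thresh0 <- setdiff(ind0, train0)        # left-out class 0 part used to estimate the threshold
train_idx <- c(train0, which(y == 1))   # assumption: every class 1 observation is used for training

length(train0)
length(thresh0)
```

With split > 1, this random partition would be repeated and the resulting classifiers combined, which is the ensemble version mentioned under split.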
Value
An object with S3 class npc.
fits |
a list of length max(1, split), containing the fit from each split. |
method |
the base classification method. |
split |
the number of splits used. |
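A minimal sketch of inspecting the returned object, assuming the nproc package is installed and that the components behave as listed above. The simulated data mirror the Examples section; no particular output is claimed.

```r
library(nproc)  # assumes the nproc package is installed

set.seed(1)
n <- 1000
x <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, 1 / (1 + exp(-(1 + 3 * x[, 1]))))

fit <- npc(x, y, method = "lda", split = 3)

class(fit)        # the S3 class described above
length(fit$fits)  # per the description above: max(1, split)
fit$method        # the base classification method
fit$split         # the number of splits used
```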
References
Xin Tong, Yang Feng, and Jingyi Jessica Li (2018), Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristic (NP-ROC), Science Advances, 4, 2, eaao1659.
See Also
nproc and predict.npc
Examples
set.seed(1)
n = 1000
x = matrix(rnorm(n*2),n,2)
c = 1+3*x[,1]
y = rbinom(n,1,1/(1+exp(-c)))
xtest = matrix(rnorm(n*2),n,2)
ctest = 1+3*xtest[,1]
ytest = rbinom(n,1,1/(1+exp(-ctest)))
##Use lda classifier and the default type I error control with alpha=0.05, delta=0.05
fit = npc(x, y, method = 'lda')
pred = predict(fit,xtest)
fit.score = predict(fit,x)
accuracy = mean(pred$pred.label==ytest)
cat('Overall Accuracy: ', accuracy,'\n')
ind0 = which(ytest==0)
typeI = mean(pred$pred.label[ind0]!=ytest[ind0]) #type I error on test set
cat('Type I error: ', typeI, '\n')
## Not run:
##Ensembled lda classifier with split = 11, alpha=0.05, delta=0.05
fit = npc(x, y, method = 'lda', split = 11)
pred = predict(fit,xtest)
accuracy = mean(pred$pred.label==ytest)
cat('Overall Accuracy: ', accuracy,'\n')
ind0 = which(ytest==0)
typeI = mean(pred$pred.label[ind0]!=ytest[ind0]) #type I error on test set
cat('Type I error: ', typeI, '\n')
##Now, change the method to logistic regression and change alpha to 0.1
fit = npc(x, y, method = 'logistic', alpha = 0.1)
pred = predict(fit,xtest)
accuracy = mean(pred$pred.label==ytest)
cat('Overall Accuracy: ', accuracy,'\n')
ind0 = which(ytest==0)
typeI = mean(pred$pred.label[ind0]!=ytest[ind0]) #type I error on test set
cat('Type I error: ', typeI, '\n')
##Now, change the method to adaboost
fit = npc(x, y, method = 'ada', alpha = 0.1)
pred = predict(fit,xtest)
accuracy = mean(pred$pred.label==ytest)
cat('Overall Accuracy: ', accuracy,'\n')
ind0 = which(ytest==0)
typeI = mean(pred$pred.label[ind0]!=ytest[ind0]) #type I error on test set
cat('Type I error: ', typeI, '\n')
##Now, try the adaptive splitting ratio
fit = npc(x, y, method = 'ada', alpha = 0.1, split.ratio = 'adaptive')
pred = predict(fit,xtest)
accuracy = mean(pred$pred.label==ytest)
cat('Overall Accuracy: ', accuracy,'\n')
ind0 = which(ytest==0)
typeI = mean(pred$pred.label[ind0]!=ytest[ind0]) #type I error on test set
cat('Type I error: ', typeI, '\n')
cat('Splitting ratio:', fit$split.ratio)
## End(Not run)