hct_method {HDDesign}    R Documentation
Estimate PCC of HCT Classifiers
Description
Determine the probability of correct classification (PCC) for studies that use high-dimensional features for classification. It is assumed that a Higher Criticism Threshold (HCT) is used to choose the p-value threshold for feature selection, and that features meeting the threshold are regarded as important for classification. The classification rule is assumed to be a linear combination of the important features, all with equal weight. In addition to the original HCT procedure of Donoho and Jin (2009), two more procedures for choosing the p-value threshold have been developed and implemented. This function generates the smallest fraction (alpha0) of the p-values, calculates the HCT from them, determines which p-values meet the threshold, and uses the normal CDF to estimate the PCC of the resulting classifier. Neither training nor testing data are used. (See Sanchez et al. 2016.)
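The steps in the description can be illustrated with a small R sketch of one replicate. It is not the HDDesign implementation: the helper names hc_threshold and one_replicate are hypothetical, and it assumes independent unit-variance features, a two-sample z-test per feature with n/2 samples per group, a group mean difference of mu0 on each important feature, the Donoho and Jin HC objective scanned over the smallest alpha0 * p p-values, and a cutoff at the midpoint between the two group means. Because the package's effect-size convention may differ, the result need not reproduce the output of hct_method.

```r
# Hypothetical sketch of one replicate; not HDDesign's internal code.
hc_threshold <- function(pvals, alpha0) {
  p <- length(pvals)
  ps <- sort(pvals)
  k <- max(1, floor(alpha0 * p))  # scan only the smallest alpha0 fraction
  i <- seq_len(k)
  # Donoho-Jin HC objective evaluated at each of the k smallest p-values
  hc <- sqrt(p) * (i / p - ps[i]) / sqrt(ps[i] * (1 - ps[i]))
  ps[which.max(hc)]               # the p-value threshold
}

one_replicate <- function(mu0, p, m, n, alpha0) {
  # z-statistics: the first m (important) features are shifted by
  # mu0 * sqrt(n) / 2, the noncentrality of a two-sample z-test with n/2 per group
  z <- rnorm(p) + c(rep(mu0 * sqrt(n) / 2, m), rep(0, p - m))
  pvals <- 2 * pnorm(-abs(z))     # two-sided p-values
  selected <- pvals <= hc_threshold(pvals, alpha0)
  n_imp <- sum(selected[seq_len(m)])  # important features retained
  n_sel <- sum(selected)              # all features retained (at least 1)
  # Equal-weight score: mean gap n_imp * mu0, within-group SD sqrt(n_sel);
  # with a midpoint cutoff the PCC is pnorm(gap / (2 * SD))
  pnorm(n_imp * mu0 / (2 * sqrt(n_sel)))
}

set.seed(1)
# Averaging over replicates plays the role of nrep
pcc <- mean(replicate(10, one_replicate(mu0 = 0.4, p = 500, m = 10, n = 80,
                                        alpha0 = 0.5)))
pcc
```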
Usage
hct_method(mu0, p, m, n, hct, alpha0, nrep, p1 = 0.5, ss = F, sampling.p=0.5)
Arguments
mu0
The effect size of the important features.
p
The total number of features.
m
The number of important features.
n
The total sample size for the two groups.
hct
The HCT procedure used to choose the p-value threshold for feature selection. There are two valid choices (case sensitive): 1) hct_empirical, the HCT procedure originally proposed by Donoho and Jin (2009); 2) hct_beta, an alternative HCT procedure that uses the beta distribution of the p-values under the null.
alpha0
The proportion of the smallest p-values considered by the HCT algorithm; typically 0.1.
nrep
The number of simulation replicates used to compute the expected PCC and, when ss = TRUE, the expected sensitivity and specificity.
p1
The prevalence of group 1 in the population; defaults to 0.5.
ss
Logical; defaults to FALSE. If TRUE, the function also computes the sensitivity and specificity of the classifier.
sampling.p
The assumed proportion of group 1 samples in the training data; the default of 0.5 assumes the groups are equally represented regardless of p1.
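The hct_beta option builds on the fact that, under the null, the i-th smallest of p independent uniform p-values follows a Beta(i, p - i + 1) distribution, with mean i/(p + 1) and variance i(p - i + 1)/((p + 1)^2 (p + 2)). A minimal sketch of one way to turn that into an HC-type objective follows. The function name hc_beta_threshold and the standardization by the Beta mean and variance are illustrative assumptions; hct_beta may define its objective differently.

```r
# Hypothetical beta-standardized HC objective (illustration only; the exact
# definition used by HDDesign's hct_beta may differ).
hc_beta_threshold <- function(pvals, alpha0) {
  p <- length(pvals)
  ps <- sort(pvals)
  k <- max(1, floor(alpha0 * p))  # smallest alpha0 fraction of p-values
  i <- seq_len(k)
  # Null moments of the i-th order statistic, which is Beta(i, p - i + 1)
  null_mean <- i / (p + 1)
  null_var <- i * (p - i + 1) / ((p + 1)^2 * (p + 2))
  # How far each observed order statistic falls below its null mean
  score <- (null_mean - ps[i]) / sqrt(null_var)
  ps[which.max(score)]            # the p-value threshold
}

set.seed(2)
# 490 null p-values plus 10 p-values from shifted z-statistics
pvals <- c(runif(490), 2 * pnorm(-abs(rnorm(10, mean = 3))))
hc_beta_threshold(pvals, alpha0 = 0.1)
```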
Value
If ss = FALSE, the function returns the expected PCC. If ss = TRUE, it returns a vector containing the expected PCC, sensitivity, and specificity.
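The relationship between the three returned quantities can be sketched with the normal CDF. This sketch assumes, hypothetically, unit-variance independent features, a group mean gap of mu0 on each selected important feature, and a cutoff placed a fraction cut of the way from the group 0 mean to the group 1 mean; classifier_metrics is an illustrative name, not an HDDesign function. The PCC is then the prevalence-weighted average p1 * sensitivity + (1 - p1) * specificity.

```r
# Hypothetical sketch; not HDDesign's internal code.
classifier_metrics <- function(n_imp, n_sel, mu0, p1 = 0.5, cut = 0.5) {
  gap <- n_imp * mu0          # difference in mean score between the groups
  sd_score <- sqrt(n_sel)     # within-group SD of the equal-weight score
  sens <- pnorm((1 - cut) * gap / sd_score)  # P(score > cutoff | group 1)
  spec <- pnorm(cut * gap / sd_score)        # P(score <= cutoff | group 0)
  c(PCC = p1 * sens + (1 - p1) * spec,
    sensitivity = sens, specificity = spec)
}

classifier_metrics(n_imp = 8, n_sel = 12, mu0 = 0.4, p1 = 0.3)
```

With cut = 0.5 the sensitivity and specificity are equal, so the PCC equals both regardless of p1, which is consistent with the three identical values in the Examples section.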
Author(s)
Meihua Wu <meihuawu@umich.edu>, Brisa N. Sanchez <brisa@umich.edu>, Peter X.K. Song <pxsong@umich.edu>, Raymond Luu <raluu@umich.edu>, Wen Wang <wangwen@umich.edu>
References
Donoho, D., and Jin, J. (2009). "Feature Selection by Higher Criticism Thresholding Achieves the Optimal Phase Diagram." Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 367 (1906): 4449-4470.
Sanchez, B.N., Wu, M., Song, P.X.K., and Wang, W. (2016). "Study design in high-dimensional classification analysis." Biostatistics, in press.
Examples
library(HDDesign)
set.seed(1)
hct_method(mu0 = 0.4, p = 500, m = 10, n = 80, hct = hct_beta, alpha0 = 0.5,
           nrep = 10, p1 = 0.5, ss = TRUE)
# returns (PCC, sensitivity, specificity): 0.807098 0.807098 0.807098