hct_method_corr {HDDesign}	R Documentation

Estimate the PCC of HCT Classifiers via Monte Carlo Simulation with Correlated Features

Description

Determine the probability of correct classification (PCC) for studies that use high-dimensional features for classification. The Higher Criticism Thresholding (HCT) classifier is used to choose the p-value threshold for feature selection. In addition to the original HCT procedure of Donoho and Jin (2009), two more procedures for choosing the p-value threshold have been developed and implemented.
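
To make the idea of an HCT-chosen p-value threshold concrete, the following is a minimal sketch of Higher Criticism threshold selection in base R. It uses the HC objective sqrt(p) * (i/p - p_(i)) / sqrt(i/p * (1 - i/p)) evaluated over the smallest alpha0 fraction of the sorted p-values. The function name hc_threshold and the simulated p-values are hypothetical and chosen for illustration only; this is not the code used inside HDDesign.

```r
# Illustrative sketch only; hc_threshold is a hypothetical helper,
# not part of the HDDesign package.
hc_threshold <- function(pvals, alpha0 = 0.5) {
  p <- length(pvals)
  stopifnot(p >= 2, alpha0 > 0, alpha0 <= 1)
  sorted <- sort(pvals)
  # Examine only the k smallest p-values; cap at p - 1 to avoid dividing by zero
  k <- min(p - 1, max(1, floor(alpha0 * p)))
  i <- seq_len(k)
  frac <- i / p
  # HC objective at each candidate index
  hc <- sqrt(p) * (frac - sorted[i]) / sqrt(frac * (1 - frac))
  best <- which.max(hc)
  list(threshold = sorted[best], n_selected = best, hc = hc[best])
}

set.seed(1)
# 490 null p-values (uniform) plus 10 small p-values from signal features
pvals <- c(runif(490), rbeta(10, 1, 200))
res <- hc_threshold(pvals, alpha0 = 0.5)
res$threshold   # features with p-values at or below this value are selected
```

Features whose p-values fall at or below the returned threshold would be kept for the classifier; alpha0 limits how far into the sorted p-values the search extends.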

Usage

	hct_method_corr(mu0, p, m, n, hct, alpha0, nrep, p1 = 0.5, 
	ss = F, pcorr, chol.rho, sampling.p=0.5)

Arguments

mu0

The effect size of the important features.

p

The number of the features in total.

m

The number of the important features.

n

The total sample size for the two groups.

hct

The HCT procedure used to choose the p-value threshold for feature selection. There are two valid choices (case sensitive): 1) hct_empirical, the HCT procedure originally proposed by Donoho and Jin (2009); 2) hct_beta, an alternative HCT procedure that makes use of the beta distribution of the p-values under the null.

alpha0

The proportion of the smallest p-values considered by the HCT algorithm.

nrep

The number of simulation replicates employed to compute the expected PCC and/or sensitivity and specificity.

p1

The prevalence of group 1 in the population; defaults to 0.5.

ss

Boolean variable; defaults to FALSE. If TRUE, the function also computes the sensitivity and the specificity of the classifier.

pcorr

Number of correlated features.

chol.rho

Cholesky decomposition of the covariance matrix of the pcorr correlated features. It is assumed that the m important features are among the pcorr correlated features.

sampling.p

The assumed proportion of group 1 samples in the training data; default of 0.5 assumes groups are equally represented regardless of p1.
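
As an illustration of the pcorr and chol.rho arguments, the sketch below builds a compound-symmetry covariance block in base R and takes its Cholesky factor with chol(). The variable names sigma, chol_rho, z and x are illustrative assumptions. The last step shows the standard way a Cholesky factor turns independent normal draws into correlated ones; how hct_method_corr applies the factor internally is not stated on this page.

```r
pcorr <- 10      # number of correlated features
rho_cs <- 0.8    # common correlation within the block

# Compound symmetry: 1 on the diagonal, rho_cs everywhere else
sigma <- matrix(rho_cs, nrow = pcorr, ncol = pcorr)
diag(sigma) <- 1

# chol() returns the upper-triangular factor R with t(R) %*% R equal to sigma
chol_rho <- chol(sigma)

# Rows of z %*% chol_rho have covariance sigma when z has iid N(0, 1) entries
set.seed(1)
z <- matrix(rnorm(5 * pcorr), nrow = 5)
x <- z %*% chol_rho

all.equal(crossprod(chol_rho), sigma)
```

An object such as chol_rho has the pcorr x pcorr shape that the chol.rho argument describes.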

Value

If ss=FALSE, the function returns the expected PCC. If ss=TRUE, the function returns a vector containing the expected PCC, sensitivity and specificity.
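
The three returned quantities are linked by the law of total probability. The sketch below assumes that sensitivity is the probability of correctly classifying a group 1 subject, that specificity is the probability of correctly classifying a group 2 subject, and that p1 is the prevalence of group 1; which group the package treats as "positive" is an assumption here, and pcc_from_ss is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: overall PCC as a prevalence-weighted average of
# sensitivity (group 1 accuracy) and specificity (group 2 accuracy).
pcc_from_ss <- function(sens, spec, p1 = 0.5) {
  stopifnot(p1 >= 0, p1 <= 1)
  p1 * sens + (1 - p1) * spec
}

pcc_balanced <- pcc_from_ss(sens = 0.7, spec = 0.6, p1 = 0.5)
pcc_rare     <- pcc_from_ss(sens = 0.7, spec = 0.6, p1 = 0.1)
```

With equal sensitivity and specificity the PCC equals that common value for any p1.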

Author(s)

Meihua Wu <meihuawu@umich.edu>, Brisa N. Sanchez <brisa@umich.edu>, Peter X.K. Song <pxsong@umich.edu>, Raymond Luu <raluu@umich.edu>, Wen Wang <wangwen@umich.edu>

References

Donoho, D., and Jin, J. (2009). "Feature Selection by Higher Criticism Thresholding Achieves the Optimal Phase Diagram." Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 367 (1906) (November 13): 4449-4470.

Examples

## Sigma_1 in the paper:
## the first diagonal block (pcorr x pcorr) has compound symmetry;
## the other diagonal block is the identity; off-diagonal blocks are 0
	pcorr=10  
	p=500
	rho.cs=.8
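	#build the full p x p matrix: compound-symmetry block plus identity block
	rho= diag(c((1-rho.cs)*rep(1,pcorr),rep(1,p-pcorr)))+ matrix(c(rho.cs*
	rep(1,pcorr),rep(0,p-pcorr)), ncol=1) %*% c(rep(1,pcorr),rep(0,p-pcorr))
	#Cholesky factor of the correlated (first) block only
	chol.rho1.500=chol(rho[1:pcorr,1:pcorr])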
	set.seed(1)
	hct_method_corr(mu0=0.4,p=500,m=10,n=80,hct=hct_beta,alpha0=0.5,nrep=10,
	p1=0.5,ss=TRUE,pcorr=pcorr,chol.rho=chol.rho1.500)
	#return: 0.6672256 0.6672256 0.6672256

[Package HDDesign version 1.1 Index]