cv_method_MC_corr {HDDesign}    R Documentation

MC simulation-based method to calculate the PCC of a CV-based classifier when features are correlated; uses training and testing datasets.

Description

Determine the probability of correct classification (PCC) for a high-dimensional classification study that employs a cross-validation (CV) based classifier. This function is similar to cv_method_MC, except that the generated features are correlated.

Usage

	cv_method_MC_corr(mu0, p, m, n, alpha_list, nrep, p1 = 0.5, ss = F, ntest, 
	pcorr, chol.rho,sampling.p=0.5)

Arguments

mu0

The effect size of the important features.

p

The total number of features.

m

The number of important features.

n

The total sample size for the two groups.

alpha_list

The search grid for the p-value threshold.

nrep

The number of simulation replicates employed to compute the expected PCC and/or sensitivity and specificity.

p1

The prevalence of group 1 in the population; defaults to 0.5.

ss

Logical; defaults to FALSE. If TRUE, the function also computes the sensitivity and the specificity of the classifier.

ntest

Sample size for the test dataset.

pcorr

Number of correlated features.

chol.rho

Cholesky decomposition of the covariance of the pcorr features that are correlated. It is assumed that the m important features are part of the pcorr correlated features.

sampling.p

The assumed proportion of group 1 samples in the training data; default of 0.5 assumes groups are equally represented regardless of p1.
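
As a rough illustration of what chol.rho encodes, the sketch below builds a small compound-symmetry matrix, takes its Cholesky factor with base R's chol(), and checks that the factor reproduces the matrix. It also shows one common way such a factor is used to turn independent normal draws into correlated ones. The sizes (pcorr = 4, 5000 draws) and the names Sigma, Z, X, recovered, and emp.cor are illustrative assumptions; whether cv_method_MC_corr applies the factor in exactly this way internally is not stated in this page.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)
pcorr <- 4       # assumed small block size, for illustration
rho.cs <- 0.8    # compound-symmetry correlation

# Compound-symmetry correlation matrix for the correlated block.
Sigma <- matrix(rho.cs, nrow = pcorr, ncol = pcorr)
diag(Sigma) <- 1

# chol() returns the upper-triangular factor R with t(R) %*% R equal to Sigma.
chol.rho <- chol(Sigma)
recovered <- crossprod(chol.rho)

# Right-multiplying independent N(0, 1) draws by R gives rows whose
# covariance is approximately Sigma.
Z <- matrix(rnorm(5000 * pcorr), ncol = pcorr)
X <- Z %*% chol.rho
emp.cor <- cor(X)
```

Here recovered should match Sigma up to numerical error, and emp.cor should be close to it for a large number of draws.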

Details

Refer to Sanchez, Wu, Song, and Wang (2016), supplementary materials. This function uses MC simulations to verify whether a study with the sample sizes in Table 1 of the manuscript attains the PCC target.

Value

If ss=FALSE, the function returns the expected PCC. If ss=TRUE, the function returns a vector containing the expected PCC, sensitivity and specificity.

Author(s)

Meihua Wu <meihuawu@umich.edu>, Brisa N. Sanchez <brisa@umich.edu>, Peter X.K. Song <pxsong@umich.edu>, Raymond Luu <raluu@umich.edu>, Wen Wang <wangwen@umich.edu>

References

Sanchez, B.N., Wu, M., Song, P.X.K., and Wang, W. (2016). "Study design in high-dimensional classification analysis." Biostatistics, in press.

Examples

	## Sigma_1 in the paper:
	## the first diagonal block (pcorr x pcorr) is compound symmetry,
	## the other diagonal block is the identity, and off-diagonal blocks are 0.
	
	pcorr <- 10
	p <- 500
	rho.cs <- 0.8
	
	# Build the full p x p correlation matrix.
	rho <- matrix(0, nrow = p, ncol = p)
	rho[1:pcorr, 1:pcorr] <- rho.cs
	diag(rho) <- 1
	
	# Cholesky factor of the correlated (first) block only.
	chol.rho1.500 <- chol(rho[1:pcorr, 1:pcorr])
	
	set.seed(1)
	cv_method_MC_corr(mu0 = 0.4, p = 500, m = 10, n = 80,
	                  alpha_list = c(0.0000001, 0.0001, 0.01),
	                  nrep = 10, p1 = 0.6, ss = TRUE, ntest = 100,
	                  pcorr = 10, chol.rho = chol.rho1.500)
	# returns: 0.623 0.670 0.576
	# alpha_list should be a dense list of p-value cutoffs;
	# here we use only a few values to ease computation of the example.
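
The note above says alpha_list should be dense in practice. One possible way to build such a grid is sketched below; the range (1e-7 to 0.1), the log spacing, the grid size of 25, and the name alpha_list_dense are all assumptions chosen for illustration, not recommendations taken from the package or the paper.

```r
# Hypothetical dense grid: 25 p-value cutoffs, log-spaced from 1e-7 to 0.1.
alpha_list_dense <- 10^seq(-7, -1, length.out = 25)

# The grid could then be supplied as the alpha_list argument, for example:
# cv_method_MC_corr(..., alpha_list = alpha_list_dense, ...)
```

Log spacing places more cutoffs near very small p-values, where thresholds for high-dimensional feature selection often fall. A denser grid increases computation time.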

[Package HDDesign version 1.1 Index]