ds_method {HDDesign}R Documentation

Estimate PCC by DS Method

Description

Determine the probability of correct classification (PCC) for studies employing high dimensional features for classification; uses the method proposed by (Dobbin and Simon 2007) to choose the p-value threshold for feature selection.

Usage

	ds_method(mu0, p, m, n, p1=0.5, lmax=1, ss=F, sampling.p)

Arguments

mu0

The effect size of the important features.

p

The number of the features in total.

m

The number of the important features.

n

The total sample size for the two groups.

p1

The prevalence of the group 1 in the population, default to 0.5.

lmax

The maximum eigenvalue of the variance-covariance matrix of the p features. Defaults to 1 which implies that the features are assumed i.i.d.

ss

Boolean variable, default to FALSE. The TRUE value instruct the program to compute the sensitivity and the specificity of the classifier.

sampling.p

The assumed proportion of group 1 samples in the training data; default of 0.5 assumes groups are equally represented regardless of p1.

Details

Refer to Dobbin and Simon (2007)

Value

If ss=FALSE, the function returns the expected PCC. If ss=TRUE, the function returns a vector containing the expected PCC, sensitivity and specificity.

Author(s)

Meihua Wu <meihuawu@umich.edu> Brisa N. Sanchez <brisa@umich.edu> Peter X.K. Song <pxsong@umich.edu> Raymond Luu <raluu@umich.edu> Wen Wang <wangwen@umich.edu>

References

Dobbin, K.K., and Simon R.M. (2007). "Sample Size Planning for Developing Classifiers Using High-dimensional DNA Microarray Data." Biostatistics 8 (1): 101-117.

Examples

ds_method(mu0=0.6, p=500, m=10, n=38, p1=0.5, lmax=1, ss=TRUE)
#[1] 0.9252471 0.9252471 0.9252471

[Package HDDesign version 1.1 Index]