samplesize {planningML}    R Documentation

Sample size determination

Description

This function determines the optimal sample size based on the performance evaluation metric and the number of selected features.

Usage

samplesize(
  features = NULL,
  sample.size = seq(10, 1000, 20),
  method = "HCT",
  m = NULL,
  effectsize = NULL,
  class.prob = NULL,
  totalnum_features = NULL,
  threshold = 0.1,
  metric = "MCC",
  target = NULL
)
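A minimal sketch of a call that uses only the arguments listed above. The wrapper name plan_n, the grid seq(10, 300, 10), and the target value 0.7 are illustrative assumptions, and features is assumed to be an object already returned by featureselection:

```r
# Hypothetical convenience wrapper; only documented samplesize() arguments are used.
# `features` must be the result of featureselection(); it is not created here.
plan_n <- function(features, target_mcc = 0.7) {
  planningML::samplesize(
    features    = features,
    sample.size = seq(10, 300, 10),  # assumed grid, narrower than the default
    method      = "HCT",
    metric      = "MCC",
    target      = target_mcc         # assumed target value
  )
}
```

Defining the wrapper does not need the package to be attached; planningML is only required when plan_n() is actually called.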

Arguments

features

feature selection results from the featureselection function in the package.

sample.size

sample size grid
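The default grid can be inspected with base R. Note that the default step of 20 starting at 10 never lands on 1000; the second grid below is a hypothetical alternative that does include the upper bound:

```r
# Default grid: starts at 10, steps by 20, stops at the last value <= 1000.
grid <- seq(10, 1000, 20)
length(grid)  # 50 candidate sample sizes
range(grid)   # 10 and 990

# Hypothetical coarser grid whose last point is exactly 1000.
grid_coarse <- seq(10, 1000, by = 30)
range(grid_coarse)
```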

method

default = "HCT". Method used to compute the sample-size-dependent performance metric: the HCT method ("HCT") or the DS method ("DS").

m

the number of features involved in the sample size determination. Default is NULL, which means the number of features is determined by the featureselection results based on the iHCT method. Otherwise, users can choose the number based on their needs. A user-defined m should be smaller than the optimal number of features determined by the featureselection function.

effectsize

common effect size for the m features. NULL means the effect size is calculated directly from the data. Users can also provide effect sizes based on historical data.

class.prob

probability of the event

totalnum_features

total number of features

threshold

default = 0.1. Threshold needed to determine the sample size.

metric

default = "MCC". The target performance metric that you want to optimize. "AUC" can be chosen instead.
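For readers unfamiliar with the two metrics, here is a minimal base R sketch of their standard definitions: the Matthews correlation coefficient from confusion-matrix counts, and AUC as the Mann-Whitney rank statistic. The helper names mcc and auc are illustrative and are not functions of the package, and this is not a description of how samplesize() estimates them:

```r
# MCC from confusion-matrix counts (true/false positives and negatives).
mcc <- function(tp, tn, fp, fn) {
  num <- tp * tn - fp * fn
  den <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (den == 0) return(0)  # common convention when a row or column is empty
  num / den
}

# AUC: probability that a random positive scores above a random negative.
# label is coded 1 (event) / 0 (non-event); average ranks handle ties.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

mcc(tp = 40, tn = 45, fp = 5, fn = 10)
auc(score = c(0.9, 0.8, 0.4, 0.3), label = c(1, 1, 0, 0))  # perfect separation
```

MCC ranges from -1 to 1 and AUC from 0 to 1, so a value passed as target should lie on the scale of the chosen metric.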

target

target MCC/AUC that you want to achieve

Value

samplesize() returns the sample size needed to achieve the corresponding performance measurements.
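The general idea of reading a required sample size off a grid can be sketched as follows. This is only an illustration under stated assumptions: the performance curve is made up, the helper smallest_n_reaching is hypothetical, and neither reflects the internals or the return structure of samplesize():

```r
# Hypothetical helper: smallest grid value whose estimated metric reaches the target.
smallest_n_reaching <- function(grid, perf, target) {
  stopifnot(length(grid) == length(perf))
  ok <- which(perf >= target)
  if (length(ok) == 0) return(NA_real_)  # target not reachable on this grid
  min(grid[ok])
}

grid <- seq(10, 1000, 20)
# Made-up, saturating performance curve (NOT produced by the package).
perf <- 0.8 * (1 - exp(-grid / 200))

smallest_n_reaching(grid, perf, target = 0.7)  # 430
smallest_n_reaching(grid, perf, target = 0.9)  # NA: the curve never reaches 0.9
```

If no grid point reaches the target, widening sample.size or lowering target are the obvious adjustments to try.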


[Package planningML version 1.0.1 Index]