learningcurve_data {planningML}    R Documentation

Generate learning curve data

Description

Generate learning curve data: performance measurements of a classifier at a series of training sample sizes.

Usage

learningcurve_data(
  x,
  y,
  method = "log",
  metric = "MCC",
  batchsize = 60,
  class.prob,
  pct.train = 0.8,
  nfold = 5,
  nrepeat = 10
)

Arguments

x

a matrix of predictor variables

y

a vector of binary outcomes, encoded as a factor, with 1 denoting events and 0 denoting non-events

method

training method used to obtain the performance measurements. Available options are "log" (logistic regression, default), "regul.log" (regularized logistic regression), "svm" (support vector machine), "rf" (random forest), and "lda" (linear discriminant analysis)

metric

the target performance metric to optimize. Available options are "MCC" (Matthews correlation coefficient, default) and "AUC" (area under the ROC curve)

batchsize

sample size for each training batch. Default is 60

class.prob

probability of the event

pct.train

the proportion of the data used for training. Default is 0.8

nfold

number of folds in cross-validation. Default is 5

nrepeat

number of repeats of cross-validation. Default is 10
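
The nfold and nrepeat arguments describe repeated k-fold cross-validation. The base R sketch below shows one way such a partition could be built with the default values. It is an illustration only: the object names are hypothetical, the folds are not stratified by outcome, and the package's internal resampling may differ.

```r
# Illustrative sketch of repeated k-fold partitioning (not planningML code).
set.seed(1)
n       <- 100   # hypothetical number of training rows
nfold   <- 5     # same value as the nfold default
nrepeat <- 10    # same value as the nrepeat default

# One column of fold labels per repeat; each column is a fresh random
# partition of the n rows into nfold folds of (nearly) equal size.
folds <- replicate(nrepeat, sample(rep(seq_len(nfold), length.out = n)))

dim(folds)          # n rows, nrepeat columns
table(folds[, 1])   # fold sizes in the first repeat

# Rows held out for validation in fold 1 of repeat 1
held_out <- which(folds[, 1] == 1)
```

Under this scheme each model is fitted nfold * nrepeat times (50 with the defaults), and the performance metric is averaged over those fits.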

Value

learningcurve_data() returns a data frame of sample sizes and the corresponding performance measurements.
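
To make the shape of such a result concrete, the base R sketch below builds a comparable data frame by hand: a logistic regression is fitted on training sets that grow in batches of 60, and the Matthews correlation coefficient (MCC) is computed on a held-out set. It does not call learningcurve_data(). The simulated data, the helper mcc(), and the column names sample_size and MCC are assumptions made for illustration, and a single held-out split stands in for cross-validation.

```r
# Illustrative sketch of learning curve data (not planningML code).
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))   # simulated 0/1 outcome
dat <- data.frame(y = y, x)

test_idx <- sample(n, 100)    # held-out evaluation rows
test <- dat[test_idx, ]
pool <- dat[-test_idx, ]      # rows available for training

# Hypothetical helper: MCC for 0/1 labels, returning 0 when undefined.
mcc <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)
  tn <- sum(truth == 0 & pred == 0)
  fp <- sum(truth == 0 & pred == 1)
  fn <- sum(truth == 1 & pred == 0)
  denom <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)
  (tp * tn - fp * fn) / denom
}

sizes <- seq(60, 300, by = 60)   # training sizes in steps of 60
lc <- data.frame(
  sample_size = sizes,
  MCC = sapply(sizes, function(m) {
    fit  <- glm(y ~ x1 + x2, data = pool[seq_len(m), ], family = binomial)
    prob <- predict(fit, newdata = test, type = "response")
    mcc(test$y, as.integer(prob > 0.5))
  })
)
lc
```

Each row pairs one training sample size with the metric obtained at that size, which is the form needed to plot or fit a learning curve.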


[Package planningML version 1.0.1 Index]