R: Train Phenotyping Model using the Training Labels

phecap_train_phenotyping_model {PheCAP}

R Documentation

Train Phenotyping Model using the Training Labels

Description

Train the phenotyping model on the training dataset, and evaluate its performance via random splits of the training dataset.
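The random-split evaluation described above can be sketched in base R. This is only an illustration of the general idea, not the package's internal code: the helper names `rank_auc` and `average_split_auc`, the use of plain logistic regression, and the unstratified sampling are all assumptions.

```r
# Rank-based AUC (Mann-Whitney statistic); tied scores receive average ranks.
rank_auc <- function(y, score) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: average held-out AUC over repeated random splits.
average_split_auc <- function(x, y, train_percent = 0.7,
                              num_splits = 200L, start_seed = 78900L) {
  n <- length(y)
  aucs <- vapply(seq_len(num_splits), function(i) {
    set.seed(start_seed + i)  # seed for the i-th split
    train <- sample.int(n, size = floor(train_percent * n))
    # Fit an unpenalized logistic regression on the training part.
    fit <- stats::glm.fit(cbind(1, x[train, , drop = FALSE]), y[train],
                          family = stats::binomial())
    beta <- fit$coefficients
    beta[is.na(beta)] <- 0
    # Score the held-out part and compute its AUC.
    p <- stats::plogis(cbind(1, x[-train, , drop = FALSE]) %*% beta)
    rank_auc(y[-train], as.numeric(p))
  }, numeric(1))
  mean(aucs)
}

# Simulated data, for demonstration only.
set.seed(1)
x <- matrix(rnorm(400), ncol = 2)
y <- rbinom(200, 1, plogis(1.5 * x[, 1]))
res <- average_split_auc(x, y, num_splits = 20L)
```

Because each split derives its seed from `start_seed`, rerunning the sketch with the same inputs reproduces the same splits.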

Usage

phecap_train_phenotyping_model(
  data, surrogates, feature_selected,
  method = "lasso_bic",
  train_percent = 0.7, num_splits = 200L,
  start_seed = 78900L, verbose = 0L)
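As an illustrative sketch, the wrapper below calls the function with two of the documented methods, so the predicted probabilities would be averaged across them. The wrapper name `train_ensemble` and the non-default argument values are hypothetical; `data`, `surrogates`, and `feature_selected` are assumed to have been prepared beforehand as described under Arguments.

```r
# Hypothetical convenience wrapper; the inputs must already exist.
train_ensemble <- function(data, surrogates, feature_selected) {
  PheCAP::phecap_train_phenotyping_model(
    data, surrogates, feature_selected,
    method = c("lasso_bic", "ridge_cv"),  # two documented methods, averaged
    train_percent = 0.7,
    num_splits = 50L,      # fewer splits than the default, for a faster run
    start_seed = 78900L,
    verbose = 10L)         # report progress every 10 splits
}
```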

Arguments

`data`	an object of class `PhecapData`, obtained by calling `PhecapData(...)`.
`surrogates`	a list of objects of class `PhecapSurrogate`, obtained by something like `list(PhecapSurrogate(...), PhecapSurrogate(...))`. The surrogates used here may differ from those used in feature extraction.
`feature_selected`	a character vector of the features that should be included in the model, typically (but not necessarily) returned by `phecap_run_feature_extraction`. The features listed here may differ from those returned by feature extraction.
`method`	Either a character vector or a list of two components. If a character vector is used, possible entries are given below. When at least two methods are specified, the predicted probability is the simple average of the predicted probabilities from each method.
  `'plain'` (logistic regression without penalty)
  `'ridge_cv'` (logistic regression with ridge penalty and CV tuning)
  `'lasso_cv'` (logistic regression with lasso penalty and CV tuning)
  `'lasso_bic'` (logistic regression with lasso penalty and BIC tuning)
  `'alasso_cv'` (logistic regression with adaptive lasso penalty and CV tuning)
  `'alasso_bic'` (logistic regression with adaptive lasso penalty and BIC tuning)
  `'svm'` (support vector machine with CV tuning, package `e1071` needed, `subject_weight` not supported)
  `'rf'` (random forest with default parameters, package `randomForestSRC` needed)
  `'xgb'` (extreme gradient boosting with default parameters, package `xgboost` needed)
If a list is used, it should contain two named components as follows.
  `fit` (a function for model fitting, with arguments `x`, `y`, `subject_weight`, `penalty_weight`)
  `predict` (a function for prediction, with arguments `object`, which was returned by `fit`, and `x`, which is the new data to predict on)
`train_percent`	The fraction (between 0 and 1) of labels used for model training in each random split.
`num_splits`	The number of random splits.
`start_seed`	In the i-th split, the random seed is set to `start_seed + i`.
`verbose`	If positive, print progress every `verbose` splits; if zero, remain quiet.
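A user-supplied `method` list might look like the following minimal sketch, which implements unpenalized logistic regression with base R. The name `custom_method` is hypothetical, and several details are assumptions rather than documented behavior: that `x` is a numeric matrix without an intercept column, that `y` is a 0/1 vector, that `subject_weight` may be `NULL`, and that `predict` should return predicted probabilities. The `penalty_weight` argument is accepted but ignored here.

```r
# Hypothetical custom method: plain logistic regression via stats::glm.fit.
custom_method <- list(
  fit = function(x, y, subject_weight, penalty_weight) {
    x <- as.matrix(x)
    # Assumption: missing weights mean equal weighting.
    if (is.null(subject_weight)) subject_weight <- rep(1, nrow(x))
    fit <- stats::glm.fit(cbind(1, x), y, weights = subject_weight,
                          family = stats::binomial())
    beta <- fit$coefficients
    beta[is.na(beta)] <- 0  # drop aliased (collinear) terms
    beta                    # the fitted object is the coefficient vector
  },
  predict = function(object, x) {
    # Linear predictor with intercept, mapped to probabilities.
    as.numeric(stats::plogis(cbind(1, as.matrix(x)) %*% object))
  })

# Quick check on simulated data, independent of the package.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- rbinom(100, 1, plogis(x[, 1]))
obj <- custom_method$fit(x, y, NULL, NULL)
p <- custom_method$predict(obj, x)
```

Under these assumptions, the list could then be passed as `method = custom_method`.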

Value

An object of class PhecapModel, with components

`coefficients`	the fitted object
`method`	the method used for model training
`feature_selected`	the features selected by SAFE
`train_roc`	ROC on training dataset
`train_auc`	AUC on training dataset
`split_roc`	average ROC on random splits of training dataset
`split_auc`	average AUC on random splits of training dataset
`fit_function`	the function used for fitting
`predict_function`	the function used for prediction
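One simple way to use these components is to compare the training AUC with the random-split AUC. The helper `auc_summary` below is hypothetical, and it assumes the components of a `PhecapModel` object can be accessed with `$`, as for an ordinary list; the mock object exists only so the sketch runs without the package.

```r
# Hypothetical helper: compare training AUC with the random-split AUC.
auc_summary <- function(model) {
  c(train_auc = model$train_auc,
    split_auc = model$split_auc,
    gap = model$train_auc - model$split_auc)  # larger gap hints at overfitting
}

# Mock object with made-up values, standing in for a fitted model.
mock_model <- structure(list(train_auc = 0.95, split_auc = 0.91),
                        class = "PhecapModel")
s <- auc_summary(mock_model)
```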
