simple_domain_plot {viraldomain}    R Documentation

Create a Simple Domain Plot

Description

This function generates a domain plot for a simple model based on PCA distances of the provided data.

Usage

simple_domain_plot(
  features,
  train_data,
  test_data,
  treshold_value,
  impute_hyperparameters
)

Arguments

features

A list of features according to their modeling roles. It should contain the following elements:

  • 'featured_col': Name of the featured column in the training data. When specifying the featured column, use "jitter_*" as a prefix to the featured variable of interest.

  • 'features_vl': Names of the columns containing viral load data (numeric values).

  • 'features_cd': Names of the columns containing CD4 data (numeric values).

train_data

The training data used to fit the MARS model.

test_data

The testing domain data used to calculate PCA distances.

treshold_value

The threshold for domain applicability scoring.

impute_hyperparameters

A list of imputation parameters: 'indetect' (the undetectable viral load level), 'tasa_exp' (the rate of the exponential distribution used for undetectable values), and 'semi' (the seed set for reproducibility).
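To make the roles of 'indetect', 'tasa_exp', and 'semi' concrete, here is a minimal sketch of one plausible reading of the imputation step. It assumes that viral load values at or below the detection limit are replaced with draws from an exponential distribution. The helper `impute_undetectable` is hypothetical and is not part of viraldomain; the package's internal procedure may differ.

```r
# Hypothetical helper for illustration only; not exported by viraldomain.
impute_undetectable <- function(vl, indetect, tasa_exp, semi) {
  set.seed(semi)                        # fix the RNG state so results repeat
  below <- !is.na(vl) & vl <= indetect  # flag values at or below the limit
  # Assumption: each flagged value is replaced by an exponential draw
  vl[below] <- stats::rexp(sum(below), rate = tasa_exp)
  vl
}

vl <- c(40, 1200, 35, 56000, 40)
imputed <- impute_undetectable(vl, indetect = 40, tasa_exp = 1/13, semi = 123)
```

Detectable values (1200 and 56000 here) pass through unchanged, and calling the helper again with the same 'semi' gives the same imputed values.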

Value

A domain plot showing PCA distances.
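As a rough illustration of what a PCA distance and a threshold can mean, the sketch below uses base R only. It assumes the distance is the Euclidean distance to the centre of the training scores in principal component space, and that the threshold is compared against the percentile of that distance among the training distances. Both choices, the simulated data, and the helper `dist_to_center` are assumptions made for illustration; they are not taken from the viraldomain source.

```r
set.seed(123)
# Simulated stand-ins for training and testing data (not the package datasets)
train <- data.frame(cd = rnorm(50, 500, 100), vl = rnorm(50, 10, 2))
test  <- data.frame(cd = c(510, 1500), vl = c(10, 30))

# Fit PCA on centred and scaled training data
pca <- stats::prcomp(train, center = TRUE, scale. = TRUE)
center <- colMeans(pca$x)

# Euclidean distance of each row of PCA scores to the training centre
dist_to_center <- function(scores) {
  sqrt(rowSums(sweep(scores, 2, center)^2))
}

train_dist <- dist_to_center(pca$x)
test_dist  <- dist_to_center(predict(pca, newdata = test))

# Percentile of each test distance relative to the training distances
test_pctl <- stats::ecdf(train_dist)(test_dist)
outside   <- test_pctl > 0.99  # TRUE marks points beyond the threshold
```

Under these assumptions, the first test point lies close to the training centre and is not flagged, while the second lies far from all training points and is flagged.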

Examples

library(viraldomain)
data(viral)
data(sero)
# featured_col is the original variable name with a jitter prefix ("jittered_cd_2022")
features <- list(
  featured_col = "jittered_cd_2022",
  features_vl = "vl_2022",
  features_cd = "cd_2022"
)
# Keep only the CD4 and viral load columns for training
train_data <- viral |>
  dplyr::select("cd_2022", "vl_2022")
test_data <- sero
treshold_value <- 0.99
impute_hyperparameters <- list(indetect = 40, tasa_exp = 1/13, semi = 123)
simple_domain_plot(features, train_data, test_data, treshold_value, impute_hyperparameters)

[Package viraldomain version 0.0.3 Index]