smdi_diagnose {smdi}    R Documentation

Computes three group missing data summary diagnostics

Description

This function bundles and calls all three group diagnostics and returns the most important summary metrics. For more information and details, please refer to the individual functions.

Important: do not include identifier-like variables such as ID variables, ZIP codes, or dates in data.
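
A minimal sketch of dropping such columns before running the diagnostics. The data frame toy and its column names (patient_id, zip_code, index_date) are hypothetical and serve only as an illustration; they are not part of smdi.

```r
# Hypothetical toy data; all column names are assumptions for illustration only
toy <- data.frame(
  patient_id = 1:5,
  zip_code   = c("02115", "10001", "60601", "94105", "73301"),
  index_date = as.Date("2020-01-01") + 0:4,
  age_num    = c(54, 61, NA, 47, 70),
  egfr_cat   = c(1, NA, 0, 1, NA)
)

# Identifier-like columns that should not enter the diagnostics
drop_cols <- c("patient_id", "zip_code", "index_date")

# Keep only the analytic covariates
toy_clean <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]

names(toy_clean)
```

A data frame cleaned this way could then be passed as the data argument.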

Usage

smdi_diagnose(
  data = NULL,
  covar = NULL,
  median = TRUE,
  includeNA = FALSE,
  train_test_ratio = c(0.7, 0.3),
  set_seed = 42,
  ntree = 1000,
  n_cores = 1,
  model = c("logistic", "linear", "cox"),
  form_lhs = NULL,
  exponentiated = FALSE
)

Arguments

data

dataframe or tibble object with partially observed/missing variables

covar

character vector with the name(s) of the partially observed variable(s)/column(s) to investigate. If NULL, the function automatically includes all columns with at least one missing observation, and all remaining covariates are used as predictors

median

logical; if TRUE (recommended default), the median of all absolute standardized mean differences (asmd) is computed, otherwise the mean (see smdi_asmd())

includeNA

logical, should missingness of other partially observed covariates be explicitly modeled for computation of absolute standardized mean differences (default is FALSE)

train_test_ratio

numeric vector indicating the train/test split ratio for the random forest missingness prediction model; the default is c(.7, .3)

set_seed

seed for reproducibility of random forest missingness prediction model, defaults to 42

ntree

integer, number of trees for random forest missingness prediction model (defaults to 1000 trees)

n_cores

integer; if > 1, computations are parallelized across the number of cores specified in n_cores (UNIX systems only)

model

character string specifying which outcome model to fit to assess the association between the covar missingness indicator and the outcome. Currently supported model types are logistic, linear, and cox (see smdi_outcome)

form_lhs

string specifying the left-hand side of the outcome formula (see smdi_outcome)

exponentiated

logical, should results of outcome regression to assess association between missingness and outcome be exponentiated (default is FALSE)
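
As an illustration of the covar = NULL behaviour described above, the following base R sketch identifies the columns with at least one missing observation and the remaining fully observed columns. The data frame toy is hypothetical, and this is not the package's internal code, only one way to preview which columns would likely be investigated.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  age_num  = c(54, 61, 58, 47, 70),
  ecog_cat = c(0, NA, 1, 1, 0),
  egfr_cat = c(1, NA, 0, 1, NA)
)

# Columns with at least one missing observation
na_cols <- names(toy)[colSums(is.na(toy)) > 0]

# Remaining, fully observed columns
complete_cols <- setdiff(names(toy), na_cols)

na_cols
complete_cols
```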

Details

Wrapper around the individual diagnostic functions listed under See Also.
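
To make the effect of the median argument more concrete, the sketch below computes an absolute standardized mean difference for three hypothetical numeric covariates, comparing rows with and without an observed value of a partially observed variable x, and then summarizes them by median and mean. The toy data and the helper asmd() are assumptions for illustration; the pooled standard deviation used here is one common convention, and smdi_asmd() may compute the quantity differently.

```r
# Hypothetical toy data; x is partially observed, a, b and c are predictors
toy <- data.frame(
  x = c(1, NA, 0, NA, 1, 0, NA, 1),
  a = c(50, 62, 48, 66, 55, 51, 70, 49),
  b = c(1.2, 0.8, 1.1, 1.5, 0.9, 1.0, 1.4, 1.3),
  c = c(10, 12, 9, 15, 11, 10, 13, 8)
)

# Missingness indicator for x
miss <- is.na(toy$x)

# Absolute standardized mean difference of v between rows with and
# without missing x (pooled SD convention is an assumption)
asmd <- function(v, g) {
  abs(mean(v[g]) - mean(v[!g])) / sqrt((var(v[g]) + var(v[!g])) / 2)
}

asmd_all <- sapply(toy[c("a", "b", "c")], asmd, g = miss)

median(asmd_all)  # kind of summary reported when median = TRUE
mean(asmd_all)    # kind of summary reported when median = FALSE
```

The median is less sensitive than the mean to a single covariate with a very large difference.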

Value

smdi object including a summary table of all three smdi group diagnostics:

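Group 1 diagnostic: differences in covariate distributions between observations with and without an observed value of covar (absolute standardized mean differences, see smdi_asmd; Hotelling's test, see smdi_hotelling)

Group 2 diagnostic: ability to predict missingness of covar with a random forest model (see smdi_rf)

Group 3 diagnostic: association between the covar missingness indicator and the outcome (see smdi_outcome)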

See Also

smdi_asmd, smdi_hotelling, smdi_little, smdi_rf, smdi_outcome

Examples

library(smdi)

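# Run all three group diagnostics for the partially observed covariate egfr_cat,
# using a Cox model to assess the association between missingness and outcome
smdi_diagnose(
  data = smdi_data,
  covar = "egfr_cat",
  model = "cox",
  form_lhs = "Surv(eventtime, status)"
)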


[Package smdi version 0.2.2 Index]