mdgc_impute {mdgc}    R Documentation

Impute Missing Values

Description

Imputes missing values given a covariance matrix and a mean vector, using a quasi-random number method similar to the one used by mdgc_log_ml.

Usage

mdgc_impute(
  object,
  vcov,
  mea,
  rel_eps = 0.001,
  maxit = 10000L,
  abs_eps = -1,
  n_threads = 1L,
  do_reorder = TRUE,
  minvls = 1000L,
  use_aprx = FALSE
)

Arguments

object

object returned by get_mdgc.

vcov

covariance matrix to condition on in the imputation.

mea

vector with non-zero mean entries to condition on.

rel_eps

relative convergence threshold for each term in the approximation.

maxit

maximum number of samples.

abs_eps

absolute convergence threshold for each term in the approximation.

n_threads

number of threads to use.

do_reorder

logical for whether to use a heuristic variable reordering. TRUE is likely the best option.

minvls

minimum number of samples.

use_aprx

logical for whether to use an approximation of pnorm and qnorm. This may yield a noticeable reduction in the computation time.
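
To illustrate how rel_eps, abs_eps, minvls, and maxit can interact, here is a minimal sketch of a stopping rule for a plain Monte Carlo estimate of a mean. It is a generic illustration and not the package's implementation, which uses quasi-random numbers. The names mc_mean, draw, and batch are hypothetical, and treating a non-positive threshold as disabled is an assumption of this sketch.

```r
# Generic illustration only: NOT the mdgc implementation.
# Estimates a mean by plain Monte Carlo and stops once the estimated
# standard error meets the absolute or the relative tolerance.
mc_mean <- function(draw, rel_eps = 0.001, abs_eps = -1,
                    minvls = 1000L, maxit = 10000L, batch = 100L) {
  vals <- numeric(0)
  repeat {
    vals <- c(vals, draw(batch))      # add another batch of samples
    n    <- length(vals)
    est  <- mean(vals)
    se   <- sd(vals) / sqrt(n)        # estimated standard error
    # assumption of this sketch: a non-positive threshold is disabled
    abs_ok <- abs_eps > 0 && se < abs_eps
    rel_ok <- rel_eps > 0 && se < rel_eps * abs(est)
    # never stop before minvls samples, always stop at maxit samples
    if (n >= maxit || (n >= minvls && (abs_ok || rel_ok)))
      break
  }
  list(estimate = est, std_err = se, n_samples = n)
}

set.seed(1)
res <- mc_mean(function(n) runif(n), rel_eps = 0.01)
str(res)
```

In this sketch a looser rel_eps stops sooner, while minvls and maxit bound the number of samples from below and above regardless of the thresholds.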

Value

A list with one element per observation. Each element is itself a list that holds the imputed value for each continuous variable and a vector of level probabilities for each ordinal, binary, and multinomial variable.
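
As a sketch of how such a result might be collapsed into a data frame, the code below builds a small mock list with the shape described above and picks the most probable level for each categorical variable. The mock values are made up, collapse_obs is a hypothetical helper, and the code assumes that each probability vector is named by level.

```r
# Mock output with the shape described above (values are made up and
# do not come from mdgc_impute).
imputed <- list(
  list(Sepal.Length = 5.1,
       Species = c(setosa = 0.90, versicolor = 0.07, virginica = 0.03)),
  list(Sepal.Length = 6.3,
       Species = c(setosa = 0.05, versicolor = 0.35, virginica = 0.60))
)

# Hypothetical helper: keep scalar (continuous) values as they are and
# replace each probability vector with the name of its most probable level.
collapse_obs <- function(obs) {
  lapply(obs, function(x)
    if (length(x) > 1L) names(x)[which.max(x)] else x)
}

# One row per observation
completed <- do.call(rbind, lapply(imputed, function(obs)
  as.data.frame(collapse_obs(obs), stringsAsFactors = FALSE)))
print(completed)
```

Taking the most probable level is only one possible choice. The probabilities themselves may be more useful when the uncertainty of the imputation matters.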

Examples


# there is a bug in CRAN's check on Solaris that I have failed to reproduce.
# See https://github.com/r-hub/solarischeck/issues/8#issuecomment-796735501.
# Thus, this example is not run on Solaris
is_solaris <- tolower(Sys.info()[["sysname"]]) == "sunos"

if(!is_solaris){
  # randomly mask data
  set.seed(11)
  masked_data <- iris
  masked_data[matrix(runif(prod(dim(iris))) < .10, NROW(iris))] <- NA

  # use the functions in the package
  library(mdgc)
  obj <- get_mdgc(masked_data)
  ptr <- get_mdgc_log_ml(obj)
  start_vals <- mdgc_start_value(obj)

  fit <- mdgc_fit(ptr, start_vals, obj$means, rel_eps = 1e-2, maxpts = 10000L,
                  minvls = 1000L, use_aprx = TRUE, batch_size = 100L, lr = .001,
                  maxit = 100L, n_threads = 2L)

  # impute using the estimated values
  imputed <- mdgc_impute(obj, fit$result$vcov, fit$result$mea, minvls = 1000L,
                       maxit = 10000L, n_threads = 2L, use_aprx = TRUE)
  print(imputed[1:5]) # first 5 observations
  print(head(masked_data, 5)) # observed
  print(head(iris       , 5)) # truth
}



[Package mdgc version 0.1.7 Index]