mdgc {mdgc} | R Documentation |
Perform Model Estimation and Imputation
Description
A convenience function to perform model estimation and imputation in one
call. A suitable learning rate is likely model specific, so the default should
be adjusted. See mdgc_fit.
See the README at https://github.com/boennecd/mdgc for examples.
Usage
mdgc(
dat,
lr = 0.001,
maxit = 25L,
batch_size = NULL,
rel_eps = 0.001,
method = c("svrg", "adam", "aug_Lagran"),
seed = 1L,
epsilon = 1e-08,
beta_1 = 0.9,
beta_2 = 0.999,
n_threads = 1L,
do_reorder = TRUE,
abs_eps = -1,
maxpts = 10000L,
minvls = 100L,
verbose = FALSE,
irel_eps = rel_eps,
imaxit = maxpts,
iabs_eps = abs_eps,
iminvls = 1000L,
start_val = NULL,
decay = 0.98,
conv_crit = 1e-05,
use_aprx = FALSE
)
Arguments
dat |
|
lr |
learning rate. |
maxit |
maximum number of iterations. |
batch_size |
number of observations in each batch. |
rel_eps |
relative error for each marginal likelihood factor. |
method |
estimation method to use. Can be "svrg", "adam", or "aug_Lagran". |
seed |
fixed seed to use. Use |
epsilon |
ADAM parameters. |
beta_1 |
ADAM parameters. |
beta_2 |
ADAM parameters. |
n_threads |
number of threads to use. |
do_reorder |
logical for whether to use a heuristic variable reordering. |
abs_eps |
absolute convergence threshold for each marginal likelihood factor. |
maxpts |
maximum number of samples to draw for each marginal likelihood term. |
minvls |
minimum number of samples. |
verbose |
logical for whether to print output during the estimation. |
irel_eps |
relative error for each term in the imputation. |
imaxit |
maximum number of samples to draw in the imputation. |
iabs_eps |
absolute convergence threshold for each term in the imputation. |
iminvls |
minimum number of samples in the imputation. |
start_val |
starting value for the covariance matrix. Use
|
decay |
the learning rate used by SVRG is given by |
conv_crit |
relative convergence threshold. |
use_aprx |
logical for whether to use an approximation of
|
Details
It is important that the input for dat has the appropriate types and
classes. See get_mdgc.
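As a rough sketch of what "appropriate types and classes" can look like, the snippet below prepares a toy data.frame in the same way the example on this page prepares the retinopathy data: the ordinal column is turned into an ordered factor and the binary column into a logical. The column names and values are made up for illustration, and leaving continuous variables as numeric is an assumption; get_mdgc remains the authoritative reference for the supported classes.

```r
# Toy data; the column names and values are hypothetical.
raw <- data.frame(
  age    = c(34.5, 51.2, NA, 47.9),      # continuous
  stage  = c("low", "high", "mid", NA),  # ordinal, stored as character
  smoker = c(0L, 1L, NA, 1L),            # binary, stored as 0/1 integers
  stringsAsFactors = FALSE
)

dat <- raw
# ordinal variable: ordered factor with an explicit level order
dat$stage <- factor(dat$stage, levels = c("low", "mid", "high"),
                    ordered = TRUE)
# binary variable: logical (missing values stay NA)
dat$smoker <- as.logical(dat$smoker)

# inspect the first class of each column before passing `dat` on
col_classes <- vapply(dat, function(x) class(x)[1], character(1))
print(col_classes)
```

Checking the classes up front is cheap and makes it easier to spot a column that was read as character or integer by mistake.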
Value
A list with the following entries:
ximp |
|
imputed |
output from |
vcov |
the estimated covariance matrix. |
mea |
the estimated non-zero mean terms. |
Additional elements may be present depending on the chosen method.
See mdgc_fit.
References
Kingma, D.P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv:1412.6980.
Johnson, R., & Zhang, T. (2013). Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. In Advances in Neural Information Processing Systems.
See Also
get_mdgc, mdgc_start_value, get_mdgc_log_ml, mdgc_fit, mdgc_impute
Examples
# there is a bug on CRAN's check on Solaris which I have failed to reproduce.
# See https://github.com/r-hub/solarischeck/issues/8#issuecomment-796735501.
# Thus, this example is not run on Solaris
is_solaris <- tolower(Sys.info()[["sysname"]]) == "sunos"
if(!is_solaris && require(catdata)){
  data(retinopathy)
  # prepare data and save true data set
  retinopathy$RET <- as.ordered(retinopathy$RET)
  retinopathy$SM <- as.logical(retinopathy$SM)
  # randomly mask data
  set.seed(28325145)
  truth <- retinopathy
  for(i in seq_along(retinopathy))
    retinopathy[[i]][runif(NROW(retinopathy)) < .3] <- NA
  cat("\nMasked data:\n")
  print(head(retinopathy, 10))
  cat("\n")
  # impute data
  impu <- mdgc(retinopathy, lr = 1e-3, maxit = 25L, batch_size = 25L,
               rel_eps = 1e-3, maxpts = 5000L, verbose = TRUE,
               n_threads = 1L, method = "svrg")
  # show correlation matrix
  cat("\nEstimated correlation matrix\n")
  print(impu$vcov)
  # compare imputed and true values
  cat("\nObserved:\n")
  print(head(retinopathy, 10))
  cat("\nImputed values:\n")
  print(head(impu$ximp, 10))
  cat("\nTruth:\n")
  print(head(truth, 10))
  # using augmented Lagrangian method
  cat("\n")
  impu_aug <- mdgc(retinopathy, maxit = 25L, rel_eps = 1e-3,
                   maxpts = 5000L, verbose = TRUE,
                   n_threads = 1L, method = "aug_Lagran")
  # compare the log-likelihood estimate
  obj <- get_mdgc_log_ml(retinopathy)
  cat(sprintf(
    "Maximum log likelihood with SVRG vs. augmented Lagrangian:\n %.2f vs. %.2f\n",
    mdgc_log_ml(obj, vcov = impu    $vcov, mea = impu    $mea, rel_eps = 1e-3),
    mdgc_log_ml(obj, vcov = impu_aug$vcov, mea = impu_aug$mea, rel_eps = 1e-3)))
  # show correlation matrix
  cat("\nEstimated correlation matrix (augmented Lagrangian)\n")
  print(impu_aug$vcov)
  cat("\nImputed values (augmented Lagrangian):\n")
  print(head(impu_aug$ximp, 10))
}
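The example above compares the imputed values with the truth by printing the first rows. A numeric summary can be easier to read. The following is a minimal sketch of such a summary using base R only; imputation_error is a hypothetical helper that is not part of the package, and it assumes that the imputed data.frame has the same column names and column types as the original data.

```r
# Hypothetical helper (not part of mdgc): per-variable imputation error,
# computed only on the cells that were masked.
imputation_error <- function(truth, masked, imputed) {
  stopifnot(identical(names(truth), names(masked)),
            identical(names(truth), names(imputed)))
  vapply(names(truth), function(v) {
    # cells that were masked but have a known true value
    idx <- is.na(masked[[v]]) & !is.na(truth[[v]])
    if (!any(idx)) return(NA_real_)
    if (is.numeric(truth[[v]])) {
      # root mean square error for continuous variables
      sqrt(mean((imputed[[v]][idx] - truth[[v]][idx])^2))
    } else {
      # misclassification rate for logical and factor variables
      mean(as.character(imputed[[v]][idx]) != as.character(truth[[v]][idx]))
    }
  }, numeric(1))
}

# toy check with made-up data
truth  <- data.frame(x = c(1, 2, 3, 4), b = c(TRUE, FALSE, TRUE, TRUE))
masked <- truth
masked$x[c(2, 4)] <- NA
masked$b[c(1, 3)] <- NA
imputed <- masked
imputed$x[c(2, 4)] <- c(2, 6)          # one exact, one off by 2
imputed$b[c(1, 3)] <- c(TRUE, FALSE)   # one right, one wrong
err <- imputation_error(truth, masked, imputed)
print(err)
```

Under the stated assumption, the helper could be applied to the objects from the example as imputation_error(truth, retinopathy, impu$ximp) and, for the augmented Lagrangian fit, with impu_aug$ximp in place of impu$ximp.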