R: Optimize the Log Marginal Likelihood Using a Stochastic...

pedmod_sqn {pedmod}

R Documentation

Optimize the Log Marginal Likelihood Using a Stochastic Quasi-Newton Method

Description

Optimizes eval_pedigree_ll and eval_pedigree_grad using a stochastic quasi-Newton method.

Usage

pedmod_sqn(
  ptr,
  par,
  maxvls,
  abs_eps,
  rel_eps,
  step_factor,
  n_it,
  n_grad_steps,
  indices = NULL,
  minvls = -1L,
  n_grad = 50L,
  n_hess = 500L,
  do_reorder = TRUE,
  use_aprx = FALSE,
  n_threads = 1L,
  cluster_weights = NULL,
  fix = NULL,
  standardized = FALSE,
  minvls_hess = minvls,
  maxvls_hess = maxvls,
  abs_eps_hess = abs_eps,
  rel_eps_hess = rel_eps,
  verbose = FALSE,
  method = 0L,
  check_every = 2L * n_grad_steps,
  use_tilting = FALSE,
  vls_scales = NULL
)

Arguments

`ptr`	object from `pedigree_ll_terms`.
`par`	starting values.
`maxvls`	maximum number of samples in the approximation for each marginal likelihood term.
`abs_eps`	absolute convergence threshold for `eval_pedigree_ll` and `eval_pedigree_grad`.
`rel_eps`	rel_eps convergence threshold for `eval_pedigree_ll` and `eval_pedigree_grad`.
`step_factor`	factor used for the step size. The step size is `step_factor` divided by the iteration number.
`n_it`	number of stochastic gradient steps to make.
`n_grad_steps`	number of stochastic gradient steps to make between each Hessian approximation update.
`indices`	zero-based vector with indices of which log marginal likelihood terms to include. Use `NULL` if all indices should be used.
`minvls`	minimum number of samples for each marginal likelihood term. Negative values provides a default which depends on the dimension of the integration.
`n_grad`	number of log marginal likelihood terms to include in the stochastic gradient step.
`n_hess`	number of log marginal likelihood terms to include in the gradients used for the Hessian approximation update. This is set to the entire sample (or `indices`) if this is greater than or equal to half the number of log marginal likelihood terms.
`do_reorder`	`TRUE` if a heuristic variable reordering should be used. `TRUE` is likely the best value.
`use_aprx`	`TRUE` if a less precise approximation of `pnorm` and `qnorm` should be used. This may reduce the computation time while not affecting the result much.
`n_threads`	number of threads to use.
`cluster_weights`	numeric vector with weights for each cluster. Use `NULL` if all clusters have weight one.
`fix`	integer vector with indices of `par` to fix. This is useful for computing profile likelihoods. `NULL` yields all parameters.
`standardized`	logical for whether to use the standardized or direct parameterization. See `standardized_to_direct` and the vignette at `vignette("pedmod", package = "pedmod")`.
`minvls_hess`	`minvls` argument to use when updating the Hessian approximation.
`maxvls_hess`	`maxvls` argument to use when updating the Hessian approximation.
`abs_eps_hess`	`abs_eps` argument to use when updating the Hessian approximation.
`rel_eps_hess`	`rel_eps` argument to use when updating the Hessian approximation.
`verbose`	logical for whether to print output during the estimation.
`method`	integer with the method to use. Zero yields randomized Korobov lattice rules while one yields scrambled Sobol sequences.
`check_every`	integer for the number of gradient steps between checking that the likelihood did increase. If not, the iterations are reset and the step-size is halved.
`use_tilting`	`TRUE` if the minimax tilting method suggested by Botev (2017) should be used. See doi:10.1111/rssb.12162.
`vls_scales`	can be a numeric vector with a positive scalar for each cluster. Then `vls_scales[i] * minvls` and `vls_scales[i] * maxvls` is used for cluster `i` rather than `minvls` and `maxvls`. Set `vls_scales = NULL` if the latter should be used.

Details

The function uses a stochastic quasi-Newton method like suggested by Byrd et al. (2016) with a few differences: Differences in gradients are used rather than Hessian-vector products, BFGS rather than L-BFGS is used because the problem is typically low dimensional, and damped BFGS updates are used (see e.g. chapter 18 of Nocedal and Wright, 2006).

Separate arguments for the gradient approximation in the Hessian update are provided as one may want a more precise approximation for these gradients. step_factor likely depends on the other parameters and the data set and should be altered.

Value

A list with the following elements:

`par`	estimated parameters.
`omegas`	parameter estimates after each iteration.
`H`	Hessian approximation in the quasi-Newton method. It should not be treated as the Hessian.

References

Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031.

Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

Examples


# we simulate outcomes with an additive genetic effect. The kinship matrix is
# the same for all families and given by
K <- matrix(c(
  0.5  , 0    , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0    , 0.5  , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0.25 , 0.25 , 0.5  , 0   , 0.25 , 0   , 0.25  , 0.25  , 0.125 , 0.125 ,
  0    , 0    , 0    , 0.5 , 0    , 0   , 0.25  , 0.25  , 0     , 0     ,
  0.25 , 0.25 , 0.25 , 0   , 0.5  , 0   , 0.125 , 0.125 , 0.25  , 0.25  ,
  0    , 0    , 0    , 0   , 0    , 0.5 , 0     , 0     , 0.25  , 0.25  ,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.5   , 0.25  , 0.0625, 0.0625,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.25  , 0.5   , 0.0625, 0.0625,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.5   , 0.25  ,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.25  , 0.5
), 10)

# simulates a data set.
#
# Args:
#   n_fams: number of families.
#   beta: the fixed effect coefficients.
#   sig_sq: the scale parameter.
sim_dat <- function(n_fams, beta = c(-1, 1, 2), sig_sq = 3){
  # setup before the simulations
  Cmat <- 2 * K
  n_obs <- NROW(K)
  Sig <- diag(n_obs) + sig_sq * Cmat
  Sig_chol <- chol(Sig)

  # simulate the data
  out <- replicate(
    n_fams, {
      # simulate covariates
      X <- cbind(`(Intercept)` = 1, Continuous = rnorm(n_obs),
                 Binary = runif(n_obs) > .5)

      # assign the linear predictor + noise
      eta <- drop(X %*% beta) + drop(rnorm(n_obs) %*% Sig_chol)

      # return the list in the format needed for the package
      list(y = as.numeric(eta > 0), X = X, scale_mats = list(Cmat))
    }, simplify = FALSE)

  # add attributes with the true values and return
  attributes(out) <- list(beta = beta, sig_sq = sig_sq)
  out
}

# simulate the data
set.seed(1)
dat <- sim_dat(100L)

# fit the model
ptr <- pedigree_ll_terms(dat, max_threads = 1L)
start <- pedmod_start(ptr = ptr, data = dat, n_threads = 1L)
fit <- pedmod_sqn(ptr = ptr, par = start$par, n_threads = 1L, use_aprx = TRUE,
                  maxvls = 5000L, minvls = 1000L, abs_eps = 0, rel_eps = 1e-3,
                  n_grad_steps = 20L, step_factor = 1, n_grad = 10L,
                  n_hess = 50L, check_every = 50L, n_it = 1000L)
fit$par # maximum likelihood estimate
# the maximum likelihood
eval_pedigree_ll(ptr = ptr, fit$par, maxvls = 5000L, abs_eps = 0,
                 rel_eps = 1e-3, minvls = 1000L)

[Package pedmod version 0.2.4 Index]