R: Optimize the Log Marginal Likelihood

pedmod_opt {pedmod}

R Documentation

Optimize the Log Marginal Likelihood

Description

Optimizes eval_pedigree_ll and eval_pedigree_grad using a passed optimization function.

Usage

pedmod_opt(
  ptr,
  par,
  maxvls,
  abs_eps,
  rel_eps,
  opt_func = NULL,
  seed = 1L,
  indices = NULL,
  minvls = -1L,
  do_reorder = TRUE,
  use_aprx = FALSE,
  n_threads = 1L,
  cluster_weights = NULL,
  fix = NULL,
  standardized = FALSE,
  method = 0L,
  use_tilting = FALSE,
  vls_scales = NULL,
  ...
)

pedmod_start(
  ptr,
  data,
  maxvls = 1000L,
  abs_eps = 0,
  rel_eps = 0.01,
  seed = 1L,
  indices = NULL,
  scale_max = 9,
  minvls = 100L,
  do_reorder = TRUE,
  use_aprx = TRUE,
  n_threads = 1L,
  cluster_weights = NULL,
  standardized = FALSE,
  method = 0L,
  sc_start = NULL,
  use_tilting = FALSE,
  vls_scales = NULL
)

pedmod_start_loadings(
  ptr,
  data,
  indices = NULL,
  cluster_weights = NULL,
  sc_start_invariant = NULL
)

Arguments

`ptr`	object from `pedigree_ll_terms` or `pedigree_ll_terms_loadings`.
`par`	starting values passed to `opt_func`.
`maxvls`	maximum number of samples in the approximation for each marginal likelihood term.
`abs_eps`	absolute convergence threshold for `eval_pedigree_ll` and `eval_pedigree_grad`.
`rel_eps`	rel_eps convergence threshold for `eval_pedigree_ll` and `eval_pedigree_grad`.
`opt_func`	function to perform minimization with arguments like `optim`. BFGS is used with `optim` if this argument is `NULL`.
`seed`	seed to pass to `set.seed` before each gradient and function evaluation. Use `NULL` if the seed should not be fixed.
`indices`	zero-based vector with indices of which log marginal likelihood terms to include. Use `NULL` if all indices should be used.
`minvls`	minimum number of samples for each marginal likelihood term. Negative values provides a default which depends on the dimension of the integration.
`do_reorder`	`TRUE` if a heuristic variable reordering should be used. `TRUE` is likely the best value.
`use_aprx`	`TRUE` if a less precise approximation of `pnorm` and `qnorm` should be used. This may reduce the computation time while not affecting the result much.
`n_threads`	number of threads to use.
`cluster_weights`	numeric vector with weights for each cluster. Use `NULL` if all clusters have weight one.
`fix`	integer vector with indices of `par` to fix. This is useful for computing profile likelihoods. `NULL` yields all parameters.
`standardized`	logical for whether to use the standardized or direct parameterization. See `standardized_to_direct` and the vignette at `vignette("pedmod", package = "pedmod")`.
`method`	integer with the method to use. Zero yields randomized Korobov lattice rules while one yields scrambled Sobol sequences.
`use_tilting`	`TRUE` if the minimax tilting method suggested by Botev (2017) should be used. See doi:10.1111/rssb.12162.
`vls_scales`	can be a numeric vector with a positive scalar for each cluster. Then `vls_scales[i] * minvls` and `vls_scales[i] * maxvls` is used for cluster `i` rather than `minvls` and `maxvls`. Set `vls_scales = NULL` if the latter should be used.
`...`	Arguments passed to `opt_func`.
`data`	the `list` that was passed to `pedigree_ll_terms` or `pedigree_ll_terms_loadings`.
`scale_max`	the maximum value for the scale parameters. Sometimes, the optimization method tends to find large scale parameters and get stuck. Setting a maximum solves this.
`sc_start`	starting value for the scale parameters. Use `NULL` if you have no value to start with.
`sc_start_invariant`	scale parameter(s) like sc_start. It is the value that all individuals should have (i.e. not one that varies by individual).

Details

pedmod_start and pedmod_start_loadings yield starting values which can be used for pedmod_opt. The methods are based on a heuristics.

Value

pedmod_opt: The output from the opt_func argument. Thus, if fix is supplied then this is optimal values of only par[-fix] with par[fix] being fixed to the inputs. Thus, the length is only the number of non-fixed parameters.

pedmod_start: A list with:

par: the starting value.
beta_no_rng: the fixed effects MLEs without random effects.
logLik_no_rng: the log maximum likelihood without random effects.
logLik_est: the likelihood at par.

pedmod_start_loadings: A list with:

par: the starting value.
beta_no_rng: the fixed effects MLEs without random effects.
logLik_no_rng: the log maximum likelihood without random effects.

Examples


# we simulate outcomes with an additive genetic effect. The kinship matrix is
# the same for all families and given by
K <- matrix(c(
  0.5  , 0    , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0    , 0.5  , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0.25 , 0.25 , 0.5  , 0   , 0.25 , 0   , 0.25  , 0.25  , 0.125 , 0.125 ,
  0    , 0    , 0    , 0.5 , 0    , 0   , 0.25  , 0.25  , 0     , 0     ,
  0.25 , 0.25 , 0.25 , 0   , 0.5  , 0   , 0.125 , 0.125 , 0.25  , 0.25  ,
  0    , 0    , 0    , 0   , 0    , 0.5 , 0     , 0     , 0.25  , 0.25  ,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.5   , 0.25  , 0.0625, 0.0625,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.25  , 0.5   , 0.0625, 0.0625,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.5   , 0.25  ,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.25  , 0.5
), 10)

# simulates a data set.
#
# Args:
#   n_fams: number of families.
#   beta: the fixed effect coefficients.
#   sig_sq: the scale parameter.
sim_dat <- function(n_fams, beta = c(-1, 1, 2), sig_sq = 3){
  # setup before the simulations
  Cmat <- 2 * K
  n_obs <- NROW(K)
  Sig <- diag(n_obs) + sig_sq * Cmat
  Sig_chol <- chol(Sig)

  # simulate the data
  out <- replicate(
    n_fams, {
      # simulate covariates
      X <- cbind(`(Intercept)` = 1, Continuous = rnorm(n_obs),
                 Binary = runif(n_obs) > .5)

      # assign the linear predictor + noise
      eta <- drop(X %*% beta) + drop(rnorm(n_obs) %*% Sig_chol)

      # return the list in the format needed for the package
      list(y = as.numeric(eta > 0), X = X, scale_mats = list(Cmat))
    }, simplify = FALSE)

  # add attributes with the true values and return
  attributes(out) <- list(beta = beta, sig_sq = sig_sq)
  out
}

# simulate the data
set.seed(1)
dat <- sim_dat(100L)

# fit the model
ptr <- pedigree_ll_terms(dat, max_threads = 1L)
start <- pedmod_start(ptr = ptr, data = dat, n_threads = 1L)
fit <- pedmod_opt(ptr = ptr, par = start$par, n_threads = 1L, use_aprx = TRUE,
                  maxvls = 5000L, minvls = 1000L, abs_eps = 0, rel_eps = 1e-3)
fit$par # the estimate
-fit$value # the log maximum likelihood
start$logLik_no_rng # the log maximum likelihood without the random effects

[Package pedmod version 0.2.4 Index]