optimPibbleCollapsed {fido}    R Documentation
Function to Optimize the Collapsed Pibble Model
Description
See Details for the model. Should likely be followed by the function
uncollapsePibble. Notation: N is the number of samples,
D is the number of multinomial categories, and Q is the number
of covariates.
Usage
optimPibbleCollapsed(
Y,
upsilon,
ThetaX,
KInv,
AInv,
init,
n_samples = 2000L,
calcGradHess = TRUE,
b1 = 0.9,
b2 = 0.99,
step_size = 0.003,
epsilon = 1e-06,
eps_f = 1e-10,
eps_g = 1e-04,
max_iter = 10000L,
verbose = FALSE,
verbose_rate = 10L,
decomp_method = "cholesky",
optim_method = "lbfgs",
eigvalthresh = 0,
jitter = 0,
multDirichletBoot = -1,
useSylv = TRUE,
ncores = -1L,
seed = -1L
)
Arguments
Y: D x N matrix of counts
upsilon: degrees of freedom for the covariance prior (must be > D)
ThetaX: D-1 x N matrix formed by Theta*X (Theta is the prior mean for the regression coefficients)
KInv: D-1 x D-1 precision matrix (inverse of Xi)
AInv: N x N precision matrix given by the inverse of A = I_N + X' Gamma X (see Details)
init: D-1 x N matrix giving the initial guess for eta used in the optimization
n_samples: number of samples for the Laplace approximation (=0 is very fast, as no inversion or decomposition of the Hessian is required)
calcGradHess: if n_samples=0, should the Gradient and Hessian still be calculated using closed-form solutions?
b1: (ADAM) 1st moment decay parameter, "aka momentum" (recommend 0.9)
b2: (ADAM) 2nd moment decay parameter (recommend 0.99 or 0.999)
step_size: (ADAM) step size for descent (recommend 0.001-0.003)
epsilon: (ADAM) parameter to avoid divide-by-zero
eps_f: (ADAM) normalized function improvement stopping criterion
eps_g: (ADAM) normalized gradient magnitude stopping criterion
max_iter: (ADAM) maximum number of iterations before stopping
verbose: (ADAM) if TRUE, prints stats for the stopping criteria and the iteration number
verbose_rate: (ADAM) rate at which verbose stats are printed to screen
decomp_method: decomposition of the Hessian for the Laplace approximation: 'eigen' (slightly more stable, slower) or 'cholesky' (less stable, faster; default)
optim_method: optimization method, "lbfgs" (default) or "adam"
eigvalthresh: threshold for negative eigenvalues in the decomposition of the negative inverse Hessian (should be <= 0)
jitter: (default: 0) if positive, adds that factor to the diagonal of the Hessian before decomposition (to improve matrix conditioning)
multDirichletBoot: if > 0, overrides the Laplace approximation and samples eta efficiently at the MAP estimate from a pseudo Multinomial-Dirichlet posterior
useSylv: (default: TRUE) if N < D-1, uses the Sylvester Determinant Identity to speed up calculation of the log-likelihood and gradients
ncores: (default: -1) number of cores to use; if ncores == -1, the OpenMP default is used (typically all available cores)
seed: random seed for the Laplace approximation (integer)
Details
Notation: Let Z_j denote the j-th row of a matrix Z.
Model:
Y_j \sim Multinomial(\pi_j)
\pi_j = \Phi^{-1}(\eta_j)
\eta \sim T_{D-1, N}(\upsilon, \Theta X, K, A)
Where A = I_N + X' \Gamma X, K (i.e., \Xi, the inverse of KInv) is a (D-1) x (D-1)
covariance matrix, \Gamma is a Q x Q covariance matrix, and \Phi^{-1} is the ALRInv_D
transform.
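For orientation, the following is a minimal sketch (not part of the package) of how these arguments can be built from the priors above, assuming Theta ((D-1) x Q), Gamma (Q x Q), Xi ((D-1) x (D-1)), and X (Q x N) are already defined; for symmetric positive-definite matrices, chol2inv(chol(.)) may be numerically preferable to solve().
ThetaX <- Theta %*% X                                  # prior mean for eta, (D-1) x N
KInv   <- solve(Xi)                                    # (D-1) x (D-1) precision, inverse of Xi
AInv   <- solve(diag(ncol(X)) + t(X) %*% Gamma %*% X)  # N x N precision, inverse of A = I_N + X' Gamma X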
Gradient and Hessian calculations are fast as they are computed using closed form solutions. That said, the Hessian matrix can be quite large [N*(D-1) x N*(D-1)] and storage may be an issue.
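As a rough, illustrative memory calculation (assuming dense storage of double-precision values):
# Approximate memory (GiB) needed to store the dense Hessian
hessian_gib <- function(N, D) {
  p <- N * (D - 1)             # dimension of vectorized eta
  p^2 * 8 / 1024^3             # p x p matrix of 8-byte doubles
}
hessian_gib(N = 100, D = 100)  # roughly 0.73 GiB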
Note: Warnings about large negative eigenvalues can either signal
that the optimizer did not reach an optimum or (more commonly in my experience)
that the prior / degrees of freedom for the covariance (given by parameters
upsilon and KInv) were too specific and at odds with the observed data.
If you get this warning, try the following (a sketch of an adjusted call appears after the list):
1. Try restarting the optimization using a different initial guess for eta.
2. Try decreasing (or even increasing) step_size (by increments of 0.001 or 0.002) and increasing max_iter in the optimizer. You can also try increasing b1 to 0.99 and decreasing eps_f by a few orders of magnitude.
3. Try relaxing prior assumptions regarding the covariance matrix (e.g., consider decreasing the parameter upsilon closer to its minimum value of D).
4. Try adding a small amount of jitter (e.g., set jitter=1e-5) to address potential floating point errors.
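A hedged sketch of such a re-run (the specific values are illustrative choices, not package recommendations; Y, upsilon, ThetaX, KInv, and AInv are assumed to be defined as above):
fit <- optimPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv,
                            init = random_pibble_init(Y),
                            optim_method = "adam",
                            step_size = 0.002, max_iter = 20000L,
                            b1 = 0.99, eps_f = 1e-12,
                            jitter = 1e-5)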
Value
List containing the following elements (all with respect to the found optimum; a usage sketch follows the list):
LogLik - Log Likelihood of the collapsed model (up to a proportionality constant)
Gradient - (if calcGradHess=true) gradient at the optimum
Hessian - (if calcGradHess=true) Hessian of the POSITIVE LOG POSTERIOR
Pars - Parameter value of eta at the optimum
Samples - (D-1) x N x n_samples array containing posterior samples of eta based on the Laplace approximation (if n_samples>0)
Timer - Vector of Execution Times
logInvNegHessDet - the log determinant of the covariance of the Laplace approximation; useful for calculating the marginal likelihood
logMarginalLikelihood - a calculation of the log marginal likelihood based on the Laplace approximation
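For example, the returned elements might be accessed as follows (illustrative only; names follow the list above):
fit$LogLik                   # log likelihood of the collapsed model at the optimum
eta_map   <- fit$Pars        # (D-1) x N MAP estimate of eta
eta_draws <- fit$Samples     # (D-1) x N x n_samples Laplace samples
dim(eta_draws)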
References
S. Ruder (2016) An overview of gradient descent optimization algorithms. arXiv 1609.04747
JD Silverman, K Roche, ZC Holmes, LA David, S Mukherjee (2022). Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. Journal of Machine Learning Research.
See Also
uncollapsePibble
Examples
sim <- pibble_sim()
# Fit model for eta
fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta%*%sim$X, sim$KInv,
sim$AInv, random_pibble_init(sim$Y))
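# As noted in the Description, the collapsed fit is typically followed by
# uncollapsePibble. A sketch of that next step (assuming pibble_sim() also
# returns X, Theta, Gamma, and Xi, and that uncollapsePibble takes them in
# this order):
fit2 <- uncollapsePibble(fit$Samples, sim$X, sim$Theta,
                         sim$Gamma, sim$Xi, sim$upsilon, seed = 2849)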