optimizer_cov: string
(default = "lbfgs").
Optimizer used for estimating covariance parameters.
Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead", "adam".
If there are additional auxiliary parameters for non-Gaussian likelihoods,
'optimizer_cov' is also used for those.
optimizer_coef: string
(default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods).
Optimizer used for estimating linear regression coefficients, if there are any
(for the GPBoost algorithm there are usually none).
Options: "gradient_descent", "lbfgs", "wls", "nelder_mead", "adam". Gradient descent steps are done simultaneously
with gradient descent steps for the covariance parameters.
"wls" refers to doing coordinate descent for the regression coefficients using weighted least squares.
If 'optimizer_cov' is set to "nelder_mead", "lbfgs", or "adam",
'optimizer_coef' is automatically also set to the same value.
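
The defaulting and override rules for the two optimizer options can be sketched as follows. This is only an illustration of the rules stated above: `resolve_optimizers` and `SHARED_OPTIMIZERS` are hypothetical names, not part of the library.

```python
# Hypothetical helper (not a library function): mirrors the documented rules
# for 'optimizer_cov' and 'optimizer_coef'.
SHARED_OPTIMIZERS = {"nelder_mead", "lbfgs", "adam"}


def resolve_optimizers(optimizer_cov="lbfgs", optimizer_coef=None,
                       likelihood="gaussian"):
    # Documented default for the coefficients: "wls" for Gaussian
    # likelihoods and "lbfgs" for other likelihoods.
    if optimizer_coef is None:
        optimizer_coef = "wls" if likelihood == "gaussian" else "lbfgs"
    # These optimizers are applied to both parameter groups, so the
    # coefficient optimizer is overridden.
    if optimizer_cov in SHARED_OPTIMIZERS:
        optimizer_coef = optimizer_cov
    return {"optimizer_cov": optimizer_cov, "optimizer_coef": optimizer_coef}


print(resolve_optimizers("gradient_descent"))
print(resolve_optimizers("adam", optimizer_coef="wls"))
```

In the second call, the requested "wls" is replaced by "adam", because "adam" is one of the optimizers that is used for both parameter groups.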
maxit: integer
(default = 1000).
Maximal number of iterations for optimization algorithm
delta_rel_conv: numeric
(default = 1E-6 except for "nelder_mead" for which the default is 1E-8).
Convergence tolerance. The algorithm stops if the relative change
in either the (approximate) log-likelihood or the parameters is below this value.
For "adam", the L2 norm of the gradient is used instead of the relative change in the log-likelihood.
If delta_rel_conv < 0, internal default values are used
convergence_criterion: string
(default = "relative_change_in_log_likelihood").
The convergence criterion used for terminating the optimization algorithm.
Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters"
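
A minimal sketch of the two convergence criteria, assuming the relative change is measured against the previous iterate. The function names are hypothetical and the exact internal formula may differ.

```python
import math


def converged_log_likelihood(old_ll, new_ll, delta_rel_conv=1e-6):
    # Relative change in the (approximate) log-likelihood, written without
    # a division so that old_ll == 0 does not cause an error.
    return abs(new_ll - old_ll) < delta_rel_conv * abs(old_ll)


def converged_parameters(old_pars, new_pars, delta_rel_conv=1e-6):
    # Relative change in the parameter vector, measured in the L2 norm
    # (the choice of norm is an assumption of this sketch).
    change = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_pars, new_pars)))
    scale = math.sqrt(sum(o ** 2 for o in old_pars))
    return change < delta_rel_conv * scale


print(converged_log_likelihood(-1000.0, -1000.0005))
print(converged_parameters([1.0, 2.0], [1.0, 2.1]))
```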
init_coef: vector with numeric elements
(default = NULL).
Initial values for the regression coefficients (if there are any, can be NULL)
init_cov_pars: vector with numeric elements
(default = NULL).
Initial values for covariance parameters of Gaussian process and
random effects (can be NULL). The order is the same as the order
of the parameters in the summary function: first is the error variance
(only for "gaussian" likelihood), next follow the variances of the
grouped random effects (if there are any, in the order provided in 'group_data'),
and then follow the marginal variance and the range of the Gaussian process.
If there are multiple Gaussian processes, then the variances and ranges follow alternatingly.
If 'init_cov_pars = NULL', an internal choice is used that depends on the
likelihood and the random effects type and covariance function.
If you select the option 'trace = TRUE' in the 'params' argument,
the initial covariance parameters are printed as iteration 0.
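
The ordering of 'init_cov_pars' described above can be sketched as follows. `build_init_cov_pars` is a hypothetical helper used only to make the ordering explicit, and the numeric values are arbitrary.

```python
def build_init_cov_pars(likelihood, group_variances=(), gp_pars=(),
                        error_variance=None):
    """Assemble initial covariance parameters in the documented order.

    gp_pars is a sequence of (marginal_variance, range) pairs, one pair
    per Gaussian process.
    """
    pars = []
    # 1. Error variance, only for the "gaussian" likelihood.
    if likelihood == "gaussian":
        if error_variance is None:
            raise ValueError("error_variance is required for 'gaussian'")
        pars.append(error_variance)
    # 2. Variances of the grouped random effects, in 'group_data' order.
    pars.extend(group_variances)
    # 3. Marginal variance and range of each Gaussian process, alternating.
    for variance, gp_range in gp_pars:
        pars.extend([variance, gp_range])
    return pars


print(build_init_cov_pars("gaussian", [0.5, 0.3], [(1.0, 0.2)],
                          error_variance=0.1))
```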
lr_coef: numeric
(default = 0.1).
Learning rate for fixed effect regression coefficients if gradient descent is used
lr_cov: numeric
(default = 0.1 for "gradient_descent" and 1. otherwise).
Initial learning rate for covariance parameters if a gradient-based optimization method is used.
If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise).
If there are additional auxiliary parameters for non-Gaussian likelihoods,
'lr_cov' is also used for those.
For "lbfgs", this is divided by the norm of the gradient in the first iteration
use_nesterov_acc: boolean
(default = TRUE).
If TRUE, Nesterov acceleration is used.
This is used only for gradient descent
acc_rate_coef: numeric
(default = 0.5).
Acceleration rate for regression coefficients (if there are any)
for Nesterov acceleration
acc_rate_cov: numeric
(default = 0.5).
Acceleration rate for covariance parameters for Nesterov acceleration
momentum_offset: integer
(default = 2).
Number of iterations for which no momentum is applied in the beginning.
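
How the learning rate, acceleration rate, and momentum offset interact can be illustrated with a textbook Nesterov-accelerated gradient descent on a one-dimensional quadratic. This is a generic sketch, not the library's implementation: `nesterov_gd` is a hypothetical function, and details such as parameter transformations or step-size adaptation are omitted.

```python
def nesterov_gd(grad, x0, lr=0.1, acc_rate=0.5, momentum_offset=2,
                use_nesterov_acc=True, num_it=50):
    x_prev = x0
    x = x0
    for it in range(num_it):
        if use_nesterov_acc and it >= momentum_offset:
            # Look-ahead point: extrapolate along the last step.
            lookahead = x + acc_rate * (x - x_prev)
        else:
            # No momentum during the first 'momentum_offset' iterations.
            lookahead = x
        x_prev = x
        # Gradient step taken from the look-ahead point.
        x = lookahead - lr * grad(lookahead)
    return x


def quadratic_grad(x):
    # Gradient of f(x) = 0.5 * x^2, which has its minimum at x = 0.
    return x


accelerated = nesterov_gd(quadratic_grad, x0=5.0)
plain = nesterov_gd(quadratic_grad, x0=5.0, use_nesterov_acc=False)
print(abs(accelerated), abs(plain))
```

On this toy problem the accelerated run ends closer to the minimum than plain gradient descent after the same number of iterations.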
trace: boolean
(default = FALSE).
If TRUE, information on the progress of the parameter
optimization is printed
std_dev: boolean
(default = TRUE).
If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters
(= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and
square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods)
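
As a small worked example of the first case, the standard deviations are the square roots of the diagonal entries of the inverse Fisher information. The 2x2 matrix below is made up for illustration, and both function names are hypothetical.

```python
import math


def invert_2x2(m):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def approx_std_dev(fisher_information):
    # Square root of the diagonal of the inverse Fisher information.
    inverse = invert_2x2(fisher_information)
    return [math.sqrt(inverse[i][i]) for i in range(2)]


# Illustrative Fisher information for two parameters (made-up values).
fisher_information = [[4.0, 1.0], [1.0, 2.0]]
std_devs = approx_std_dev(fisher_information)
print(std_devs)
```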
init_aux_pars: vector with numeric elements
(default = NULL).
Initial values for additional parameters for non-Gaussian likelihoods
(e.g., shape parameter of a gamma or negative_binomial likelihood)
estimate_aux_pars: boolean
(default = TRUE).
If TRUE, additional parameters for non-Gaussian likelihoods
are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood)
cg_max_num_it: integer
(default = 1000).
Maximal number of iterations for conjugate gradient algorithms
cg_max_num_it_tridiag: integer
(default = 1000).
Maximal number of iterations for the conjugate gradient algorithm
when it is run as a Lanczos algorithm for tridiagonalization
cg_delta_conv: numeric
(default = 1E-2).
Tolerance level for L2 norm of residuals for checking convergence
in conjugate gradient algorithm when being used for parameter estimation
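
The roles of the iteration limit and the residual tolerance can be seen in a plain (unpreconditioned) conjugate gradient solver. This is a generic textbook sketch in pure Python, not the library's implementation. The function name is hypothetical and its arguments merely borrow the option names.

```python
import math


def conjugate_gradient(A, b, cg_delta_conv=1e-2, cg_max_num_it=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def matvec(M, v):
        return [dot(row, v) for row in M]

    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the starting point x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    num_it = 0
    # Stop when the L2 norm of the residual is below the tolerance
    # or when the maximal number of iterations is reached.
    while num_it < cg_max_num_it and math.sqrt(rs_old) >= cg_delta_conv:
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        num_it += 1
    return x, num_it


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, iterations = conjugate_gradient(A, b)
print(solution, iterations)
```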
num_rand_vec_trace: integer
(default = 50).
Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix
reuse_rand_vec_trace: boolean
(default = TRUE).
If TRUE, random vectors (e.g., Rademacher) for stochastic approximations
of the trace of a matrix are sampled only once at the beginning of
the parameter estimation and reused in later trace approximations.
Otherwise they are sampled every time a trace is calculated
seed_rand_vec_trace: integer
(default = 1).
Seed number to generate random vectors (e.g., Rademacher)
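
The stochastic trace approximation that these three options control can be sketched with a Hutchinson-type estimator using Rademacher vectors. This is a generic illustration, not the library's code: the function names are hypothetical, and sampling the vectors once and passing them to every call corresponds to the reuse behavior described above.

```python
import random


def sample_rademacher(num_rand_vec_trace, dim, seed_rand_vec_trace=1):
    # Entries are -1 or +1 with equal probability; a fixed seed makes
    # the vectors reproducible.
    rng = random.Random(seed_rand_vec_trace)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
            for _ in range(num_rand_vec_trace)]


def hutchinson_trace(A, vectors):
    # Average of z^T A z over the random vectors z approximates trace(A).
    total = 0.0
    for z in vectors:
        Az = [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]
        total += sum(z_i * az_i for z_i, az_i in zip(z, Az))
    return total / len(vectors)


A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 0.2],
     [0.0, 0.2, 1.0]]  # exact trace is 6.0

# Sampled once and reused for every later trace approximation.
vectors = sample_rademacher(num_rand_vec_trace=50, dim=3)
estimate = hutchinson_trace(A, vectors)
print(estimate)
```

More random vectors reduce the variance of the estimate at the cost of more matrix-vector products.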
piv_chol_rank: integer
(default = 50).
Rank of the pivoted Cholesky decomposition used as
preconditioner in conjugate gradient algorithms
cg_preconditioner_type: string
.
Type of preconditioner used for conjugate gradient algorithms.
Options for non-Gaussian likelihoods and gp_approx = "vecchia":
"piv_chol_on_Sigma": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1),
where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
Options for likelihood = "gaussian" and gp_approx = "full_scale_tapering":