likelihood |
A string specifying the likelihood function (distribution) of the response variable.
Available options:
"gaussian"
"bernoulli_probit": binary data with Bernoulli likelihood and a probit link function
"bernoulli_logit": binary data with Bernoulli likelihood and a logit link function
"gamma": gamma distribution with a log link function
"poisson": Poisson distribution with a log link function
"negative_binomial": negative binomial distribution with a log link function
Note: other likelihoods could be implemented upon request
|
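As a rough illustration of the link functions named above, the following base R sketch maps a linear predictor to the mean of the response for each option. The names `eta` and `mean_by_likelihood` are hypothetical and not part of the package, and the identity link for "gaussian" is an assumption, since the description does not state it.

```r
# Linear predictor values (fixed effects + random effects), chosen arbitrarily
eta <- c(-2, 0, 1.5)

# Inverse link functions: map the linear predictor to the mean of the response
mean_by_likelihood <- list(
  gaussian          = eta,          # identity link (assumed; not stated above)
  bernoulli_probit  = pnorm(eta),   # probit link: P(y = 1) = Phi(eta)
  bernoulli_logit   = plogis(eta),  # logit link: P(y = 1) = 1 / (1 + exp(-eta))
  gamma             = exp(eta),     # log link
  poisson           = exp(eta),     # log link
  negative_binomial = exp(eta)      # log link
)
print(mean_by_likelihood)
```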
group_data |
A vector or matrix whose columns are categorical grouping variables.
The elements are group levels that define grouped random effects.
The elements of 'group_data' can be integer, double, or character.
The number of columns corresponds to the number of grouped (intercept) random effects
|
group_rand_coef_data |
A vector or matrix with numeric covariate data
for grouped random coefficients
|
ind_effect_group_rand_coef |
A vector with integer indices that
indicate the corresponding categorical grouping variable (=columns) in 'group_data' for
every covariate in 'group_rand_coef_data'. Counting starts at 1.
The length of this index vector must equal the number of covariates in 'group_rand_coef_data'.
For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data'
have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data',
and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient
corresponding to the second grouping variable (=second column) in 'group_data'
|
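The c(1,1,2) example above can be sketched in base R as follows. The data are simulated and the object `mapping` is purely illustrative; only the three argument names come from the description.

```r
set.seed(1)
n <- 6
# Two categorical grouping variables (= columns of group_data)
group_data <- cbind(group1 = c("a", "a", "b", "b", "c", "c"),
                    group2 = c("x", "y", "x", "y", "x", "y"))
# Three numeric covariates that receive grouped random coefficients
group_rand_coef_data <- cbind(z1 = rnorm(n), z2 = rnorm(n), z3 = rnorm(n))
# Covariates 1 and 2 belong to group1 (column 1), covariate 3 to group2 (column 2)
ind_effect_group_rand_coef <- c(1, 1, 2)

# Consistency checks implied by the description above
stopifnot(length(ind_effect_group_rand_coef) == ncol(group_rand_coef_data))
stopifnot(all(ind_effect_group_rand_coef %in% seq_len(ncol(group_data))))

# Show which grouping variable each covariate is paired with
mapping <- data.frame(
  covariate = colnames(group_rand_coef_data),
  grouping_variable = colnames(group_data)[ind_effect_group_rand_coef],
  stringsAsFactors = FALSE
)
print(mapping)
```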
drop_intercept_group_rand_effect |
A vector of type logical (boolean).
Indicates whether intercept random effects are dropped (only for random coefficients).
If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included.
Only random effects with random slopes can be dropped.
|
gp_coords |
A matrix with numeric coordinates (= inputs / features) for defining Gaussian processes
|
gp_rand_coef_data |
A vector or matrix with numeric covariate data for
Gaussian process random coefficients
|
cov_function |
A string specifying the covariance function for the Gaussian process.
Available options:
"exponential": Exponential covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"gaussian": Gaussian, aka squared exponential, covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"matern": Matern covariance function with the smoothness specified by
the cov_fct_shape parameter (using the parametrization of Rasmussen and Williams, 2006)
"powered_exponential": powered exponential covariance function with the exponent specified by
the cov_fct_shape parameter (using the parametrization of Diggle and Ribeiro, 2007)
"wendland": Compactly supported Wendland covariance function (using the parametrization of Bevilacqua et al., 2019, AOS)
"matern_space_time": Spatio-temporal Matern covariance function with different range parameters for space and time.
Note that the first column in gp_coords must correspond to the time dimension
"matern_ard": anisotropic Matern covariance function with Automatic Relevance Determination (ARD),
i.e., with a different range parameter for every coordinate dimension / column of gp_coords
"gaussian_ard": anisotropic Gaussian, aka squared exponential, covariance function with Automatic Relevance Determination (ARD),
i.e., with a different range parameter for every coordinate dimension / column of gp_coords
|
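For orientation, the isotropic options can be written out in base R. This is a sketch based on my reading of the cited parametrizations (Diggle and Ribeiro for the exponential and Gaussian, Rasmussen and Williams for the Matern); the function names are hypothetical, and the formulas should be checked against the package documentation before relying on them.

```r
# Covariance as a function of distance d, marginal variance sigma2, and range rho
cov_exponential <- function(d, sigma2, rho) sigma2 * exp(-d / rho)
cov_gaussian <- function(d, sigma2, rho) sigma2 * exp(-(d / rho)^2)
cov_matern <- function(d, sigma2, rho, nu) {
  x <- sqrt(2 * nu) * d / rho
  out <- sigma2 * 2^(1 - nu) / gamma(nu) * x^nu * besselK(x, nu)
  out[d == 0] <- sigma2  # the limit for d -> 0 is the marginal variance
  out
}

d <- c(0, 0.1, 0.5, 1)
cov_exponential(d, sigma2 = 1, rho = 0.3)
cov_gaussian(d, sigma2 = 1, rho = 0.3)
cov_matern(d, sigma2 = 1, rho = 0.3, nu = 0.5)  # coincides with the exponential
cov_matern(d, sigma2 = 1, rho = 0.3, nu = 1.5)
```

Under this parametrization, a smoothness of 0.5 reproduces the exponential covariance, which is a convenient sanity check.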
cov_fct_shape |
A numeric specifying the shape parameter of the covariance function
(=smoothness parameter for Matern covariance).
This parameter is irrelevant for some covariance functions such as the exponential or Gaussian
|
gp_approx |
A string specifying the large data approximation
for Gaussian processes. Available options:
"none": No approximation
"vecchia": A Vecchia approximation; see Sigrist (2022, JMLR) for more details
"tapering": The covariance function is multiplied by
a compactly supported Wendland correlation function
"fitc": Fully Independent Training Conditional approximation aka
modified predictive process approximation; see Gyger, Furrer, and Sigrist (2024) for more details
"full_scale_tapering": A full scale approximation combining an
inducing point / predictive process approximation with tapering on the residual process;
see Gyger, Furrer, and Sigrist (2024) for more details
|
cov_fct_taper_range |
A numeric specifying the range parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS)
|
cov_fct_taper_shape |
A numeric specifying the shape (=smoothness) parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS)
|
num_neighbors |
An integer specifying the number of neighbors for
the Vecchia approximation. Note: for prediction, the number of neighbors can
be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data'
function. By default, num_neighbors_pred = 2 * num_neighbors. Further,
the type of Vecchia approximation used for making predictions is set through
the 'vecchia_pred_type' parameter in the 'set_prediction_data' function
|
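To make the role of 'num_neighbors' concrete, the base R sketch below builds the conditioning sets of a generic Vecchia approximation: each point is conditioned on its nearest neighbors among the points that precede it in the ordering. This illustrates the general idea only; the package's neighbor search and ordering may differ, and all object names are hypothetical.

```r
set.seed(1)
coords <- matrix(runif(20), ncol = 2)  # 10 points in 2D, in the given ordering
num_neighbors <- 3

# For point i, use the (at most) num_neighbors nearest points among 1..(i - 1)
dist_mat <- as.matrix(dist(coords))
neighbors <- lapply(seq_len(nrow(coords)), function(i) {
  if (i == 1) return(integer(0))  # the first point has no predecessors
  previous <- seq_len(i - 1)
  nearest_first <- previous[order(dist_mat[i, previous])]
  nearest_first[seq_len(min(num_neighbors, i - 1))]
})
neighbors[[5]]  # conditioning set of the fifth point
```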
vecchia_ordering |
A string specifying the ordering used in
the Vecchia approximation. Available options:
"none": the default ordering in the data is used
"random": a random ordering
"time": ordering according to time (only for space-time models)
"time_random_space": ordering according to time and randomly for all
spatial points with the same time points (only for space-time models)
|
ind_points_selection |
A string specifying the method for choosing inducing points.
Available options:
"kmeans++": the k-means++ algorithm
"cover_tree": the cover tree algorithm
"random": random selection from data points
|
num_ind_points |
An integer specifying the number of inducing
points / knots for, e.g., a predictive process approximation
|
cover_tree_radius |
A numeric specifying the radius (= "spatial resolution")
for the cover tree algorithm
|
matrix_inversion_method |
A string specifying the method used for inverting covariance matrices.
Available options:
"cholesky": Cholesky factorization
"iterative": iterative methods, i.e., a combination of conjugate gradient, Lanczos algorithm, and other methods.
This is currently only supported for the following cases:
likelihood != "gaussian" and gp_approx == "vecchia" (non-Gaussian likelihoods with a Vecchia-Laplace approximation)
likelihood == "gaussian" and gp_approx == "full_scale_tapering" (Gaussian likelihood with a full-scale tapering approximation)
|
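The difference between the two options can be illustrated on a small linear system in base R. The sketch solves Sigma x = b once through a Cholesky factor and once with a plain, unpreconditioned conjugate gradient loop. It is not the package's implementation, and `conjugate_gradient` is a hypothetical helper written for this example.

```r
set.seed(1)
n <- 50
coords <- matrix(runif(2 * n), ncol = 2)
# Exponential covariance plus a nugget, so Sigma is symmetric positive definite
Sigma <- exp(-as.matrix(dist(coords)) / 0.2) + diag(0.5, n)
b <- rnorm(n)

# Direct solve via the upper-triangular Cholesky factor R (Sigma = t(R) %*% R)
R <- chol(Sigma)
x_chol <- backsolve(R, forwardsolve(t(R), b))

# Plain conjugate gradient without preconditioning
conjugate_gradient <- function(A, b, tol = 1e-10, max_it = 1000) {
  x <- rep(0, length(b))
  r <- b                      # residual b - A x for the starting value x = 0
  p <- r                      # first search direction
  rs_old <- sum(r^2)
  for (it in seq_len(max_it)) {
    Ap <- as.vector(A %*% p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r^2)
    if (sqrt(rs_new) < tol) break  # stop once the L2 norm of the residual is small
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  x
}
x_cg <- conjugate_gradient(Sigma, b)
max(abs(x_chol - x_cg))  # both approaches should agree closely
```

The Cholesky route costs on the order of n^3 operations, whereas conjugate gradient only needs matrix-vector products, which is why iterative methods become attractive for large or sparse systems.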
seed |
An integer specifying the seed used for model creation
(e.g., random ordering in Vecchia approximation)
|
cluster_ids |
A vector with elements indicating independent realizations of
random effects / Gaussian processes (same values = same process realization).
The elements of 'cluster_ids' can be integer, double, or character.
|
free_raw_data |
A boolean. If TRUE, the data (groups, coordinates, covariate data for random coefficients)
is freed in R after initialization
|
y |
A vector with response variable data
|
X |
A matrix with numeric covariate data for the
fixed effects linear regression term (if there is one)
|
params |
A list with parameters for the estimation / optimization
optimizer_cov: string (default = "lbfgs").
Optimizer used for estimating covariance parameters.
Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead", "adam".
If there are additional auxiliary parameters for non-Gaussian likelihoods,
'optimizer_cov' is also used for those
optimizer_coef: string (default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods).
Optimizer used for estimating linear regression coefficients, if there are any
(for the GPBoost algorithm there are usually none).
Options: "gradient_descent", "lbfgs", "wls", "nelder_mead", "adam". Gradient descent steps are done simultaneously
with gradient descent steps for the covariance parameters.
"wls" refers to doing coordinate descent for the regression coefficients using weighted least squares.
If 'optimizer_cov' is set to "nelder_mead", "lbfgs", or "adam",
'optimizer_coef' is automatically also set to the same value.
maxit: integer (default = 1000).
Maximal number of iterations for optimization algorithm
delta_rel_conv: numeric (default = 1E-6 except for "nelder_mead" for which the default is 1E-8).
Convergence tolerance. The algorithm stops if the relative change
in either the (approximate) log-likelihood or the parameters is below this value.
For "adam", the L2 norm of the gradient is used instead of the relative change in the log-likelihood.
If < 0, internal default values are used
convergence_criterion: string (default = "relative_change_in_log_likelihood").
The convergence criterion used for terminating the optimization algorithm.
Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters"
init_coef: vector with numeric elements (default = NULL).
Initial values for the regression coefficients (if there are any, can be NULL)
init_cov_pars: vector with numeric elements (default = NULL).
Initial values for covariance parameters of Gaussian process and
random effects (can be NULL). The order is the same as the order
of the parameters in the summary function: first is the error variance
(only for "gaussian" likelihood), next follow the variances of the
grouped random effects (if there are any, in the order provided in 'group_data'),
and then follow the marginal variance and the range of the Gaussian process.
If there are multiple Gaussian processes, then the variances and ranges follow alternatingly.
If 'init_cov_pars = NULL', an internal choice is used that depends on the
likelihood and the random effects type and covariance function.
If you select the option 'trace = TRUE' in the 'params' argument,
you will see the first initial covariance parameters in iteration 0.
lr_coef: numeric (default = 0.1).
Learning rate for fixed effect regression coefficients if gradient descent is used
lr_cov: numeric (default = 0.1 for "gradient_descent" and 1. otherwise).
Initial learning rate for covariance parameters if a gradient-based optimization method is used.
If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise).
If there are additional auxiliary parameters for non-Gaussian likelihoods,
'lr_cov' is also used for those.
For "lbfgs", this is divided by the norm of the gradient in the first iteration
use_nesterov_acc: boolean (default = TRUE).
If TRUE, Nesterov acceleration is used.
This is used only for gradient descent
acc_rate_coef: numeric (default = 0.5).
Acceleration rate for regression coefficients (if there are any)
for Nesterov acceleration
acc_rate_cov: numeric (default = 0.5).
Acceleration rate for covariance parameters for Nesterov acceleration
momentum_offset: integer (default = 2).
Number of iterations for which no momentum is applied in the beginning.
trace: boolean (default = FALSE).
If TRUE, information on the progress of the parameter
optimization is printed
std_dev: boolean (default = TRUE).
If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters
(= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and
square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods)
init_aux_pars: vector with numeric elements (default = NULL).
Initial values for additional parameters for non-Gaussian likelihoods
(e.g., shape parameter of a gamma or negative_binomial likelihood)
estimate_aux_pars: boolean (default = TRUE).
If TRUE, additional parameters for non-Gaussian likelihoods
are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood)
cg_max_num_it: integer (default = 1000).
Maximal number of iterations for conjugate gradient algorithms
cg_max_num_it_tridiag: integer (default = 1000).
Maximal number of iterations for conjugate gradient algorithm
when being run as Lanczos algorithm for tridiagonalization
cg_delta_conv: numeric (default = 1E-2).
Tolerance level for L2 norm of residuals for checking convergence
in conjugate gradient algorithm when being used for parameter estimation
num_rand_vec_trace: integer (default = 50).
Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix
reuse_rand_vec_trace: boolean (default = TRUE).
If TRUE, random vectors (e.g., Rademacher) for stochastic approximations
of the trace of a matrix are sampled only once at the beginning of
the parameter estimation and reused in later trace approximations.
Otherwise they are sampled every time a trace is calculated
seed_rand_vec_trace: integer (default = 1).
Seed number to generate random vectors (e.g., Rademacher)
piv_chol_rank: integer (default = 50).
Rank of the pivoted Cholesky decomposition used as
preconditioner in conjugate gradient algorithms
cg_preconditioner_type: string.
Type of preconditioner used for conjugate gradient algorithms.
Options for non-Gaussian likelihoods and gp_approx = "vecchia":
"piv_chol_on_Sigma": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1),
where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
Options for likelihood = "gaussian" and gp_approx = "full_scale_tapering":
|
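A 'params' list might be assembled as in the following sketch. The values are arbitrary illustrations, not recommendations; every name is taken from the description above. The helper `has_converged` is hypothetical and only mimics the kind of relative-change rule described for 'delta_rel_conv'; the package's exact formula is not given here and may differ.

```r
# Illustrative settings; all names appear in the 'params' description
params <- list(
  optimizer_cov = "gradient_descent",
  optimizer_coef = "wls",
  maxit = 500L,
  delta_rel_conv = 1e-6,
  convergence_criterion = "relative_change_in_log_likelihood",
  lr_cov = 0.1,
  use_nesterov_acc = TRUE,
  acc_rate_cov = 0.5,
  trace = TRUE,
  std_dev = TRUE
)
str(params)

# Hypothetical relative-change stopping rule (an assumption, not package code)
has_converged <- function(old, new, tol) abs(new - old) / abs(old) < tol
has_converged(old = -100, new = -100.00001, tol = params$delta_rel_conv)
has_converged(old = -100, new = -101, tol = params$delta_rel_conv)
```

Such a list would be supplied as the 'params' argument when fitting a model; the fitting call is omitted because it requires the package to be installed.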
vecchia_approx |
Discontinued. Use the argument gp_approx instead
|
vecchia_pred_type |
A string specifying the type of Vecchia approximation used for making predictions.
This is discontinued here. Use the function 'set_prediction_data' to specify this
|
num_neighbors_pred |
An integer specifying the number of neighbors for making predictions.
This is discontinued here. Use the function 'set_prediction_data' to specify this
|
offset |
A numeric vector with
additional fixed effects contributions that are added to the linear predictor (= offset).
The length of this vector needs to equal the number of training data points.
|
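A common use of an offset is an exposure term in a count model. The base R sketch below shows how an offset enters the linear predictor under a log link; random effects are left out for brevity, and the variables `exposure`, `beta`, `eta`, and `mu` are hypothetical names chosen for this example.

```r
set.seed(1)
n <- 5
X <- cbind(intercept = 1, x1 = rnorm(n))
beta <- c(0.5, -1)
exposure <- c(10, 20, 5, 8, 12)
offset <- log(exposure)  # one value per training data point

# The offset vector must be as long as the number of training data points
stopifnot(length(offset) == nrow(X))

# Linear predictor = offset + fixed effects (random effects omitted for brevity)
eta <- offset + as.vector(X %*% beta)
mu <- exp(eta)  # mean under a log link, e.g. for the "poisson" likelihood
print(mu)
```

With a log link, adding log(exposure) to the linear predictor scales the mean by the exposure.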
fixed_effects |
This is discontinued. Use the renamed equivalent argument offset instead
|