R: A GLMM Model

Model {glmmrBase}

R Documentation

A GLMM Model

Description

A GLMM Model

Details

A generalised linear mixed model and a range of associated functions

A detailed vingette for this package is available onlinedoi:10.48550/arXiv.2303.12657. Briefly, for the generalised linear mixed model

Y \sim F(\mu,\sigma)

\mu = h^-1(X\beta + Zu)

u \sim MVN(0,D)

where h is the link function. The class provides access to all of the matrices above and associated calculations and functions including model fitting, power analysis, and various relevant decompositions. The object is an R6 class and so can serve as a parent class for extended functionality.

Many calculations use the covariance matrix of the observations, such as the information matrix, which is used in power calculations and other functions. For non-Gaussian models, the class uses the first-order approximation proposed by Breslow and Clayton (1993) based on the marginal quasilikelihood:

\Sigma = W^{-1} + ZDZ^T

where W is a diagonal matrix with the GLM iterated weights for each observation equal to, for individual i \left( \frac{(\partial h^{-1}(\eta_i))}{\partial \eta_i}\right) ^2 Var(y|u) (see Table 2.1 in McCullagh and Nelder (1989)). The modification proposed by Zegers et al to the linear predictor to improve the accuracy of approximations based on the marginal quasilikelihood is also available, see use_attenuation().

See glmmrBase for a detailed guide on model specification.

The class also includes model fitting with Markov Chain Monte Carlo Maximum Likelihood implementing the algorithms described by McCulloch (1997), and fast model fitting using Laplace approximation. Functions for returning related values such as the log gradient, log probability, and other matrices are also available.

Attenuation For calculations such as the information matrix, the first-order approximation to the covariance matrix proposed by Breslow and Clayton (1993), described above, is used. The approximation is based on the marginal quasilikelihood. Zegers, Liang, and Albert (1988) suggest that a better approximation to the marginal mean is achieved by "attenuating" the linear predictor. Setting use equal to TRUE uses this adjustment for calculations using the covariance matrix for non-linear models.

Calls the respective print methods of the linked covariance and mean function objects.

The matrices X and Z both have n rows, where n is the number of observations in the model/design.

Using update_parameters() is the preferred way of updating the parameters of the mean or covariance objects as opposed to direct assignment, e.g. self$covariance$parameters <- c(...). The function calls check functions to automatically update linked matrices with the new parameters.

Stochastic maximum likelihood Fits generalised linear mixed models using one of several algorithms: Markov Chain Newton Raphson (MCNR), Markov Chain Expectation Maximisation (MCEM), or stochastic approximation expectation maximisation (SAEM) with or without Polyak-Ruppert averaging. MCNR and MCEM are described by McCulloch (1997) doi:10.1080/01621459.1997.10473613. For each iteration of the algorithms the unobserved random effect terms (\gamma) are simulated using Markov Chain Monte Carlo (MCMC) methods, and then these values are conditioned on in the subsequent steps to estimate the covariance parameters and the mean function parameters (\beta). SAEM uses a Robbins-Munroe approach to approximating the likelihood and requires fewer MCMC samples and may have lower Monte Carlo error, see Jank (2006)doi:10.1198/106186006X157469. The option alpha determines the rate at which succesive iterations "forget" the past and must be between 0.5 and 1. Higher values will result in lower Monte Carlo error but slower convergence. The options mcem.adapt and mcnr.adapt will modify the number of MCMC samples during each step of model fitting using the suggested values in Caffo, Jank, and Jones (2006)doi:10.1111/j.1467-9868.2005.00499.x as the estimates converge.

The accuracy of the algorithm depends on the user specified tolerance. For higher levels of tolerance, larger numbers of MCMC samples are likely need to sufficiently reduce Monte Carlo error. However, the SAEM approach does overcome reduce the required samples, especially with R-P averaging. As such a lower number (20-50) samples per iteration is normally sufficient to get convergence.

There are several stopping rules for the algorithm. Either the algorithm will terminate when succesive parameter estimates are all within a specified tolerance of each other (conv.criterion = 1), or when there is a high probability that the estimated log-likelihood has not been improved. This latter criterion can be applied to either the overall log-likelihood (conv.criterion = 2), the likelihood just for the fixed effects (conv.criterion = 3), or both the likelihoods for the fixed effects and covariance parameters (conv.criterion = 4; default).

Options for the MCMC sampler are set by changing the values in self$mcmc_options. The information printed to the console during model fitting can be controlled with the self$set_trace() function.

To provide weights for the model fitting, store them in self$weights. To set the number of trials for binomial models, set self$trials.

Laplace approximation Fits generalised linear mixed models using Laplace approximation to the log likelihood. For non-Gaussian models the covariance matrix is approximated using the first order approximation based on the marginal quasilikelihood proposed by Breslow and Clayton (1993). The marginal mean in this approximation can be further adjusted following the proposal of Zeger et al (1988), use the member function use_attenuated() in this class, see Model. To provide weights for the model fitting, store them in self$weights. To set the number of trials for binomial models, set self$trials. To control the information printed to the console during model fitting use the self$set_trace() function.

Public fields

covariance

A Covariance object defining the random effects covariance.

mean

A MeanFunction object, defining the mean function for the model, including the data and covariate design matrix X.

family

One of the family function used in R's glm functions. See family for details

weights

A vector indicting the weights for the observations.

trials

For binomial family models, the number of trials for each observation. The default is 1 (bernoulli).

formula

The formula for the model. May be empty if separate formulae are specified for the mean and covariance components.

var_par

Scale parameter required for some distributions (Gaussian, Gamma, Beta).

mcmc_options

There are five options for MCMC sampling that are specified in this list:

warmup The number of warmup iterations. Note that if using the internal HMC sampler, this only applies to the first iteration of the MCML algorithm, as the values from the previous iteration are carried over.
samps The number of MCMC samples drawn in the MCML algorithms. For smaller tolerance values larger numbers of samples are required. For the internal HMC sampler, larger numbers of samples are generally required than if using Stan since the samples generally exhibit higher autocorrealtion, especially for more complex covariance structures. For SAEM a small number is recommended as all samples are stored and used from every iteration.
lambda (Only relevant for the internal HMC sampler) Value of the trajectory length of the leapfrog integrator in Hamiltonian Monte Carlo (equal to number of steps times the step length). Larger values result in lower correlation in samples, but require larger numbers of steps and so is slower. Smaller numbers are likely required for non-linear GLMMs.
refresh How frequently to print to console MCMC progress if displaying verbose output.
maxsteps (Only relevant for the internal HMC sampler) Integer. The maximum number of steps of the leapfrom integrator

Methods

Public methods

Model$use_attenuation()
Model$fitted()
Model$residuals()
Model$predict()
Model$new()
Model$print()
Model$n()
Model$subset_rows()
Model$sim_data()
Model$update_parameters()
Model$information_matrix()
Model$sandwich()
Model$small_sample_correction()
Model$box()
Model$power()
Model$w_matrix()
Model$dh_deta()
Model$Sigma()
Model$MCML()
Model$LA()
Model$sparse()
Model$mcmc_sample()
Model$gradient()
Model$partial_sigma()
Model$u()
Model$log_likelihood()
Model$calculator_instructions()
Model$marginal()
Model$update_y()
Model$set_trace()
Model$clone()

Method `use_attenuation()`

Sets the model to use or not use "attenuation" when calculating the first-order approximation to the covariance matrix.

Usage

Model$use_attenuation(use)

Arguments

use: Logical indicating whether to use "attenuation".

Returns

None. Used for effects.

Method `fitted()`

Return fitted values. Does not account for the random effects. For simulated values based on resampling random effects, see also sim_data(). To predict the values including random effects at a new location see also predict().

Usage

Model$fitted(type = "link", X, u, sample = FALSE, sample_n = 100)

Arguments

type: One of either "link" for values on the scale of the link function, or "response" for values on the scale of the response
X: (Optional) Fixed effects matrix to generate fitted values
u: (Optional) Random effects values at which to generate fitted values
sample: Logical. If TRUE then the parameters will be re-sampled from their sampling distribution. Currently only works with existing X matrix and not user supplied matrix X and this will also ignore any provided random effects.
sample_n: Integer. If sample is TRUE, then this is the number of samples.

Returns

Fitted values as either a vector or matrix depending on the number of samples

Method `residuals()`

Generates the residuals for the model

Generates one of several types of residual for the model. If conditional = TRUE then the residuals include the random effects, otherwise only the fixed effects are included. For type, there are raw, pearson, and standardized residuals. For conditional residuals a matrix is returned with each column corresponding to a sample of the random effects.

Usage

Model$residuals(type = "standardized", conditional = TRUE)

Arguments

type: Either "standardized", "raw" or "pearson"
conditional: Logical indicating whether to condition on the random effects (TRUE) or not (FALSE)

Returns

A matrix with either one column is conditional is false, or with number of columns corresponding to the number of MCMC samples.

Method `predict()`

Generate predictions at new values

Generates predicted values using a new data set to specify covariance values and values for the variables that define the covariance function. The function will return a list with the linear predictor, conditional distribution of the new random effects term conditional on the current estimates of the random effects, and some simulated values of the random effects if requested.

Usage

Model$predict(newdata, offset = rep(0, nrow(newdata)), m = 0)

Arguments

newdata: A data frame specifying the new data at which to generate predictions
offset: Optional vector of offset values for the new data
m: Number of samples of the random effects to draw

Returns

A list with the linear predictor, parameters (mean and covariance matrices) for the conditional distribution of the random effects, and any random effect samples.

Method `new()`

Create a new Model object. Typically, a model is generated from a formula and data. However, it can also be generated from a previous model fit.

Usage

Model$new(
  formula,
  covariance,
  mean,
  data = NULL,
  family = NULL,
  var_par = NULL,
  offset = NULL,
  weights = NULL,
  trials = NULL,
  model_fit = NULL
)

Arguments

formula: A model formula containing fixed and random effect terms. The formula can be one way (e.g. ~ x + (1|gr(cl))) or two-way (e.g. y ~ x + (1|gr(cl))). One way formulae will generate a valid model enabling data simulation, matrix calculation, and power, etc. Outcome data can be passed directly to model fitting functions, or updated later using member function update_y(). For binomial models, either the syntax cbind(y, n-y) can be used for outcomes, or just y and the number of trials passed to the argument trials described below.
covariance: (Optional) Either a Covariance object, an equivalent list of arguments that can be passed to Covariance to create a new object, or a vector of parameter values. At a minimum the list must specify a formula. If parameters are not included then they are initialised to 0.5.
mean: (Optional) Either a MeanFunction object, an equivalent list of arguments that can be passed to MeanFunction to create a new object, or a vector of parameter values. At a minimum the list must specify a formula. If parameters are not included then they are initialised to 0.
data: A data frame with the data required for the mean function and covariance objects. This argument can be ignored if data are provided to the covariance or mean arguments either via Covariance and MeanFunction object, or as a member of the list of arguments to both covariance and mean.
family: A family object expressing the distribution and link function of the model, see family. This argument is optional if the family is provided either via a MeanFunction or MeanFunction objects, or as members of the list of arguments to mean. Current accepts binomial, gaussian, Gamma, poisson, and Beta.
var_par: (Optional) Scale parameter required for some distributions, including Gaussian. Default is NULL.
offset: (Optional) A vector of offset values. Optional - could be provided to the argument to mean instead.
weights: (Optional) A vector of weights.
trials: (Optional) For binomial family models, the number of trials for each observation. If it is not set, then it will default to 1 (a bernoulli model).
model_fit: (optional) A mcml model fit resulting from a call to MCML or LA

Returns

A new Model class object

Examples

\dontshow{
setParallel(FALSE) 
}
#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
mod <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)) + (1|gr(cl,t)),
  data = df,
  family = stats::gaussian()
)

#here we will specify a cohort study and provide parameter values
df <- nelder(~ind(20) * t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Model$new(
  formula = ~ int + (1|gr(ind)),
  data = df,
  family = stats::poisson()
)
  
  # or with parameter values specified
  
des <- Model$new(
  formula = ~ int + (1|gr(ind)),
  covariance = c(0.05),
  mean = c(1,0.5),
  data = df,
  family = stats::poisson()
)

#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Model$new(formula = ~ 1 + (1|ar0(t)*fexp(x,y)),
                        data = df,
                        family = stats::gaussian())

Method `print()`

Print method for Model class

Usage

Model$print()

Arguments

...: ignored

Method `n()`

Returns the number of observations in the model

Usage

Model$n(...)

Arguments

...: ignored

Method `subset_rows()`

Subsets the design keeping specified observations only

Given a vector of row indices, the corresponding rows will be kept and the other rows will be removed from the mean function and covariance

Usage

Model$subset_rows(index)

Arguments

index: Integer or vector integers listing the rows to keep

Returns

The function updates the object and nothing is returned.

Method `sim_data()`

Generates a realisation of the design

Generates a single vector of outcome data based upon the specified GLMM design.

Usage

Model$sim_data(type = "y")

Arguments

type: Either 'y' to return just the outcome data, 'data' to return a data frame with the simulated outcome data alongside the model data, or 'all', which will return a list with simulated outcomes y, matrices X and Z, parameters beta, and the values of the simulated random effects.

Returns

Either a vector, a data frame, or a list

Examples

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
\dontshow{
setParallel(FALSE) # for the CRAN check
}
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.8),
  mean = c(rep(0,5),0.6),
  data = df,
  family = stats::binomial()
)
ysim <- des$sim_data()

Method `update_parameters()`

Updates the parameters of the mean function and/or the covariance function

Usage

Model$update_parameters(mean.pars = NULL, cov.pars = NULL, var.par = NULL)

Arguments

mean.pars: (Optional) Vector of new mean function parameters
cov.pars: (Optional) Vector of new covariance function(s) parameters
var.par: (Optional) A scalar value for var_par

Examples

\dontshow{
setParallel(FALSE) # for the CRAN check
}
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  data = df,
  family = stats::binomial()
)
des$update_parameters(cov.pars = c(0.1,0.9))

Method `information_matrix()`

Generates the information matrix of the mixed model GLS estimator (X'S^-1X). The inverse of this matrix is an estimator for the variance-covariance matrix of the fixed effect parameters. For various small sample corrections see small_sample_correction() and box(). For models with non-linear functions of fixed effect parameters, a correction to the Hessian matrix is required, which is automatically calculated or optionally returned or disabled.

Usage

Model$information_matrix(
  include.re = FALSE,
  theta = FALSE,
  hessian.corr = "add",
  adj.nonspd = TRUE
)

Arguments

include.re: logical indicating whether to return the information matrix including the random effects components (TRUE), or the mixed model information matrix for beta only (FALSE).
theta: Logical. If TRUE the function will return the variance-coviariance matrix for the covariance parameters and ignore the first argument. Otherwise, the fixed effect parameter information matrix is returned.
hessian.corr: String. If there are non-linear functions of fixed effect parameters then a correction to the Hessian can be applied ("add"), returned on its own ("return"), or ignored ("none")
adj.nonspd: Logical. For models nonlinear in fixed effect parameters, the Hessian of the linear predictor with respect to the model parameters is often not positive semi-definite, which can cause a singular matrix and failure to calculate valid standard errors. The adjustment matrix will be corrected if it is not positive semi-definite and this option is TRUE.

Returns

A matrix

Method `sandwich()`

Returns the robust sandwich variance-covariance matrix for the fixed effect parameters

Usage

Model$sandwich()

Returns

A PxP matrix

Method `small_sample_correction()`

Returns a small sample correction. The option "KR" returns the Kenward-Roger bias-corrected variance-covariance matrix for the fixed effect parameters and degrees of freedom. Option "KR2" returns an improved correction given in Kenward & Roger (2009) doi:j.csda.2008.12.013. Note, that the corrected/improved version is invariant under reparameterisation of the covariance, and it will also make no difference if the covariance is linear in parameters. Exchangeable covariance structures in this package (i.e. gr()) are parameterised in terms of the variance rather than standard deviation, so the results will be unaffected. Option "sat" returns the "Satterthwaite" correction, which only includes corrected degrees of freedom, along with the GLS standard errors.

Usage

Model$small_sample_correction(type)

Arguments

type: Either "KR", "KR2", or "sat", see description.

Returns

A PxP matrix

Method `box()`

Returns the inferential statistics (F-stat, p-value) for a modified Box correction doi:10.1002/sim.4072 for Gaussian-identity models.

Usage

Model$box(y)

Arguments

y: Optional. If provided, will update the vector of outcome data. Otherwise it will use the data from the previous model fit.

Returns

A data frame.

Method `power()`

Estimates the power of the design described by the model using the square root of the relevant element of the GLS variance matrix:

(X^T\Sigma^{-1}X)^{-1}

Note that this is equivalent to using the "design effect" for many models.

Usage

Model$power(alpha = 0.05, two.sided = TRUE, alternative = "pos")

Arguments

alpha: Numeric between zero and one indicating the type I error rate. Default of 0.05.
two.sided: Logical indicating whether to use a two sided test
alternative: For a one-sided test whether the alternative hypothesis is that the parameter is positive "pos" or negative "neg"

Returns

A data frame describing the parameters, their values, expected standard errors and estimated power.

Examples

\dontshow{
setParallel(FALSE) # for the CRAN check
}
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)) + (1|gr(cl,t)),
  covariance = c(0.05,0.1),
  mean = c(rep(0,5),0.6),
  data = df,
  family = stats::gaussian(),
  var_par = 1
)
des$power() #power of 0.90 for the int parameter

Method `w_matrix()`

Returns the diagonal of the matrix W used to calculate the covariance matrix approximation

Usage

Model$w_matrix()

Returns

A vector with values of the glm iterated weights

Method `dh_deta()`

Returns the derivative of the link function with respect to the linear preditor

Usage

Model$dh_deta()

Returns

A vector

Method `Sigma()`

Returns the (approximate) covariance matrix of y

Returns the covariance matrix Sigma. For non-linear models this is an approximation. See Details.

Usage

Model$Sigma(inverse = FALSE)

Arguments

inverse: Logical indicating whether to provide the covariance matrix or its inverse

Returns

A matrix.

Method `MCML()`

Stochastic Maximum Likelihood model fitting

Usage

Model$MCML(
  y = NULL,
  method = "saem",
  tol = 0.01,
  max.iter = 50,
  se = "gls",
  mcmc.pkg = "rstan",
  se.theta = TRUE,
  algo = ifelse(self$mean$any_nonlinear(), 2, 3),
  lower.bound = NULL,
  upper.bound = NULL,
  lower.bound.theta = NULL,
  upper.bound.theta = NULL,
  alpha = 0.8,
  convergence.prob = 0.95,
  pr.average = FALSE,
  conv.criterion = 2
)

Arguments

y: Optional. A numeric vector of outcome data. If this is not provided then either the outcome must have been specified when initialising the Model object, or the outcome data has been updated using member function update_y()
method: The MCML algorithm to use, either mcem or mcnr, or saem see Details. Default is saem. mcem.adapt and mcnr.adapt will use adaptive MCMC sample sizes starting small and increasing to the the maximum value specified in mcmc_options$sampling, which results in faster convergence. saem uses a stochastic approximation expectation maximisation algorithm. MCMC samples are kept from all iterations and so a smaller number of samples are needed per iteration.
tol: Numeric value, tolerance of the MCML algorithm, the maximum difference in parameter estimates between iterations at which to stop the algorithm. If two values are provided then different tolerances will be applied to the fixed effect and covariance parameters.
max.iter: Integer. The maximum number of iterations of the MCML algorithm.
se: String. Type of standard error and/or inferential statistics to return. Options are "gls" for GLS standard errors (the default), "robust" for robust standard errors, "kr" for original Kenward-Roger bias corrected standard errors, "kr2" for the improved Kenward-Roger correction, "sat" for Satterthwaite degrees of freedom correction (this is the same degrees of freedom correction as Kenward-Roger, but with GLS standard errors), "box" to use a modified Box correction (does not return confidence intervals), "bw" to use GLS standard errors with a between-within correction to the degrees of freedom, "bwrobust" to use robust standard errors with between-within correction to the degrees of freedom.
mcmc.pkg: String. Either cmdstan for cmdstan (requires the package cmdstanr), rstan to use rstan sampler, or hmc to use a cruder Hamiltonian Monte Carlo sampler. cmdstan is recommended as it has by far the best number of effective samples per unit time. cmdstanr will compile the MCMC programs to the library folder the first time they are run, so may not currently be an option for some users.
se.theta: Logical. Whether to calculate the standard errors for the covariance parameters. This step is a slow part of the calculation, so can be disabled if required in larger models. Has no effect for Kenward-Roger standard errors.
algo: Integer. 1 = L-BFGS for beta and BOBYQA for theta, 2 = BOBYQA for both, 3 = L-BFGS for both (default). The L-BFGS algorithm may perform poorly with some covariance structures, in this case select 1 or 2, or apply an upper bound.
lower.bound: Optional. Vector of lower bounds for the fixed effect parameters. To apply bounds use MCEM.
upper.bound: Optional. Vector of upper bounds for the fixed effect parameters. To apply bounds use MCEM.
lower.bound.theta: Optional. Vector of lower bounds for the covariance parameters (default is 0; negative values will cause an error)
upper.bound.theta: Optional. Vector of upper bounds for the covariance parameters.
alpha: If using SAEM then this parameter controls the step size. On each iteration i the step size is (1/alpha)^i, default is 0.8. Values around 0.5 will result in lower bias but slower convergence, values closer to 1 will result in higher convergence but potentially higher error.
convergence.prob: Numeric value in (0,1) indicating the probability of convergence if using convergence criteria 2, 3, or 4.
pr.average: Logical indicating whether to use Polyak-Ruppert averaging if using the SAEM algorithm (default is TRUE)
conv.criterion: Integer. The convergence criterion for the algorithm. 1 = the maximum difference between parameter estimates between iterations as defined by tol, 2 = The probability of improvement in the overall log-likelihood is less than 1 - convergence.prob 3 = The probability of improvement in the log-likelihood for the fixed effects is less than 1 - convergence.prob 4 = The probabilities of improvement in the log-likelihood the fixed effects and covariance parameters are both less than 1 - convergence.prob

Returns

A mcml object

Examples

\dontrun{
#create example data with six clusters, five time periods, and five people per cluster-period
df <- nelder(~(cl(6)*t(5)) > ind(5))
# parallel trial design intervention indicator
df$int <- 0
df[df$cl > 3, 'int'] <- 1 
# specify parameter values in the call for the data simulation below
des <- Model$new(
  formula= ~ factor(t) + int - 1 +(1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.7),
  mean = c(rep(0,5),0.2),
  data = df,
  family = gaussian()
)
ysim <- des$sim_data() # simulate some data from the model
fit1 <- des$MCML(y = ysim) # Default model fitting with SAEM-PR
# use MCEM instead and stop when parameter values are within 1e-2 on successive iterations
fit2 <- des$MCML(y = ysim, method="mcem",tol=1e-2,conv.criterion = 1)
}

Method `LA()`

Maximum Likelihood model fitting with Laplace Approximation

Usage

Model$LA(
  y = NULL,
  start,
  method = "nr",
  se = "gls",
  max.iter = 40,
  tol = 1e-04,
  se.theta = TRUE,
  algo = 2,
  lower.bound = NULL,
  upper.bound = NULL,
  lower.bound.theta = NULL,
  upper.bound.theta = NULL
)

Arguments

y: Optional. A numeric vector of outcome data. If this is not provided then either the outcome must have been specified when initialising the Model object, or the outcome data has been updated using member function update_y()
start: Optional. A numeric vector indicating starting values for the model parameters.
method: String. Either "nloptim" for non-linear optimisation, or "nr" for Newton-Raphson (default) algorithm
se: String. Type of standard error and/or inferential statistics to return. Options are "gls" for GLS standard errors (the default), "robust" for robust standard errors, "kr" for original Kenward-Roger bias corrected standard errors, "kr2" for the improved Kenward-Roger correction, "sat" for Satterthwaite degrees of freedom correction (this is the same degrees of freedom correction as Kenward-Roger, but with GLS standard errors)"box" to use a modified Box correction (does not return confidence intervals), "bw" to use GLS standard errors with a between-within correction to the degrees of freedom, "bwrobust" to use robust standard errors with between-within correction to the degrees of freedom. Note that Kenward-Roger assumes REML estimates, which are not currently provided by this function.
max.iter: Maximum number of algorithm iterations, default 20.
tol: Maximum difference between successive iterations at which to terminate the algorithm
se.theta: Logical. Whether to calculate the standard errors for the covariance parameters. This step is a slow part of the calculation, so can be disabled if required in larger models. Has no effect for Kenward-Roger standard errors.
algo: Integer. 1 = L-BFGS for beta-u and BOBYQA for theta (default), 2 = BOBYQA for both.
lower.bound: Optional. Vector of lower bounds for the fixed effect parameters. To apply bounds use nloptim.
upper.bound: Optional. Vector of upper bounds for the fixed effect parameters. To apply bounds use nloptim.
lower.bound.theta: Optional. Vector of lower bounds for the covariance parameters.
upper.bound.theta: Optional. Vector of upper bounds for the covariance parameters.

Returns

A mcml object

Examples

\dontshow{
setParallel(FALSE) # for the CRAN check
}
#create example data with six clusters, five time periods, and five people per cluster-period
df <- nelder(~(cl(6)*t(5)) > ind(5))
# parallel trial design intervention indicator
df$int <- 0
df[df$cl > 3, 'int'] <- 1 
# specify parameter values in the call for the data simulation below
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.7),
  mean = c(rep(0,5),-0.2),
  data = df,
  family = stats::binomial()
)
ysim <- des$sim_data() # simulate some data from the model
fit1 <- des$LA(y = ysim)

Method `sparse()`

Set whether to use sparse matrix methods for model calculations and fitting. By default the model does not use sparse matrix methods.

Usage

Model$sparse(sparse = TRUE, amd = TRUE)

Arguments

sparse: Logical indicating whether to use sparse matrix methods
amd: Logical indicating whether to use and Approximate Minimum Degree algorithm to calculate an efficient permutation matrix so that the Cholesky decomposition of PAP^T is calculated rather than A.

Returns

None, called for effects

Method `mcmc_sample()`

Generate an MCMC sample of the random effects

Usage

Model$mcmc_sample(mcmc.pkg = "rstan")

Arguments

mcmc.pkg: String. Either cmdstan for cmdstan (requires the package cmdstanr), rstan to use rstan sampler, or hmc to use a cruder Hamiltonian Monte Carlo sampler. cmdstan is recommended as it has by far the best number of effective samples per unit time. cmdstanr will compile the MCMC programs to the library folder the first time they are run, so may not currently be an option for some users.

Returns

A matrix of samples of the random effects

Method `gradient()`

The gradient of the log-likelihood with respect to either the random effects or the model parameters. The random effects are on the N(0,I) scale, i.e. scaled by the Cholesky decomposition of the matrix D. To obtain the random effects from the last model fit, see member function ⁠$u⁠

Usage

Model$gradient(y, u, beta = FALSE)

Arguments

y: (optional) Vector of outcome data, if not specified then data must have been set in another function.
u: (optional) Vector of random effects scaled by the Cholesky decomposition of D
beta: Logical. Whether the log gradient for the random effects (FALSE) or for the linear predictor parameters (TRUE)

Returns

A vector of the gradient

Method `partial_sigma()`

The partial derivatives of the covariance matrix Sigma with respect to the covariance parameters. The function returns a list in order: Sigma, first order derivatives, second order derivatives. The second order derivatives are ordered as the lower-triangular matrix in column major order. Letting 'd(i)' mean the first-order partial derivative with respect to parameter i, and d2(i,j) mean the second order derivative with respect to parameters i and j, then if there were three covariance parameters the order of the output would be: (sigma, d(1), d(2), d(3), d2(1,1), d2(1,2), d2(1,3), d2(2,2), d2(2,3), d2(3,3)).

Usage

Model$partial_sigma()

Returns

A list of matrices, see description for contents of the list.

Method `u()`

Returns the sample of random effects from the last model fit, or updates the samples for the model.

Usage

Model$u(scaled = TRUE, u)

Arguments

scaled: Logical indicating whether the samples are on the N(0,I) scale (scaled=FALSE) or N(0,D) scale (scaled=TRUE)
u: (optional) Matrix of random effect samples. If provided then the internal samples are replaced with these values. These samples should be N(0,I).

Returns

A matrix of random effect samples

Method `log_likelihood()`

The log likelihood for the GLMM. The random effects can be left unspecified. If no random effects are provided, and there was a previous model fit with the same data y then the random effects will be taken from that model. If there was no previous model fit then the random effects are assumed to be all zero.

Usage

Model$log_likelihood(y, u)

Arguments

y: A vector of outcome data
u: An optional matrix of random effect samples. This can be a single column.

Returns

The log-likelihood of the model parameters

Method `calculator_instructions()`

Prints the internal instructions and data used to calculate the linear predictor. Internally the class uses a reverse polish notation to store and calculate different functions, including user-specified non-linear mean functions. This function will print all the steps. Mainly used for debugging and determining how the class has interpreted non-linear model specifications.

Usage

Model$calculator_instructions()

Returns

None. Called for effects.

Method `marginal()`

Calculates the marginal effect of variable x. There are several options for marginal effect and several types of conditioning or averaging. The type of marginal effect can be the derivative of the mean with respect to x (dydx), the expected difference E(y|x=a)-E(y|x=b) (diff), or the expected log ratio log(E(y|x=a)/E(y|x=b)) (ratio). Other fixed effect variables can be set at specific values (at), set at their mean values (atmeans), or averaged over (average). Averaging over a fixed effects variable here means using all observed values of the variable in the relevant calculation. The random effects can similarly be set at their estimated value (re="estimated"), set to zero (re="zero"), set to a specific value (re="at"), or averaged over (re="average"). Estimates of the expected values over the random effects are generated using MCMC samples. MCMC samples are generated either through MCML model fitting or using mcmc_sample. In the absence of samples average and estimated will produce the same result. The standard errors are calculated using the delta method with one of several options for the variance matrix of the fixed effect parameters. Several of the arguments require the names of the variables as given to the model object. Most variables are as specified in the formula, factor variables are specified as the name of the variable_value, e.g. t_1. To see the names of the stored parameters and data variables see the member function names().

Usage

Model$marginal(
  x,
  type,
  re,
  se,
  at = c(),
  atmeans = c(),
  average = c(),
  xvals = c(1, 0),
  atvals = c(),
  revals = c()
)

Arguments

x: String. Name of the variable to calculate the marginal effect for.
type: String. Either dydx for derivative, diff for difference, or ratio for log ratio. See description.
re: String. Either estimated to condition on estimated values, zero to set to zero, at to provide specific values, or average to average over the random effects.
se: String. Type of standard error to use, either GLS for the GLS standard errors, KR for Kenward-Roger estimated standard errors, KR2 for the improved Kenward-Roger correction (see small_sample_correction()), or robust to use a robust sandwich estimator.
at: Optional. A vector of strings naming the fixed effects for which a specified value is given.
atmeans: Optional. A vector of strings naming the fixed effects that will be set at their mean value.
average: Optional. A vector of strings naming the fixed effects which will be averaged over.
xvals: Optional. A vector specifying the values of a and b for diff and ratio. The default is (1,0).
atvals: Optional. A vector specifying the values of fixed effects specified in at (in the same order).
revals: Optional. If re="at" then this argument provides a vector of values for the random effects.

Returns

A named vector with elements margin specifying the point estimate and se giving the standard error.

Method `update_y()`

Updates the outcome data y

Some functions require outcome data, which is by default set to all zero if no model fitting function has been run. This function can update the interval y data.

Usage

Model$update_y(y)

Arguments

y: Vector of outcome data

Returns

None. Called for effects

Method `set_trace()`

Controls the information printed to the console for other functions.

Usage

Model$set_trace(trace)

Arguments

trace: Integer, either 0 = no information, 1 = some information, 2 = all information

Returns

None. Called for effects.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Model$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Breslow, N. E., Clayton, D. G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association<, 88(421), 9–25. doi:10.1080/01621459.1993.10594284

McCullagh P, Nelder JA (1989). Generalized linear models, 2nd Edition. Routledge.

McCulloch CE (1997). “Maximum Likelihood Algorithms for Generalized Linear Mixed Models.” Journal of the American statistical Association, 92(437), 162–170.doi:10.2307/2291460

Zeger, S. L., Liang, K.-Y., Albert, P. S. (1988). Models for Longitudinal Data: A Generalized Estimating Equation Approach. Biometrics, 44(4), 1049.doi:10.2307/2531734

Examples


## ------------------------------------------------
## Method `Model$new`
## ------------------------------------------------


#create a data frame describing a cross-sectional parallel cluster
#randomised trial
df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
mod <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)) + (1|gr(cl,t)),
  data = df,
  family = stats::gaussian()
)

#here we will specify a cohort study and provide parameter values
df <- nelder(~ind(20) * t(6))
df$int <- 0
df[df$t > 3, 'int'] <- 1

des <- Model$new(
  formula = ~ int + (1|gr(ind)),
  data = df,
  family = stats::poisson()
)
  
  # or with parameter values specified
  
des <- Model$new(
  formula = ~ int + (1|gr(ind)),
  covariance = c(0.05),
  mean = c(1,0.5),
  data = df,
  family = stats::poisson()
)

#an example of a spatial grid with two time points
df <- nelder(~ (x(10)*y(10))*t(2))
spt_design <- Model$new(formula = ~ 1 + (1|ar0(t)*fexp(x,y)),
                        data = df,
                        family = stats::gaussian())

## ------------------------------------------------
## Method `Model$sim_data`
## ------------------------------------------------

df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1

des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.8),
  mean = c(rep(0,5),0.6),
  data = df,
  family = stats::binomial()
)
ysim <- des$sim_data()

## ------------------------------------------------
## Method `Model$update_parameters`
## ------------------------------------------------


df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  data = df,
  family = stats::binomial()
)
des$update_parameters(cov.pars = c(0.1,0.9))

## ------------------------------------------------
## Method `Model$power`
## ------------------------------------------------


df <- nelder(~(cl(10)*t(5)) > ind(10))
df$int <- 0
df[df$cl > 5, 'int'] <- 1
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)) + (1|gr(cl,t)),
  covariance = c(0.05,0.1),
  mean = c(rep(0,5),0.6),
  data = df,
  family = stats::gaussian(),
  var_par = 1
)
des$power() #power of 0.90 for the int parameter

## ------------------------------------------------
## Method `Model$MCML`
## ------------------------------------------------

## Not run: 
#create example data with six clusters, five time periods, and five people per cluster-period
df <- nelder(~(cl(6)*t(5)) > ind(5))
# parallel trial design intervention indicator
df$int <- 0
df[df$cl > 3, 'int'] <- 1 
# specify parameter values in the call for the data simulation below
des <- Model$new(
  formula= ~ factor(t) + int - 1 +(1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.7),
  mean = c(rep(0,5),0.2),
  data = df,
  family = gaussian()
)
ysim <- des$sim_data() # simulate some data from the model
fit1 <- des$MCML(y = ysim) # Default model fitting with SAEM-PR
# use MCEM instead and stop when parameter values are within 1e-2 on successive iterations
fit2 <- des$MCML(y = ysim, method="mcem",tol=1e-2,conv.criterion = 1)

## End(Not run)

## ------------------------------------------------
## Method `Model$LA`
## ------------------------------------------------


#create example data with six clusters, five time periods, and five people per cluster-period
df <- nelder(~(cl(6)*t(5)) > ind(5))
# parallel trial design intervention indicator
df$int <- 0
df[df$cl > 3, 'int'] <- 1 
# specify parameter values in the call for the data simulation below
des <- Model$new(
  formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar0(t)),
  covariance = c(0.05,0.7),
  mean = c(rep(0,5),-0.2),
  data = df,
  family = stats::binomial()
)
ysim <- des$sim_data() # simulate some data from the model
fit1 <- des$LA(y = ysim)

[Package glmmrBase version 0.9.2 Index]

A GLMM Model

Description

Details

Public fields

Methods

Public methods

Method use_attenuation()

Usage

Arguments

Returns

Method fitted()

Usage

Arguments

Returns

Method residuals()

Usage

Arguments

Returns

Method predict()

Usage

Arguments

Returns

Method new()

Usage

Arguments

Returns

Examples

Method print()

Usage

Arguments

Method n()

Usage

Arguments

Method subset_rows()

Usage

Arguments

Returns

Method sim_data()

Usage

Arguments

Returns

Examples

Method update_parameters()

Usage

Arguments

Examples

Method information_matrix()

Usage

Arguments

Returns

Method sandwich()

Usage

Returns

Method small_sample_correction()

Usage

Arguments

Returns

Method box()

Usage

Arguments

Returns

Method power()

Usage

Arguments

Returns

Examples

Method w_matrix()

Usage

Returns

Method dh_deta()

Usage

Returns

Method Sigma()

Usage

Arguments

Returns

Method MCML()

Usage

Arguments

Returns

Method `use_attenuation()`

Method `fitted()`

Method `residuals()`

Method `predict()`

Method `new()`

Method `print()`

Method `n()`

Method `subset_rows()`

Method `sim_data()`

Method `update_parameters()`

Method `information_matrix()`

Method `sandwich()`

Method `small_sample_correction()`

Method `box()`

Method `power()`

Method `w_matrix()`

Method `dh_deta()`

Method `Sigma()`

Method `MCML()`

Method `LA()`

Method `sparse()`

Method `mcmc_sample()`

Method `gradient()`

Method `partial_sigma()`

Method `u()`

Method `log_likelihood()`

Method `calculator_instructions()`

Method `marginal()`

Method `update_y()`

Method `set_trace()`

Method `clone()`