aldvmm {aldvmm}    R Documentation

Fitting Adjusted Limited Dependent Variable Mixture Models

Description

The function aldvmm fits adjusted limited dependent variable mixture models of health state utilities. Adjusted limited dependent variable mixture models are finite mixtures of normal distributions with an accumulation of density mass at the limits, and a gap between 100% quality of life and the next smaller utility value. The package aldvmm uses the likelihood and expected value functions proposed by Hernandez Alava and Wailoo (2015) with normal component distributions and a multinomial logit model of the probabilities of component membership.
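The expected value of an adjusted limited dependent variable mixture can be sketched in base R as below. This is a minimal illustration, assuming that each component is a latent normal variable that is set to the minimum in 'psi' at or below that minimum and to 1 above the maximum in 'psi'. The function name ev_aldvmm is hypothetical and not part of the aldvmm API.

```r
# Hypothetical helper (not part of the aldvmm API): expected utility for one
# observation under a mixture of K normal components with limits.
#   mu, sigma : component means and standard deviations (length K)
#   p         : probabilities of component membership (length K, sum 1)
#   psi       : minimum and maximum utility below 1, in any order
ev_aldvmm <- function(mu, sigma, p, psi) {
  psi <- sort(psi)
  a <- (psi[1] - mu) / sigma  # standardized minimum
  b <- (psi[2] - mu) / sigma  # standardized maximum below the gap
  ev_cmp <- psi[1] * pnorm(a) +                # mass accumulated at the minimum
    1 * pnorm(b, lower.tail = FALSE) +         # mass accumulated at 1
    mu * (pnorm(b) - pnorm(a)) +               # continuous part between limits
    sigma * (dnorm(a) - dnorm(b))              # truncated-normal correction
  sum(p * ev_cmp)                              # weight components by p
}

# Two components, one far above the maximum and one far below the minimum
ev_aldvmm(mu = c(5, -5), sigma = c(0.1, 0.1), p = c(0.5, 0.5),
          psi = c(0.883, -0.594))   # approximately 0.203
```

Each component contributes the censored mass at both limits plus the mean of the normal distribution truncated to the interval between the limits.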

Usage

aldvmm(
  formula,
  data,
  psi,
  ncmp = 2,
  dist = "normal",
  optim.method = NULL,
  optim.control = list(trace = FALSE),
  optim.grad = TRUE,
  init.method = "zero",
  init.est = NULL,
  init.lo = NULL,
  init.hi = NULL,
  se.fit = FALSE,
  level = 0.95
)

Arguments

formula

an object of class "formula" with a symbolic description of the model to be fitted. The model formula takes the form y ~ x1 + x2 | x1 + x4, where the | delimiter separates the model for expected values of normal distributions (left) and the multinomial logit model of probabilities of component membership (right).

data

a data frame, list or environment (or object coercible to a data frame by
as.data.frame) including data on outcomes and explanatory variables in 'formula'.

psi

a numeric vector of minimum and maximum possible utility values smaller than or equal to 1 (e.g. c(-0.594, 0.883)). The potential gap between the maximum value and 1 represents an area with zero density in the value set from which utilities were obtained. The order of the minimum and maximum limits in 'psi' does not matter.

ncmp

a numeric value of the number of components that are mixed. The default value is 2. A value of 1 represents a tobit model with a gap between 1 and the maximum value in 'psi'.

dist

an optional character value of the distribution used in the finite mixture. In this release, only the normal distribution is available, and the default value is set to "normal".

optim.method

an optional character value of one of the following optimr methods: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlminb", "Rcgmin", "Rvmmin" and "hjn". The default method is "Nelder-Mead". The method "L-BFGS-B" is used when lower and/or upper constraints are set using 'init.lo' and 'init.hi'. The method "nlm" cannot be used in the 'aldvmm' package.

optim.control

an optional list of optimr control parameters.

optim.grad

an optional logical value indicating if a numerical gradient should be used in optimr methods that can use this information. The default value is TRUE. If 'optim.grad' is set to FALSE, a finite difference approximation is used.

init.method

an optional character value indicating the method for obtaining initial values. The following values are available: "zero", "random", "constant" and "sann". The default value is "zero".

init.est

an optional numeric vector of user-defined initial values. User-defined initial values override the 'init.method' argument. Initial values have to follow the same order as parameter estimates in the return value 'par'.

init.lo

an optional numeric vector of user-defined lower limits for constrained optimization. When 'init.lo' is not NULL, the optimization method "L-BFGS-B" is used. Lower limits of parameters have to follow the same order as parameter estimates in the return value 'par'.

init.hi

an optional numeric vector of user-defined upper limits for constrained optimization. When 'init.hi' is not NULL, the optimization method "L-BFGS-B" is used. Upper limits of parameters have to follow the same order as parameter estimates in the return value 'par'.

se.fit

an optional logical value indicating whether standard errors of fitted values are calculated. The default value is FALSE.

level

a numeric value of the confidence level for confidence bands of fitted values. The default value is 0.95.

Details

aldvmm fits an adjusted limited dependent variable mixture model using the likelihood and expected value functions from Hernandez Alava and Wailoo (2015). The model accounts for latent classes, multi-modality, minimum and maximum utility values and potential gaps between 1 and the next smaller utility value. Adjusted limited dependent variable mixture models combine multiple component distributions with a multinomial logit model of the probabilities of component membership. The standard deviations of the normal distributions are estimated and reported as log-transformed values, which enter the likelihood function as exponentiated values to ensure that standard deviations are positive.
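The likelihood described above can be sketched as follows. This is an illustrative base R version, not the aldvmm source code: mnl_prob and ll_aldvmm are hypothetical names, and treating the first component as the reference category of the multinomial logit is an assumption made for the sketch.

```r
# Hypothetical helpers (not the aldvmm source code).
# Multinomial logit probabilities; component 1 is assumed to be the reference
# category with its coefficients fixed at 0.
#   eta : n x (K - 1) matrix of linear predictors for components 2..K
mnl_prob <- function(eta) {
  e <- cbind(1, exp(eta))
  e / rowSums(e)                        # each row sums to 1
}

# Log-likelihood of an adjusted limited dependent variable mixture.
#   y       : observed utilities (length n)
#   mu      : n x K matrix of component means
#   lnsigma : log standard deviations (length K), exponentiated below
#   p       : n x K matrix of component membership probabilities
#   psi     : minimum and maximum utility below 1, in any order
ll_aldvmm <- function(y, mu, lnsigma, p, psi) {
  psi <- sort(psi)
  sigma <- exp(lnsigma)                 # guarantees sigma > 0
  lik <- numeric(length(y))
  for (k in seq_len(ncol(mu))) {
    dens <- ifelse(y == 1,
                   pnorm(psi[2], mu[, k], sigma[k], lower.tail = FALSE),
                   ifelse(y <= psi[1],
                          pnorm(psi[1], mu[, k], sigma[k]),  # mass at minimum
                          dnorm(y, mu[, k], sigma[k])))      # continuous part
    lik <- lik + p[, k] * dens          # mix the component contributions
  }
  sum(log(lik))
}
```

An optimizer would minimize the negative of ll_aldvmm over the coefficients that generate mu, the log standard deviations and the multinomial logit coefficients.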

The minimum utility and the largest utility smaller than or equal to 1 are supplied in the argument 'psi'. The number of distributions/components that are mixed is set by the argument 'ncmp'. When 'ncmp' is set to 1 the procedure estimates a tobit model with a gap between 1 and the maximum utility value in 'psi'. The current version only allows finite mixtures of normal distributions.
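The role of 'psi' can be illustrated by the censoring rule for a single latent normal utility, which corresponds to the case 'ncmp' = 1. The helper observe_utility is hypothetical; it only shows how a latent value maps to an observed utility and that no observed value falls in the gap between the maximum in 'psi' and 1.

```r
# Hypothetical helper: map latent normal utilities ystar to observed
# utilities with a floor at the minimum, a mass point at 1 and an empty
# gap between the maximum and 1.
observe_utility <- function(ystar, psi) {
  psi <- sort(psi)
  ifelse(ystar > psi[2], 1,                 # above the maximum: full health
         ifelse(ystar <= psi[1], psi[1],    # at or below the minimum: floor
                ystar))                     # otherwise observed as is
}

set.seed(42)
y <- observe_utility(rnorm(1000, mean = 0.7, sd = 0.3), psi = c(0.883, -0.594))
any(y > 0.883 & y < 1)   # FALSE: no values in the gap
```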

The 'formula' object can include a | delimiter to separate formulae for expected values in components (left) and the multinomial logit model of probabilities of component membership (right). If no | delimiter is used, the same formula is used for expected values in components and the multinomial logit model of the probabilities of component membership.
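A two-part formula can be split in base R by inspecting the | call on the right-hand side. The helper split_formula is a hypothetical sketch of the rule described above; aldvmm may parse formulas differently internally.

```r
# Hypothetical helper: split y ~ x1 + x2 | x1 + x4 into the model for
# component means (left of |) and the multinomial logit model (right of |).
split_formula <- function(formula) {
  lhs <- formula[[2]]
  rhs <- formula[[3]]
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    beta_rhs <- rhs[[2]]
    delta_rhs <- rhs[[3]]
  } else {
    beta_rhs <- delta_rhs <- rhs   # no delimiter: same covariates in both parts
  }
  env <- environment(formula)      # keep the environment of the input formula
  list(beta = eval(call("~", lhs, beta_rhs), env),
       delta = eval(call("~", delta_rhs), env))
}

parts <- split_formula(eq5d ~ age + female | 1)
all.vars(parts$beta)    # "eq5d" "age" "female"
all.vars(parts$delta)   # character(0): intercept only
```

Because | binds less tightly than + in R formulas, the right-hand side is parsed as (x1 + x2) | (x1 + x4), so the two parts can be read directly from the call.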

aldvmm uses optimr for maximum likelihood estimation of model parameters. The argument 'optim.method' accepts the following methods: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "nlminb", "Rcgmin", "Rvmmin" and "hjn". The default method is "Nelder-Mead". The method "nlm" cannot be used in aldvmm because it requires a different implementation of the likelihood function. The argument 'optim.control' accepts a list of optimr control parameters. If 'optim.grad' is set to TRUE, the function optimr uses numerical gradients during the optimization procedure for all methods that allow for this approach. If 'optim.grad' is set to FALSE or a method cannot use gradients, a finite difference approximation is used. The gradient of the likelihood function is approximated numerically using the function grad, and the Hessian matrix at the maximum likelihood estimates is approximated numerically using hessian.
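The estimation step can be sketched with stats::optim in place of optimr, so that the example needs only base R. The sketch fits a constant-only model with 'ncmp' = 1 to simulated data and, as a simplifying assumption, uses the numerical Hessian returned by optim(hessian = TRUE) instead of the hessian function used by aldvmm.

```r
set.seed(123)
psi <- c(-0.594, 0.883)

# Simulated outcomes from one latent normal with mean 0.6 and sd 0.3
ystar <- rnorm(500, mean = 0.6, sd = 0.3)
y <- ifelse(ystar > psi[2], 1, ifelse(ystar <= psi[1], psi[1], ystar))

# Negative log-likelihood of a constant-only model with ncmp = 1;
# par = c(mu, log(sigma))
nll <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])
  lik <- ifelse(y == 1, pnorm(psi[2], mu, sigma, lower.tail = FALSE),
                ifelse(y <= psi[1], pnorm(psi[1], mu, sigma),
                       dnorm(y, mu, sigma)))
  -sum(log(lik))
}

# Minimize with Nelder-Mead from zero initial values
fit <- optim(par = c(0, 0), fn = nll, method = "Nelder-Mead",
             hessian = TRUE)

fit$par                          # estimates of mu and log(sigma)
sqrt(diag(solve(fit$hessian)))   # standard errors from the inverse Hessian
```

Because the function minimizes the negative log-likelihood, the inverse of its Hessian at the optimum approximates the covariance matrix of the estimates.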

'init.method' accepts four methods for generating initial values: "zero", "random", "constant" and "sann". The method "zero" sets initial values of all parameters to 0. The method "random" draws random starting values from a standard normal distribution. The method "constant" estimates a constant-only model and uses its estimates as initial values of the intercepts and standard deviations, and 0 for all other parameters. The method "sann" estimates the full model using the simulated annealing optimization method in optim and uses the parameter estimates as initial values. When user-specified initial values are supplied in 'init.est', the argument 'init.method' is ignored.
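The "zero" and "random" methods, and the precedence of 'init.est', can be sketched as below. The helpers n_par and init_values are hypothetical, and the parameter count in n_par (component-specific coefficients and standard deviations, plus multinomial logit coefficients for all but one component) is an assumption rather than a documented rule. The "constant" and "sann" methods are not reproduced here.

```r
# Hypothetical helpers, not the aldvmm API.
# Assumed parameter count: ncmp sets of mean coefficients, (ncmp - 1) sets
# of multinomial logit coefficients and one log standard deviation per
# component.
n_par <- function(ncmp, n_beta, n_delta) {
  ncmp * n_beta + (ncmp - 1) * n_delta + ncmp
}

init_values <- function(npar, init.method = c("zero", "random"),
                        init.est = NULL) {
  if (!is.null(init.est)) {            # user values override init.method
    stopifnot(length(init.est) == npar)
    return(init.est)
  }
  init.method <- match.arg(init.method)
  switch(init.method,
         zero = rep(0, npar),          # all parameters start at 0
         random = rnorm(npar))         # draws from a standard normal
}

# eq5d ~ age + female | 1 with ncmp = 2: 3 mean coefficients per component
# (with intercept) and 1 multinomial logit coefficient (intercept only)
npar <- n_par(ncmp = 2, n_beta = 3, n_delta = 1)
init_values(npar)
```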

By default, aldvmm performs unconstrained optimization with lower and upper limits at -Inf and Inf. When user-defined lower and/or upper limits are supplied to 'init.lo' and/or 'init.hi', these default limits are replaced with the user-specified values, and the method "L-BFGS-B" is used for box-constrained optimization instead of the user-defined 'optim.method'. It is possible to set only lower limits or only upper limits.
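The handling of limits described above can be sketched as a small helper, followed by a box-constrained call to stats::optim. The helper resolve_limits is hypothetical and only mirrors the rules stated in this paragraph.

```r
# Hypothetical helper: replace default infinite limits with user values and
# switch to "L-BFGS-B" whenever any limit is supplied.
resolve_limits <- function(npar, optim.method = "Nelder-Mead",
                           init.lo = NULL, init.hi = NULL) {
  lo <- if (is.null(init.lo)) rep(-Inf, npar) else init.lo
  hi <- if (is.null(init.hi)) rep(Inf, npar) else init.hi
  constrained <- !is.null(init.lo) || !is.null(init.hi)
  list(lo = lo, hi = hi,
       method = if (constrained) "L-BFGS-B" else optim.method)
}

# Only an upper limit is set, so "BFGS" is replaced by "L-BFGS-B"
lim <- resolve_limits(1, optim.method = "BFGS", init.hi = 1)
opt <- optim(par = 0, fn = function(x) (x - 2)^2, method = lim$method,
             lower = lim$lo, upper = lim$hi)
opt$par   # at the upper limit of 1; the unconstrained minimum 2 is outside
```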

If 'se.fit' is set to TRUE, standard errors of fitted values are calculated using the delta method. The standard errors of fitted values in the estimation data set are calculated as se_fit = (t(G)*Σ*G)^0.5, where G is the gradient of a fitted value with respect to changes of parameter estimates, and Σ is the estimated covariance matrix of parameters (Dowd et al., 2014). The standard errors of predicted values in new data sets are calculated as se_pred = (mse + t(G)*Σ*G)^0.5, where mse is the mean squared error of fitted versus observed outcomes in the original estimation data (Whitmore, 1986).
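The delta-method formulas for se_fit and se_pred can be sketched in base R with a central finite-difference gradient. The helpers num_grad, se_fit_delta and se_pred_delta are hypothetical, the fitted-value function is a toy linear example, and the normal critical value used for the band at 'level' is an assumption.

```r
# Hypothetical helpers, not the aldvmm implementation.
# Central finite-difference gradient of a fitted value f(par) w.r.t. par
num_grad <- function(f, par, h = 1e-6) {
  vapply(seq_along(par), function(j) {
    step <- replace(numeric(length(par)), j, h)
    (f(par + step) - f(par - step)) / (2 * h)
  }, numeric(1))
}

# Delta-method standard errors: G is the gradient, Sigma the covariance
se_fit_delta  <- function(G, Sigma) sqrt(drop(t(G) %*% Sigma %*% G))
se_pred_delta <- function(G, Sigma, mse) sqrt(mse + drop(t(G) %*% Sigma %*% G))

# Toy fitted value that is linear in two parameters
f <- function(par) 0.5 * par[1] + 2 * par[2]
G <- num_grad(f, par = c(0.1, 0.3))
Sigma <- diag(c(0.04, 0.01))
se <- se_fit_delta(G, Sigma)

# Confidence band for the fitted value at level = 0.95
level <- 0.95
yhat <- f(c(0.1, 0.3))
yhat + c(-1, 1) * qnorm(1 - (1 - level) / 2) * se
```

For prediction in new data, se_pred_delta adds the estimation-sample mse to the parameter uncertainty, which widens the band relative to se_fit.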

Value

aldvmm returns an object of class inheriting from "aldvmm". An object of class "aldvmm" is a list containing the following objects.

coef

a numeric vector of parameter estimates.

se

a numeric vector of standard errors of parameter estimates.

z

a numeric vector of z-values, i.e. parameter estimates divided by their standard errors.

p

a numeric vector of p-values of parameter estimates.

lower

a numeric vector of 95% lower confidence limits of parameter estimates.

upper

a numeric vector of 95% upper confidence limits of parameter estimates.

hessian

a numeric matrix object with second partial derivatives of the likelihood function.

cov

a numeric matrix object with covariances of parameters.

n

a scalar representing the number of complete observations with no missing values that were used in the estimation.

k

a scalar representing the number of components that were mixed.

gof

a list including the following elements.

ll

a numeric value of the negative log-likelihood, -ll, where ll denotes the log-likelihood used in the formulas below.

aic

a numeric value of the Akaike information criterion AIC = 2*npar - 2*ll.

bic

a numeric value of the Bayesian information criterion BIC = npar*log(nobs) - 2*ll.

mse

a numeric value of the mean squared error sum((y - yhat)^2)/(nobs - npar).

mae

a numeric value of the mean absolute error sum(abs(y - yhat))/(nobs - npar).

pred

a list including the following elements.

y

a numeric vector of observed outcomes in 'data'.

yhat

a numeric vector of fitted values.

res

a numeric vector of residuals.

se.fit

a numeric vector of the standard error of fitted values.

lower.fit

a numeric vector of 95% lower confidence limits of fitted values.

upper.fit

a numeric vector of 95% upper confidence limits of fitted values.

prob

a numeric vector of expected values of the probabilities of component membership.

init

a list including the following elements.

est

a numeric vector of initial parameter estimates.

lo

a numeric vector of lower limits of parameter estimates.

hi

a numeric vector of upper limits of parameter estimates.

formula

an object of class formula supplied to argument 'formula'.

psi

a numeric vector with the minimum and maximum utility below 1 in 'data'.

dist

a character value indicating the used distribution.

label

a list including the following elements.

lcoef

a character vector of labels for objects including results on distributions (default "beta") and the probabilities of component membership (default "delta").

lcpar

a character vector of labels for objects including constant distribution parameters (default "sigma" for dist = "normal").

lcmp

a character value of the label for objects including results on different components (default "Comp").

lvar

a list including 2 character vectors of covariate names for model parameters of distributions ("beta") and the multinomial logit ("delta").

optim.method

a character value of the used optimr method.
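The z-values, p-values, confidence limits and goodness-of-fit measures listed above can be sketched as follows. The helpers coef_table and gof_stats are hypothetical; the p-values and limits assume a standard normal reference distribution, and the argument 'll' is the log-likelihood, so the returned 'll' element is its negative, as described for 'gof'.

```r
# Hypothetical helpers, not the aldvmm implementation.
coef_table <- function(coef, se, level = 0.95) {
  z <- coef / se                        # z-values
  p <- 2 * pnorm(-abs(z))               # two-sided p-values
  crit <- qnorm(1 - (1 - level) / 2)    # 1.96 for level = 0.95
  data.frame(coef = coef, se = se, z = z, p = p,
             lower = coef - crit * se, upper = coef + crit * se)
}

gof_stats <- function(ll, npar, y, yhat) {
  nobs <- length(y)
  list(ll  = -ll,                                  # negative log-likelihood
       aic = 2 * npar - 2 * ll,
       bic = npar * log(nobs) - 2 * ll,
       mse = sum((y - yhat)^2) / (nobs - npar),
       mae = sum(abs(y - yhat)) / (nobs - npar))
}

# Toy inputs for illustration only
coef_table(coef = c(1, -2), se = c(0.5, 1))
gof_stats(ll = -10, npar = 2, y = c(1, 2, 3, 4), yhat = c(1, 2, 3, 5))
```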

The generic function summary can be used to obtain or print a summary of the results. The generic function predict can be used to obtain predicted values and standard errors of predictions in new data.

References

Hernandez Alava, M. and Wailoo, A. (2015) Fitting adjusted limited dependent variable mixture models to EQ-5D. The Stata Journal, 15(3), 737–750. doi: 10.1177/1536867X1501500307

Dowd, B. E., Greene, W. H., and Norton, E. C. (2014) Computation of standard errors. Health Services Research, 49(2), 731–750. doi: 10.1111/1475-6773.12122

Whitmore, G. A. (1986) Prediction limits for a univariate normal observation. The American Statistician, 40(2), 141–143. doi: 10.1080/00031305.1986.10475378

Examples

library(aldvmm)
data(utility)

fit <- aldvmm(eq5d ~ age + female | 1,
              data = utility,
              psi = c(0.883, -0.594),
              ncmp = 2)

summary(fit)

yhat <- predict(fit,
                newdata = utility)


[Package aldvmm version 0.8.4 Index]