R: A "simulate" method for a glm object

simulate.glm {Ecfun}

R Documentation

A "simulate" method for a glm object

Description

Simulate predictions for newdata for a model of class glm with mean coef(object) and variance vcov(object).

NOTES: The stats package has a simulate method for "lm" objects which is used for lm and glm objects. It differs from the current simulate.glm function in two fundamental and important ways:

stats::simulate returns simulated data consistent with the model fit assuming the estimated model parameters are true and exact, i.e., ignoring the uncertainty in parameter estimation. Thus, if family = poisson, stats::simulate returns nonnegative integers.

By contrast the simulate.glm function documented here returns optionally simulated coef (coefficients) plus simulated values for the link and / or response but currently NOT pseudo-random numbers on the scale of the response.
The simulate.glm function documented here also accepts an optional newdata argument, not accepted by stats::simulate. The stats::simulate function only returns simulated values for the cases in the training set with no possibilities for use for different sets of conditions.

Usage

## S3 method for class 'glm'
simulate(object, nsim = 1, 
    seed = NULL, newdata=NULL, 
    type = c("coef", "link", "response"), ...)

Arguments

`object`	an object representing a fitted model of class `glm`.
`nsim`	number of response vectors to simulate. Defaults to 1.
`seed`	Argument passed as the first argument to `set.seed` if not NULL.
`newdata`	optionally, a `data.frame` in which to look for variables with which to predict. If omitted, predictors used in fitting are used.
`type`	the type of simulations required. coef Simulated coefficents using `mvtnorm::rmvnorm(nsim, coef(object), vcov(object))`. link The default `type`='link' is on the scale of the linear predictors using `rmvnorm` applied to randomly selected components of the mixture with `mean` = `coef` and `sigma` = `vcov` for that component. For a default binomial model, these are of log-odds (probabilities on logit scale). response `object[['linkinv']]` of `type = 'link'`. For a binomial model, these are predicted probabilities.
`...`	further arguments passed to or from other methods.

Details

1. Save current seed and optionally set it using code copied from stats:::simulate.lm.

2. if(is.null(newdata))newdata gets the data used in the call to glm.

3. newMat <- model.matrix(~., newdata)

4. simCoef <- mvtnorm::rmvnorm(nsim, coef(object), vcov(object))

5. sims <- tcrossprod(newMat, simCoef)

6. If length(type) == 1: return a data.frame with one column for each desired simulation, consistent with the behavior of the generic simulate applied to objects of class lm or glm. Otherwise, return a list of data.frames of the desired types.

Value

Returns either a data.frame or a list of data.frames depending on 'type':

`coef`	a `data.frame` with `nsim` columns giving simulated parameters generated using `mvtnorm::rmvnorm(nsim, coef(object), vcov(object))`.
`link`	a `data.frame` with `nsim` columns of `nobs` values each giving the simulations on the `link` scale by applying each set of simulated coefficients to `newdata` (or to the training set of `newdata` is not supplied).
`response`	a `data.frame` with `nsim` columns of `nobs` values each giving the simulations on the `response` scale, being `linkinv` of the simulations on the `link` scale.
`if length(type)>1`	a list with simulations on the desired scales.

The value also has an attribute "seed". If argument seed is NULL, the attribute is the value of .Random.seed before the simulation started. Otherwise it is the value of the argument with a kind attribute with value as.list(RNGkind()).

NOTE: This function currently may not work with a model fit that involves a multivariate link or response.

Author(s)

Spencer Graves

Examples

library(mvtnorm)
##
## 1.  a factor and a numeric 
##
PoisReg2 <- data.frame(y=1:6, 
    x=factor(rep(0:2, 2)), x1=rep(1:2, e=3))
GLMpoisR2 <- glm(y~x+x1, poisson, PoisReg2)

newDat. <- data.frame(
  x=factor(rep(c(0, 2), 2), levels=0:2), 
  x1=3:6)
# NOTE:  Force newDat2['x'] to have the same levels
# as PoisReg2['x']

GLMsim2n <- simulate(GLMpoisR2, nsim=3, seed=2,
  newdata=newDat.)

##
## 2.  One variable:  BMA returns
##     a mixture of constant & linear models
##
PoisRegDat <- data.frame(x=1:2, y=c(5, 10))
GLMex <- glm(y~x, poisson, PoisRegDat)

# Simulate for the model data 
GLMsig <- simulate(GLMex, nsim=2, seed=1)  

# Simulate for new data
newDat <- data.frame(x=3:4, 
      row.names=paste0('f', 3:4))
GLMsio <- simulate(GLMex, nsim=3, seed=2, 
                      newdata=newDat)

##
## 2a.  Compute the correct answers manually 
##
newMat <- model.matrix(~., newDat)
RNGstate <- structure(2, kind = as.list(RNGkind()))
set.seed(2)

sim <- mvtnorm::rmvnorm(3, coef(GLMex), 
                         vcov(GLMex))
rownames(sim) <- paste0('sim_', 1:3)
simDF <- data.frame(t(sim))

GLMsim.l <- tcrossprod(newMat, sim)
colnames(GLMsim.l) <- paste0('sim_', 1:3)
GLMsim.r <- exp(GLMsim.l) 
GLMsim2 <- list(coef=simDF, 
  link=data.frame(GLMsim.l), 
  response=data.frame(GLMsim.r) )
attr(GLMsim2, 'seed') <- RNGstate  


all.equal(GLMsio, GLMsim2)

[Package Ecfun version 0.3-2 Index]