simulate.bic.glm {Ecfun}R Documentation

A "simulate" method for a BMA::bic.glm object

Description

Simulate predictions for newdata for a model of class bic.glm.

NOTES: The stats package has a simulate method for "lm" objects which is used for lm and glm objects. This simulate.bic.glm function differs from the stats::simulate function in the same two fundamental and important ways as the simulate.glm function:

  1. stats::simulate returns simulated data consistent with the model fit assuming the estimated model parameters are true and exact, i.e., ignoring the uncertainty in parameter estimation. Thus, if family = poisson, stats::simulate returns nonnegative integers.

    By contrast the simulate.bic.glm function documented here returns optionally simulated coef (coefficients) plus simulated values for the link and / or response but currently NOT pseudo-random numbers on the scale of the response.

  2. The simulate.bic.glm function documented here also accepts an optional newdata argument, not accepted by stats::simulate. The stats::simulate function only returns simulated values for the cases in the training set with no possibilities for use for different sets of conditions.

Usage

## S3 method for class 'bic.glm'
simulate(object, nsim = 1, 
    seed = NULL, newdata=NULL, 
    type = c("coef", "link", "response"), ...)

Arguments

object

an object representing a fitted model of class bic.glm.

nsim

number of response vectors to simulate. Defaults to 1.

seed

Argument passed as the first argument to set.seed if not NULL.

newdata

optionally, a data.frame in which to look for variables with which to predict. If omitted, predictors used in fitting are used.

type

the type of simulations required.

  • coef type = "coef" returns pseudo- random numbers generated by mvtnorm::rmvnorm with mean = coef and sigma = vcov for the component of the BMA mixture randomly selected for each simulation. (Obviously, this does not use newdata.)

  • link type='link' returns simulations on the scale of the linear predictors using rmvnorm applied to randomly selected components of the mixture with mean = coef and sigma = vcov for that component. For a default binomial model, these are of log-odds (probabilities on logit scale).

  • response object[['linkinv']] of type = 'link'. For a binomial model, these are predicted probabilities.

...

further arguments passed to or from other methods.

Details

1. Save current seed and optionally set it using code copied from stats:::simulate.lm.

2. postprob <- object[['postprob']]; x <- object[['x']]; y <- object[['y']]; mle <- object[['mle']]; linkinv <- object[['linkinv']].

3. cl <- as.list(object[['call']]); wt <- cl[['wt']]; fam <- cl[['glm.family']]

4. if(is.null(newdata))newdata <- x else ensure that all levels of factors of newdata match x.

5. xMat <- model.matrix(~., x); newMat <- model.matrix(~., newdata)

6. nComponents <- length(postprob); nobs <- NROW(newdata)

7. sims <- matrix(NA, nobs, nsim)

8. rmdl <- sample(1:nComponents, nsims, TRUE, postprob)

9. for(Comp in 1:nComponents) nsimComp <- sum(rmdl==Comp); refitComp <- glm.fit(xMat[, mle[Comp,]!=0], y, wt, mle[Comp, mle[Comp,]!=0], family=fam); simCoef <- mvtnorm::rmvnorm(nsimComp, coef (refitComp), vcov(rfitComp)); sims[rmdl==Comp, ] <- tcrossprod(newMat[, mle[Comp,]!=0], simCoef)

10. If length(type) == 1: return a data.frame with one column for each desired simulation, consistent with the behavior of the generic simulate applied to objects of class lm or glm. Otherwise, return a list of data.frames of the desired types.

Value

Returns either a data.frame or a list of data.frames depending on 'type':

coef

a data.frame with nsim columns and one row for each variable in the max model. Values are non-zero for variables in the model in the BMA mixture selected for that simulation. The non-zero values are generated using mvtnorm::rmvnorm with mean = coef and covariance matrix = vcov of the model fit to the subset of variables in that component model.

link

a data.frame with nsim columns of nobs values each giving the simulations on the link scale for each row in newdata (or the training set if newdata is not provided).

response

a data.frame with nsim columns of nobs values each giving the simulations on the response scale, being linkinv of the simulations on the link scale.

if length(type)>1

a list with simulations on the desired scales.

The value also has an attribute "seed". If argument seed is NULL, the attribute is the value of .Random.seed before the simulation started. Otherwise it is the value of the argument with a "kind" attribute with value as.list(RNGkind()).

NOTE: This function currently may not work with a model fit that involves a multivariate link or response.

Author(s)

Spencer Graves

See Also

simulate simulate.glm bic.glm predict.bic.glm set.seed rmvnorm

Examples

library(BMA)
library(mvtnorm)
##
## 1.  a factor and a numeric 
##
PoisReg2 <- data.frame(
  x=factor(rep(0:2, 2)), x1=rep(1:2, e=3))
bicGLM2 <- bic.glm(PoisReg2, y=1:6, poisson)

newDat2 <- data.frame(
  x=factor(rep(c(0, 2), 2), levels=0:2), 
  x1=3:6)
# NOTE:  Force newDat2['x'] to have the same levels
# as PoisReg2['x']

bicGLMsim2n <- simulate(bicGLM2, nsim=5, seed=2,
  newdata=newDat2[1:3,])

##
## 2.  One variable:  BMA returns
##     a mixture of constant & linear models
##
PoisRegDat <- data.frame(x=1:2, y=c(5, 10))
bicGLMex <- bic.glm(PoisRegDat['x'], 
                     PoisRegDat[, 'y'], poisson)
(postprob <- bicGLMex[['postprob']])
bicGLMex['mle']

# Simulate for the model data 
bicGLMsim <- simulate(bicGLMex, nsim=2, seed=1)  

# Simulate for new data
newDat <- data.frame(x=3:4, 
      row.names=paste0('f', 3:4))
bicGLMsin <- simulate(bicGLMex, nsim=3, seed=2, 
                      newdata=newDat)
                      
# Refit with bic.glm.matrix and confirm 
# that simulate returns the same answers

bicGLMat <- bic.glm(as.matrix(PoisRegDat['x']), 
                         PoisRegDat[, 'y'], poisson)
bicGLMatsim <- simulate(bicGLMat, nsim=3, seed=2, 
                      newdata=newDat)
                      
all.equal(bicGLMsin, bicGLMatsim)                      


# The same problem using bic.glm.formula                  
bicGLMfmla <- bic.glm(y ~ x, PoisRegDat, poisson)
bicGLMfmlsim <- simulate(bicGLMfmla, nsim=3, seed=2, 
                      newdata=newDat)
                      
all.equal(bicGLMsin, bicGLMfmlsim)                      

                      
##
## 2a.  Compute the correct answers manually 
##
GLMex1 <- glm(y~x, poisson, PoisRegDat)
GLMex0 <- glm(y~1, poisson, PoisRegDat)

postProb <- bicGLMfmla$postprob
nComp <- length(postProb)
newMat <- model.matrix(~., newDat)
set.seed(2)
(rmdl <- sample(1:nComp, 3, TRUE, 
          postprob))
GLMsim. <- matrix(NA, 2, 3)
dimnames(GLMsim.) <- list(
  rownames(newMat), 
  paste0('sim_', 1:3) )
          
sim1 <- mvtnorm::rmvnorm(2, coef(GLMex1), 
                         vcov(GLMex1))
sim0 <- mvtnorm::rmvnorm(1, coef(GLMex0), 
                         vcov(GLMex0))
GLMsim.[, rmdl==1] <- tcrossprod(newMat, sim1)
GLMsim.[, rmdl==2] <- tcrossprod(
          newMat[, 1, drop=FALSE], sim0)
                      

all.equal(bicGLMsin[[2]], data.frame(GLMsim.), 
    tolerance=4*sqrt(.Machine$double.eps))
# tcrossprod numeric precision is mediocre 
# for the constant model in this example.  


[Package Ecfun version 0.3-2 Index]