R: Simulate the distribution of estimates by parametric...

parametric_boot_distribution {regressinator}

R Documentation

Simulate the distribution of estimates by parametric bootstrap

Description

Repeatedly simulates new response values by using the fitted model, holding the covariates fixed. By default, refits the same model to each simulated dataset, but an alternative model can be provided. Estimates, confidence intervals, or other quantities are extracted from each fitted model and returned as a tidy data frame.

Usage

parametric_boot_distribution(
  fit,
  alternative_fit = fit,
  data = model.frame(fit),
  fn = tidy,
  nsim = 100,
  ...
)

Arguments

`fit`	A model fit to data, such as by `lm()` or `glm()`, to simulate new response values from.
`alternative_fit`	A model fit to data, to refit to the data sampled from `fit`. Defaults to `fit`, but an alternative model can be provided to examine its behavior when `fit` is the true model.
`data`	Data frame to be used in the simulation. Must contain the predictors needed for both `fit` and `alternative_fit`. Defaults to the predictors used in `fit`.
`fn`	Function to call on each new model fit to produce a data frame of estimates. Defaults to `broom::tidy()`, which produces a tidy data frame of coefficients, estimates, standard errors, and hypothesis tests.
`nsim`	Number of total simulations to run.
`...`	Additional arguments passed to `fn` each time it is called.

Details

The default behavior samples from a model and refits the same model to the sampled data; this is useful when, for example, exploring how model diagnostics look when the model is well-specified. Another common use of the parametric bootstrap is hypothesis testing, where we might simulate from a null model and fit an alternative model to the data, to obtain the null distribution of a particular estimate or statistic. Provide alternative_fit to have a specific model fit to each simulated dataset, rather than the model they are simulated from.

Only the response variable from the fit (or alternative_fit, if given) is redrawn; other response variables in the population are left unchanged from their values in data.

Value

A data frame (tibble) with columns corresponding to the columns returned by fn. The additional column .sample indicates which fit each row is from.

Model limitations

Because this function uses S3 generic methods such as model.frame(), simulate(), and update(), it can be used with any model fit for which methods are provided. In base R, this includes lm() and glm().

The model provided as fit must be fit using the data argument to provide a data frame. For example:

fit <- lm(dist ~ speed, data = cars)

When simulating new data, this function provides the simulated data as the data argument and re-fits the model. If you instead refer directly to local variables in the model formula, this will not work. For example, if you fit a model this way:

# will not work
fit <- lm(cars$dist ~ cars$speed)

It will not be possible to refit the model using simulated datasets, as that would require modifying your environment to edit cars.

Examples

# Bootstrap distribution of estimates:
fit <- lm(mpg ~ hp, data = mtcars)
parametric_boot_distribution(fit, nsim = 5)

# Bootstrap distribution of estimates for a quadratic model, when true
# relationship is linear:
quad_fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
parametric_boot_distribution(fit, quad_fit, nsim = 5)

# Bootstrap distribution of estimates for a model with an additional
# predictor, when it's truly zero. data argument must be provided so
# alternative fit has all predictors available, not just hp:
alt_fit <- lm(mpg ~ hp + wt, data = mtcars)
parametric_boot_distribution(fit, alt_fit, data = mtcars, nsim = 5)

[Package regressinator version 0.1.3 Index]