parametric_boot_distribution {regressinator} | R Documentation |
Simulate the distribution of estimates by parametric bootstrap
Description
Repeatedly simulates new response values by using the fitted model, holding the covariates fixed. By default, refits the same model to each simulated dataset, but an alternative model can be provided. Estimates, confidence intervals, or other quantities are extracted from each fitted model and returned as a tidy data frame.
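The procedure described above can be sketched in base R. This is only an illustration of the idea, assuming an lm() fit and using the generic simulate() and update() methods; it is not the package's implementation, and the helper name one_replicate is hypothetical.

```r
# Base-R sketch of the parametric bootstrap idea (not regressinator's own code).
set.seed(1)
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical helper: draw one new response vector from the fitted model,
# keep every other column of `data` fixed, refit, and return the coefficients.
one_replicate <- function(fit, data) {
  response <- all.vars(formula(fit))[1]                  # name of the response column
  new_data <- data
  new_data[[response]] <- simulate(fit, nsim = 1)[[1]]   # redraw the response only
  refit <- update(fit, data = new_data)                  # refit the same model
  coef(refit)
}

# Five replicates, one row per simulated dataset.
boot_coefs <- t(replicate(5, one_replicate(fit, mtcars)))
boot_coefs
```

Each row of boot_coefs plays the role of one simulated fit; the function documented here additionally applies fn to each refit and stacks the results into one data frame.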
Usage
parametric_boot_distribution(
fit,
alternative_fit = fit,
data = model.frame(fit),
fn = tidy,
nsim = 100,
...
)
Arguments
fit
A model fit to data, such as by lm() or glm().
alternative_fit
A model fit to data, to refit to the data sampled from fit. Defaults to fit.
data
Data frame to be used in the simulation. Must contain the predictors needed for both fit and alternative_fit.
fn
Function to call on each new model fit to produce a data frame of estimates. Defaults to tidy.
nsim
Total number of simulations to run.
...
Additional arguments passed to fn.
Details
The default behavior samples from a model and refits the same model to the
sampled data; this is useful when, for example, exploring how model
diagnostics look when the model is well-specified. Another common use of the
parametric bootstrap is hypothesis testing, where we might simulate from a
null model and fit an alternative model to the data, to obtain the null
distribution of a particular estimate or statistic. Provide alternative_fit
to have a specific model fit to each simulated dataset, rather than the model
they are simulated from.
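The hypothesis-testing use can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the null model omits wt, the alternative includes it, and the two-sided p-value rule at the end is one common choice rather than anything prescribed by this function.

```r
# Sketch: null distribution of the wt coefficient by parametric bootstrap.
set.seed(42)
null_fit <- lm(mpg ~ hp, data = mtcars)        # model we simulate from
alt_fit  <- lm(mpg ~ hp + wt, data = mtcars)   # model refit to each simulated dataset
observed <- coef(alt_fit)[["wt"]]

null_wt <- replicate(200, {
  sim_data <- mtcars
  # New responses come from the null model; hp and wt stay fixed.
  sim_data$mpg <- simulate(null_fit, nsim = 1)[[1]]
  coef(update(alt_fit, data = sim_data))[["wt"]]
})

# Share of simulated estimates at least as extreme as the observed one.
p_value <- mean(abs(null_wt) >= abs(observed))
p_value
```

With this function, the same comparison would presumably start from a call like parametric_boot_distribution(null_fit, alt_fit, data = mtcars), as in the last example under Examples, followed by filtering the returned rows to the term of interest.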
Only the response variable from the fit (or alternative_fit, if given) is redrawn; other response variables in the population are left unchanged from their values in data.
Value
A data frame (tibble) with columns corresponding to the columns returned by fn. The additional column .sample indicates which fit each row is from.
Model limitations
Because this function uses S3 generic methods such as model.frame(), simulate(), and update(), it can be used with any model fit for which methods are provided. In base R, this includes lm() and glm().
The model provided as fit must be fit using the data argument to provide a data frame. For example:
fit <- lm(dist ~ speed, data = cars)
When simulating new data, this function provides the simulated data as the data argument and re-fits the model. If you instead refer directly to local variables in the model formula, this will not work. For example, if you fit a model this way:
# will not work
fit <- lm(cars$dist ~ cars$speed)
It will not be possible to refit the model using simulated datasets, as that would require modifying your environment to edit cars.
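The difference can be checked directly in base R. The sketch below uses a shifted copy of cars as a stand-in for a simulated dataset (an assumption made purely for illustration) and compares what update() does with each style of fit.

```r
# Model fit with a data argument: update() can swap in new data.
good_fit <- lm(dist ~ speed, data = cars)
# Model fit by referring to cars directly inside the formula.
bad_fit <- lm(cars$dist ~ cars$speed)

# Stand-in for a simulated dataset: shift every response by 100.
new_data <- cars
new_data$dist <- new_data$dist + 100

good_refit <- update(good_fit, data = new_data)
bad_refit  <- update(bad_fit, data = new_data)

coef(good_refit) - coef(good_fit)  # the intercept moves by 100
coef(bad_refit) - coef(bad_fit)    # no change: the formula still reads cars
```

The second refit silently reuses the original cars values, because cars$dist in the formula is looked up in the environment rather than in the data frame passed as data.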
See Also
model_lineup() to use resampling to aid in regression diagnostics; sampling_distribution() to simulate draws from the population distribution, rather than the null.
Examples
# Bootstrap distribution of estimates:
fit <- lm(mpg ~ hp, data = mtcars)
parametric_boot_distribution(fit, nsim = 5)
# Bootstrap distribution of estimates for a quadratic model, when true
# relationship is linear:
quad_fit <- lm(mpg ~ poly(hp, 2), data = mtcars)
parametric_boot_distribution(fit, quad_fit, nsim = 5)
# Bootstrap distribution of estimates for a model with an additional
# predictor, when its true coefficient is zero. The data argument must be
# provided so the alternative fit has all predictors available, not just hp:
alt_fit <- lm(mpg ~ hp + wt, data = mtcars)
parametric_boot_distribution(fit, alt_fit, data = mtcars, nsim = 5)