sampling_distribution {regressinator} | R Documentation |
Simulate the sampling distribution of estimates from a population
Description
Repeatedly refits the model to new samples from the population, calculates estimates for each fit, and compiles a data frame of the results.
Usage
sampling_distribution(fit, data, fn = tidy, nsim = 100, fixed_x = TRUE, ...)
Arguments
fit |
A model fit to data, such as by |
data |
Data drawn from a |
fn |
Function to call on each new model fit to produce a data frame of
estimates. Defaults to broom::tidy(). |
nsim |
Number of simulations to run. |
fixed_x |
If |
... |
Additional arguments passed to fn. |
Details
To generate sampling distributions of different quantities, the user can
provide a custom fn. The fn should take a model fit as its argument and
return a data frame. For instance, the data frame might contain one row per
estimated coefficient and include the coefficient and its standard error; or
it might contain only one row of model summary statistics.
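As an illustration of the first case, here is a minimal sketch of a custom fn written in base R. The name coef_table is hypothetical and not part of the package; it returns one row per estimated coefficient together with its standard error.

```r
# Hypothetical helper (not part of regressinator): one row per coefficient,
# with the estimate and its standard error taken from summary().
coef_table <- function(fit) {
  coefs <- summary(fit)$coefficients
  data.frame(
    term = rownames(coefs),
    estimate = coefs[, "Estimate"],
    std.error = coefs[, "Std. Error"],
    row.names = NULL
  )
}

# Check the helper on an ordinary linear model fit.
fit <- lm(dist ~ speed, data = cars)
coef_table(fit)
```

A function of this shape could then be supplied as the fn argument in place of the default.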
Value
Data frame (tibble) of nsim + 1 simulation results, formed by
concatenating together the data frames returned by fn. The .sample
column identifies which simulated sample each row came from. Rows with
.sample == 0 come from the original fit.
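The .sample column makes it straightforward to separate the original fit from the simulated fits. The following is a sketch only, assuming the regressinator package is installed; the population and the variable names x, y, original, and simulated are made up for illustration.

```r
library(regressinator)

# Illustrative population with a single predictor (made up for this sketch).
pop <- population(
  x = predictor("rnorm", mean = 0, sd = 1),
  y = response(1 + 2 * x, error_scale = 1.0)
)
d <- sample_y(sample_x(pop, n = 30))
fit <- lm(y ~ x, data = d)

# Default fn, so each fit contributes one row per coefficient.
samples <- sampling_distribution(fit, d, nsim = 5)

# Rows from the original fit versus rows from the simulated samples.
original <- samples[samples$.sample == 0, ]
simulated <- samples[samples$.sample != 0, ]
```

Keeping the original fit in the same data frame allows its estimates to be compared directly against the simulated ones.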
Model limitations
Because this function uses S3 generic methods such as model.frame(),
simulate(), and update(), it can be used with any model fit for which
methods are provided. In base R, this includes lm() and glm().
The model provided as fit must be fit using the data argument to provide
a data frame. For example:
fit <- lm(dist ~ speed, data = cars)
When simulating new data, this function provides the simulated data as the
data argument and re-fits the model. If you instead refer directly to local
variables in the model formula, this will not work. For example, if you fit a
model this way:
# will not work
fit <- lm(cars$dist ~ cars$speed)
it will not be possible to refit the model using simulated datasets, as that
would require modifying your environment to edit cars.
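The difference can be seen with update() alone, one of the generics named above. This is a minimal base R sketch, not the function's actual internals; new_cars is a hypothetical stand-in for a simulated dataset.

```r
# Stand-in for a simulated dataset: shift every response by 100.
new_cars <- cars
new_cars$dist <- new_cars$dist + 100

# Fit with the data argument: update() can swap in the new data frame.
fit_ok <- lm(dist ~ speed, data = cars)
refit_ok <- update(fit_ok, data = new_cars)
intercept_shift <- coef(refit_ok)[["(Intercept)"]] - coef(fit_ok)[["(Intercept)"]]

# Fit with cars$... in the formula: the formula still looks up `cars` in the
# environment, so the new data frame is silently ignored.
fit_bad <- lm(cars$dist ~ cars$speed)
refit_bad <- update(fit_bad, data = new_cars)
unchanged <- isTRUE(all.equal(coef(refit_bad), coef(fit_bad)))
```

In the first case the intercept moves with the new data, while in the second the refit reproduces the original coefficients, so every simulated fit would be identical.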
See Also
parametric_boot_distribution() to simulate draws from a fitted
model, rather than from the population
Examples
pop <- population(
  x1 = predictor("rnorm", mean = 4, sd = 10),
  x2 = predictor("runif", min = 0, max = 10),
  y = response(0.7 + 2.2 * x1 - 0.2 * x2, error_scale = 4.0)
)
d <- sample_x(pop, n = 20) |>
  sample_y()
fit <- lm(y ~ x1 + x2, data = d)
# using the default fn = broom::tidy(); the conf.int argument is passed on to
# broom::tidy()
samples <- sampling_distribution(fit, d, conf.int = TRUE)
samples
suppressMessages(library(dplyr))
# the model is correctly specified, so the estimates are unbiased:
samples |>
  group_by(term) |>
  summarize(mean = mean(estimate),
            sd = sd(estimate))
# instead of coefficients, get the sampling distribution of R^2
rsquared <- function(fit) {
  data.frame(r2 = summary(fit)$r.squared)
}
sampling_distribution(fit, d, rsquared, nsim = 10)