model_lineup {regressinator} | R Documentation |
Produce a lineup for a fitted model
Description
A lineup hides diagnostics among "null" diagnostics, i.e. the same
diagnostics calculated using models fit to data where all model assumptions
are correct. For each null diagnostic, model_lineup() simulates new
responses from the model using the covariate values from the original fit
and the model's error distribution, link function, and so on. Hence the new
response values are generated under ideal conditions: the fitted model is
true and all assumptions hold. decrypt() reveals which diagnostics are the
true diagnostics.
Usage
model_lineup(fit, fn = augment, nsim = 20, ...)
Arguments
fit |
A model fit to data, such as by |
fn |
A diagnostic function. The function's first argument should be the
fitted model, and it must return a data frame. Defaults to
|
nsim |
Number of total diagnostics. For example, if |
... |
Additional arguments passed to |
Details
To generate different kinds of diagnostics, the user can provide a custom
fn. The fn should take a model fit as its argument and return a data
frame. For instance, the data frame might contain one row per observation and
include the residuals and fitted values for each observation; or it might be
a single row containing a summary statistic or test statistic.
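As a rough illustration of the two shapes of output described above, the sketch below defines one diagnostic function of each kind using base R only. The function names per_observation and one_row_summary are hypothetical and chosen for this example; the summary-statistic version assumes an lm() fit, since it relies on summary(f)$r.squared.

```r
# Hypothetical diagnostic functions; the names are illustrative only.

# One row per observation: fitted values alongside residuals.
per_observation <- function(f) {
  data.frame(fitted = fitted(f), resid = residuals(f))
}

# A single row holding one summary statistic for the whole fit.
# Assumes an lm() fit, whose summary has an r.squared element.
one_row_summary <- function(f) {
  data.frame(r_squared = summary(f)$r.squared)
}

fit <- lm(dist ~ speed, data = cars)
nrow(per_observation(fit))   # number of observations in the fit
nrow(one_row_summary(fit))   # a single row
```

Either function could then be supplied as the fn argument.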
fn will be called on the original fit provided. Then
parametric_boot_distribution() will be used to simulate data from the model
fit nsim - 1 times, refit the model to each simulated dataset, and run fn
on each refit model. The null distribution is conditional on X, i.e. the
covariates used will be identical, and only the response values will be
simulated. The data frames are concatenated with an additional .sample
column identifying which fit each row came from.
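The procedure above can be sketched with base R generics alone. This is not the package's implementation: sketch_lineup() is a hypothetical helper, it assumes an untransformed response stored in the first column of the model frame, and it returns the true position openly instead of hiding it behind decrypt().

```r
# Illustrative sketch only: sketch_lineup() is a hypothetical helper,
# not the regressinator implementation.
sketch_lineup <- function(fit, fn, nsim = 20) {
  # Covariates stay fixed; only the response column is replaced.
  # Assumes the response is untransformed and is the first column.
  original_data <- model.frame(fit)
  response <- names(original_data)[1]

  # Draw nsim - 1 response vectors from the fitted model.
  sims <- simulate(fit, nsim = nsim - 1)

  # Refit to each simulated dataset and compute the diagnostics.
  nulls <- lapply(seq_len(nsim - 1), function(i) {
    sim_data <- original_data
    sim_data[[response]] <- sims[[i]]
    fn(update(fit, data = sim_data))
  })

  # Place the true diagnostics at a random position among the nulls.
  true_pos <- sample(nsim, 1)
  diagnostics <- append(nulls, list(fn(fit)), after = true_pos - 1)

  # Tag each data frame with its position, then stack them.
  tagged <- Map(function(d, i) cbind(d, .sample = i),
                diagnostics, seq_len(nsim))
  list(lineup = do.call(rbind, tagged), true_pos = true_pos)
}

fit <- lm(dist ~ speed, data = cars)
resid_fn <- function(f) data.frame(fitted = fitted(f), resid = residuals(f))
out <- sketch_lineup(fit, resid_fn, nsim = 5)
table(out$lineup$.sample)
```

Each value of .sample labels one set of diagnostics, and exactly one of those sets comes from the original fit.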
When called, this function will print a message such as
decrypt("sD0f gCdC En JP2EdEPn ZY"). This is how to get the location of the
true diagnostics among the null diagnostics: evaluating this in the R console
will produce a string such as "True data in position 5".
Value
A data frame (tibble) with columns corresponding to the columns
returned by fn. The additional column .sample indicates which set of
diagnostics each row is from. For instance, if the true data is in position
5, selecting rows with .sample == 5 will retrieve the diagnostics from
the original model fit.
Model limitations
Because this function uses S3 generic methods such as model.frame(),
simulate(), and update(), it can be used with any model fit for which
methods are provided. In base R, this includes lm() and glm().
The model provided as fit must be fit using the data argument to provide
a data frame. For example:
fit <- lm(dist ~ speed, data = cars)
When simulating new data, this function provides the simulated data as the
data argument and re-fits the model. If you instead refer directly to local
variables in the model formula, this will not work. For example, if you fit a
model this way:
# will not work
fit <- lm(cars$dist ~ cars$speed)
It will not be possible to refit the model using simulated datasets, as that
would require modifying your environment to edit cars.
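The difference can be seen with base R's update() alone. The sketch below is an illustration under that assumption; it does not show how model_lineup() itself reacts to such a fit. When the formula refers to cars$dist and cars$speed, supplying a new data argument has no effect, because the formula still reaches into the original cars object.

```r
set.seed(42)

# Simulated data: same covariate, response drawn from the fitted model.
good_fit <- lm(dist ~ speed, data = cars)
sim_data <- cars
sim_data$dist <- simulate(good_fit, nsim = 1)[[1]]

# Fit with data = ...: update() swaps in the simulated dataset.
good_refit <- update(good_fit, data = sim_data)

# Fit with cars$...: the formula ignores the new data argument,
# so the "refit" silently reuses the original cars values.
bad_fit <- lm(cars$dist ~ cars$speed)
bad_refit <- update(bad_fit, data = sim_data)

# Compare coefficients before and after refitting.
isTRUE(all.equal(coef(good_fit), coef(good_refit)))
isTRUE(all.equal(coef(bad_fit), coef(bad_refit)))
```

The first comparison should report a difference, since the refit used simulated responses; the second should report a match, since nothing in the second model changed.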
References
Buja et al. (2009). Statistical inference for exploratory data analysis and model diagnostics. Philosophical Transactions of the Royal Society A, 367 (1906), pp. 4361-4383. doi:10.1098/rsta.2009.0120
Wickham et al. (2010). Graphical inference for infovis. IEEE Transactions on Visualization and Computer Graphics, 16 (6), pp. 973-979. doi:10.1109/TVCG.2010.161
See Also
parametric_boot_distribution() to simulate draws by using the
fitted model to draw new response values; sampling_distribution() to
simulate draws from the population distribution, rather than from the model
Examples
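library(regressinator)

fit <- lm(dist ~ speed, data = cars)

# Default diagnostics (fn = augment): the true fit among 4 null fits
model_lineup(fit, nsim = 5)

# Custom diagnostic: residuals against the speed covariate
resids_vs_speed <- function(f) {
  data.frame(resid = residuals(f),
             speed = model.frame(f)$speed)
}
model_lineup(fit, fn = resids_vs_speed, nsim = 5)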