standardize_parameters {parameters} | R Documentation |
Parameters standardization
Description
Compute standardized model parameters (coefficients).
Usage
standardize_parameters(
  model,
  method = "refit",
  ci = 0.95,
  robust = FALSE,
  two_sd = FALSE,
  include_response = TRUE,
  verbose = TRUE,
  ...
)
standardize_posteriors(
  model,
  method = "refit",
  robust = FALSE,
  two_sd = FALSE,
  include_response = TRUE,
  verbose = TRUE,
  ...
)
Arguments
model |
A statistical model. |
method |
The method used for standardizing the parameters. Can be "refit" (default), "posthoc", "smart", "basic", "pseudo" or "sdy". See Details. |
ci |
Confidence Interval (CI) level |
robust |
Logical, if TRUE, standardization is based on the median and MAD instead of the mean and SD. |
two_sd |
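If TRUE, the variables are scaled by two times the deviation (SD or MAD, depending on robust); see Gelman (2008). |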
include_response |
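If TRUE (default), the response is also standardized. If FALSE, only the predictors are standardized. For GLMs the response is never standardized (see Generalized Linear Models). |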
verbose |
Toggle warnings and messages on or off. |
... |
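For standardize_parameters(), arguments passed to model_parameters(), such as ci_method or exponentiate. |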
Details
Standardization Methods
- refit: This method is based on a complete model re-fit with a standardized version of the data. Hence, this method is equal to standardizing the variables before fitting the model. It is the "purest" and the most accurate (Neter et al., 1989), but it is also the most computationally costly and slowest (especially for heavy models such as Bayesian models). This method is particularly recommended for complex models that include interactions or transformations (e.g., polynomial or spline terms). The robust argument (default: FALSE) enables a robust standardization of the data, i.e., based on the median and MAD instead of the mean and SD. See datawizard::standardize() for more details.
  - Note that standardize_parameters(method = "refit") may not return the same results as fitting a model on data that has been standardized with standardize(); standardize_parameters() uses the data used by the model fitting function, which might not be the same data if there are missing values. See the remove_na argument in standardize().
- posthoc: Post-hoc standardization of the parameters, aiming at emulating the results obtained by "refit" without refitting the model. The coefficients are divided by the standard deviation (or MAD if robust = TRUE) of the outcome (which becomes their expression "unit"). Then, the coefficients related to numeric variables are additionally multiplied by the standard deviation (or MAD if robust = TRUE) of the related terms, so that they correspond to changes of 1 SD of the predictor (e.g., "A change of 1 SD in x is related to a change of 0.24 SD in y"). This does not apply to binary variables or factors, so their coefficients still relate to changes in levels. This method is not accurate and tends to give aberrant results when interactions are specified.
- basic: This method is similar to method = "posthoc", but treats all variables as continuous: it also scales the coefficients of factor levels (transformed to integers) and binary predictors by the standard deviation of the corresponding columns of the model matrix. Although inappropriate for these cases, this method is the one implemented by default in other software packages, such as lm.beta::lm.beta().
- smart (Standardization of Model's parameters with Adjustment, Reconnaissance and Transformation - experimental): Similar to method = "posthoc" in that it does not involve model refitting. The difference is that the SD (or MAD if robust = TRUE) of the response is computed on the relevant section of the data. For instance, if a factor with 3 levels A (the intercept), B and C is entered as a predictor, the effect corresponding to B vs. A will be scaled by the variance of the response at the intercept only. As a result, the coefficients for effects of factors are similar to a Glass' delta.
- pseudo (for 2-level (G)LMMs only): In this (post-hoc) method, the response and the predictor are standardized based on the level of prediction (levels are detected with performance::check_heterogeneity_bias()): predictors are standardized based on their SD at the level of prediction (see also datawizard::demean()); the outcome (in linear LMMs) is standardized based on a fitted random-intercept model, where sqrt(random-intercept-variance) is used for level 2 predictors, and sqrt(residual-variance) is used for level 1 predictors (Hoffman 2015, page 342). A warning is given when a within-group variable is found to have excess between-group variance.
- sdy (for logistic regression models only): This y-standardization is useful when comparing coefficients of logistic regression models across models for the same sample. Unobserved heterogeneity varies across models with different independent variables, and thus odds ratios for the same predictor from different models cannot be compared directly. The y-standardization makes coefficients "comparable across models by dividing them with the estimated standard deviation of the latent variable for each model" (Mood 2010). Thus, whenever one has multiple logistic regression models that are fit to the same data and share certain predictors (e.g. nested models), it can be useful to use this standardization approach to make log-odds or odds ratios comparable.
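The relationship between "refit" and "posthoc" described above can be sketched by hand in base R. This is only an illustration of the arithmetic, not the package's internal code; the object names (fit, posthoc_by_hand, refit_by_hand) are made up for the example, and the model deliberately has no interactions or transformations.

```r
# Illustrative sketch in base R; all object names are hypothetical.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# "posthoc" idea: multiply each slope by SD(predictor) and divide by SD(outcome)
b <- coef(fit)[c("wt", "hp")]
posthoc_by_hand <- b * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# "refit" idea: standardize the data first, then fit the same model
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
refit_by_hand <- coef(lm(mpg ~ wt + hp, data = z))[c("wt", "hp")]

# For a main-effects linear model the two approaches agree
all.equal(posthoc_by_hand, refit_by_hand)
```

Once interactions or transformed terms enter the formula, this simple rescaling no longer reproduces the refitted coefficients, which is why "refit" is recommended for such models.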
Transformed Variables
When the model's formula contains transformations (e.g. y ~ exp(X)), method = "refit" will give different results compared to method = "basic" ("posthoc" and "smart" do not support such transformations): while "refit" standardizes the data prior to the transformation (e.g. equivalent to exp(scale(X))), the "basic" method standardizes the transformed data (e.g. equivalent to scale(exp(X))).
See the Transformed Variables section in datawizard::standardize.default() for more details on how different transformations are dealt with when method = "refit".
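The difference between the two orders of operation can be seen on a single variable. The following base R sketch only contrasts exp(scale(X)) with scale(exp(X)); the names refit_like and basic_like are made up for the illustration and do not correspond to anything in the package.

```r
# Illustrative sketch in base R; object names are hypothetical.
x <- mtcars$wt

refit_like <- exp(as.numeric(scale(x)))  # standardize first, then transform
basic_like <- as.numeric(scale(exp(x)))  # transform first, then standardize

# The first is strictly positive, the second is centred on zero,
# so the two versions of the predictor are not the same variable
range(refit_like)
range(basic_like)
isTRUE(all.equal(refit_like, basic_like))  # FALSE
```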
Confidence Intervals
The returned confidence intervals are re-scaled versions of the unstandardized confidence intervals, and not "true" confidence intervals of the standardized coefficients (cf. Jones & Waller, 2015).
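As a rough illustration of what "re-scaled" means here, the sketch below rescales an ordinary confidence interval for a simple linear model by hand in base R. It is an assumption-laden illustration rather than the package's implementation: scale_factor, std_coef and std_ci are made-up names, and the sample SDs are treated as fixed constants, which is the reason such intervals are not "true" intervals for the standardized coefficient.

```r
# Illustrative sketch in base R; object names are hypothetical.
fit <- lm(mpg ~ wt, data = mtcars)

# Same factor that turns the raw slope into a standardized slope
scale_factor <- sd(mtcars$wt) / sd(mtcars$mpg)

std_coef <- unname(coef(fit)["wt"]) * scale_factor
std_ci <- as.numeric(confint(fit, "wt", level = 0.95)) * scale_factor

c(Std_Coefficient = std_coef, CI_low = std_ci[1], CI_high = std_ci[2])
```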
Generalized Linear Models
Standardization for generalized linear models (GLM, GLMM, etc.) is done only with respect to the predictors (while the outcome remains as-is, unstandardized), maintaining the interpretability of the coefficients (e.g., in a binomial model, the exponent of the standardized parameter is the OR of a change of 1 SD in the predictor).
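A minimal base R sketch of this interpretation, assuming a logistic regression with a single numeric predictor: only the predictor is standardized, the 0/1 outcome is left untouched. The object names (fit_raw, fit_std, b_std, b_raw_rescaled) are made up for the example.

```r
# Illustrative sketch in base R; object names are hypothetical.
fit_raw <- glm(am ~ mpg, data = mtcars, family = binomial())

# Standardize the predictor only; the binary outcome stays as-is
d <- transform(mtcars, mpg = as.numeric(scale(mpg)))
fit_std <- glm(am ~ mpg, data = d, family = binomial())

b_std <- unname(coef(fit_std)["mpg"])
b_raw_rescaled <- unname(coef(fit_raw)["mpg"]) * sd(mtcars$mpg)

# The standardized log-odds slope is the raw slope times SD(predictor)
all.equal(b_std, b_raw_rescaled, tolerance = 1e-6)

# Odds ratio associated with a 1 SD increase in mpg
exp(b_std)
```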
Dealing with Factors
standardize(model) or standardize_parameters(model, method = "refit") do not standardize categorical predictors (i.e. factors) / their dummy-variables, which may be a different behaviour compared to other R packages (such as lm.beta) or other software packages (like SPSS). To mimic such behaviours, either use standardize_parameters(model, method = "basic") to obtain post-hoc standardized parameters, or standardize the data with datawizard::standardize(data, force = TRUE) before fitting the model.
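To make the "treat dummy variables as continuous" behaviour concrete, the following base R sketch scales every column of the model matrix, including the dummy variable of a factor. It is a hand-rolled illustration under the assumption of a main-effects linear model, not the code used by method = "basic" or by datawizard::standardize(); the names X, by_hand and refit_all are made up.

```r
# Illustrative sketch in base R; object names are hypothetical.
fit <- lm(len ~ supp + dose, data = ToothGrowth)

# Model matrix without the intercept: one dummy column (suppVC) and dose
X <- model.matrix(fit)[, -1, drop = FALSE]

# Post-hoc scaling of all coefficients, dummy variable included
by_hand <- coef(fit)[-1] * apply(X, 2, sd) / sd(ToothGrowth$len)

# Equivalent refit on data where the dummy variable was z-scored as well
z <- data.frame(len = as.numeric(scale(ToothGrowth$len)), scale(X))
refit_all <- coef(lm(len ~ ., data = z))[-1]

all.equal(by_hand, refit_all)
```

The coefficient for suppVC then no longer describes a change between factor levels but a change of 1 SD of the dummy variable, which is why the text above calls this inappropriate for factors.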
Value
A data frame with the standardized parameters (Std_*, depending on the model type) and their CIs (CI_low and CI_high). Where applicable, standard errors (SEs) are returned as an attribute (attr(x, "standard_error")).
References
Hoffman, L. (2015). Longitudinal analysis: Modeling within-person fluctuation and change. Routledge.
Jones, J. A., & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80(2), 365-378.
Neter, J., Wasserman, W., & Kutner, M. H. (1989). Applied linear regression models.
Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in medicine, 27(15), 2865-2873.
Mood, C. (2010). Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26, 67-82.
See Also
See also package vignette.
Other standardize:
standardize_info()
Examples
library(parameters)
# Linear model with an interaction
model <- lm(len ~ supp * dose, data = ToothGrowth)
standardize_parameters(model, method = "refit")
standardize_parameters(model, method = "posthoc")
standardize_parameters(model, method = "smart")
standardize_parameters(model, method = "basic")
# Robust and 2 SD
standardize_parameters(model, robust = TRUE)
standardize_parameters(model, two_sd = TRUE)
# Logistic regression: only the predictors are standardized
model <- glm(am ~ cyl * mpg, data = mtcars, family = "binomial")
standardize_parameters(model, method = "refit")
standardize_parameters(model, method = "posthoc")
standardize_parameters(model, method = "basic", exponentiate = TRUE)
# Two-level mixed model (requires the lme4 package)
m <- lme4::lmer(mpg ~ cyl + am + vs + (1 | cyl), mtcars)
standardize_parameters(m, method = "pseudo", ci_method = "satterthwaite")
# Bayesian model (requires the rstanarm package)
model <- rstanarm::stan_glm(rating ~ critical + privileges, data = attitude, refresh = 0)
standardize_posteriors(model, method = "refit", verbose = FALSE)
standardize_posteriors(model, method = "posthoc", verbose = FALSE)
standardize_posteriors(model, method = "smart", verbose = FALSE)
head(standardize_posteriors(model, method = "basic", verbose = FALSE))