std {sjmisc} | R Documentation |
Standardize and center variables
Description
std() computes a z-transformation (standardized and centered) on the input.
center() centers the input. std_if() and center_if() are scoped variants of
std() and center(), where the transformation is applied only to those
variables that match the logical condition of predicate.
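As a rough illustration of what these transformations compute, the following base R sketch reproduces centering and a z-transformation by hand. It assumes the default robust = "sd" and that missing values are ignored when the mean and standard deviation are computed. The helper names center_by_hand() and std_by_hand() are hypothetical and not part of sjmisc.

```r
# Hypothetical helpers, not part of sjmisc: a by-hand sketch of the idea.
center_by_hand <- function(x) {
  # subtract the mean, ignoring missing values (assumption)
  x - mean(x, na.rm = TRUE)
}

std_by_hand <- function(x) {
  # center, then divide by the standard deviation
  center_by_hand(x) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 6, 8, NA)
xc <- center_by_hand(x)
xz <- std_by_hand(x)

# after centering the mean is 0; after standardizing the SD is also 1
mean(xc, na.rm = TRUE)
mean(xz, na.rm = TRUE)
sd(xz, na.rm = TRUE)
```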
Usage
std(
  x,
  ...,
  robust = c("sd", "2sd", "gmd", "mad"),
  include.fac = FALSE,
  append = TRUE,
  suffix = "_z"
)

std_if(
  x,
  predicate,
  robust = c("sd", "2sd", "gmd", "mad"),
  include.fac = FALSE,
  append = TRUE,
  suffix = "_z"
)

center(x, ..., include.fac = FALSE, append = TRUE, suffix = "_c")

center_if(x, predicate, include.fac = FALSE, append = TRUE, suffix = "_c")
Arguments
x |
A vector or data frame. |
... |
Optional, unquoted names of variables that should be selected for
further processing. Required, if |
robust |
Character vector, indicating the method applied when
standardizing variables with |
include.fac |
Logical, if |
append |
Logical, if |
suffix |
Indicates which suffix will be added to each dummy variable.
Use |
predicate |
A predicate function to be applied to the columns. The
variables for which |
Details
std() and center() also work on grouped data frames (see group_by). In this
case, standardization or centering is applied to the subsets of variables in x.
See 'Examples'.
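The group-wise behaviour can be pictured with a base R sketch that standardizes disp separately within each level of cyl, using ave(). This only illustrates the idea under the default robust = "sd" and is not the package's implementation. The column name disp_z_by_cyl is made up for this example.

```r
data(mtcars)

# z-transform disp within each cyl group (by-hand sketch, not sjmisc code)
mtcars$disp_z_by_cyl <- ave(
  mtcars$disp,
  mtcars$cyl,
  FUN = function(v) (v - mean(v)) / sd(v)
)

# within every group the transformed values have mean 0 and SD 1
tapply(mtcars$disp_z_by_cyl, mtcars$cyl, mean)
tapply(mtcars$disp_z_by_cyl, mtcars$cyl, sd)
```

Standardizing the ungrouped column would instead use the overall mean and standard deviation of disp, so the two results generally differ.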
For more complicated models with many predictors, Gelman and Hill (2007)
suggest leaving binary inputs as is and standardizing only continuous
predictors by dividing by two standard deviations. This makes the
coefficients roughly comparable.
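The two-standard-deviation scaling can be sketched by hand in base R. The example below assumes this is what the robust = "2sd" option listed under 'Usage' corresponds to. The model and the column name wt_2sd are illustrative only.

```r
data(mtcars)

# continuous predictor: center and divide by two standard deviations
mtcars$wt_2sd <- (mtcars$wt - mean(mtcars$wt)) / (2 * sd(mtcars$wt))

# binary predictor am (0/1) is left as is
fit <- lm(mpg ~ wt_2sd + am, data = mtcars)
coef(fit)

# the rescaled predictor has a standard deviation of 0.5
sd(mtcars$wt_2sd)
```

A binary variable with equal proportions of 0 and 1 has a standard deviation of 0.5, which is why a continuous predictor scaled this way ends up on a roughly comparable footing with untransformed binary inputs.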
Value
If x is a vector, returns a vector with standardized or centered variables.
If x is a data frame, for append = TRUE, x including the transformed
variables as new columns is returned; if append = FALSE, only the transformed
variables will be returned. If append = TRUE and suffix = "", recoded
variables will replace (overwrite) existing variables.
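The three return shapes for a data frame can be compared side by side. The sketch below uses only the arguments shown under 'Usage' and assumes sjmisc is installed. The object names are illustrative, and the results are best checked against the installed version.

```r
library(sjmisc)
data(mtcars)

# default: transformed column is appended with the "_z" suffix
appended <- std(mtcars, disp)

# append = FALSE: only the transformed column is returned
only_new <- std(mtcars, disp, append = FALSE)

# append = TRUE with an empty suffix: disp itself is overwritten
overwritten <- std(mtcars, disp, append = TRUE, suffix = "")

# compare the column names of the three results
names(appended)
names(only_new)
names(overwritten)
```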
Note
std() and center() return a vector only if x is a vector. If x is a data
frame and only one variable is specified in the ... argument, both functions
still return a data frame (see 'Examples').
References
Gelman A (2008) Scaling regression inputs by dividing by two
standard deviations. Statistics in Medicine 27: 2865-2873.
http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf
Gelman A, Hill J (2007) Data Analysis Using Regression and Multilevel/Hierarchical
Models. Cambridge, Cambridge University Press: 55-57
Examples
library(sjmisc)
data(efc)
std(efc$c160age) %>% head()
std(efc, e17age, c160age, append = FALSE) %>% head()
center(efc$c160age) %>% head()
center(efc, e17age, c160age, append = FALSE) %>% head()
# NOTE!
std(efc$e17age) # returns a vector
std(efc, e17age) # returns a data frame
# with quasi-quotation
x <- "e17age"
center(efc, !!x, append = FALSE) %>% head()
# works with mutate()
library(dplyr)
efc %>%
  select(e17age, neg_c_7) %>%
  mutate(age_std = std(e17age), burden = center(neg_c_7)) %>%
  head()
# works also with grouped data frames
mtcars %>% std(disp)
# compare new column "disp_z" w/ output above
mtcars %>%
  group_by(cyl) %>%
  std(disp)
data(iris)
# also standardize factors
std(iris, include.fac = TRUE, append = FALSE)
# don't standardize factors
std(iris, include.fac = FALSE, append = FALSE)
# standardize only variables with more than 10 unique values
p <- function(x) dplyr::n_distinct(x) > 10
std_if(efc, predicate = p, append = FALSE)