gmc {rockchalk} | R Documentation |
Group Mean Center: Generate group summaries and individual deviations within groups
Description
Multilevel modelers often need to include predictors like the within-group mean and the deviations of individuals around the mean. This function makes it easy (almost foolproof) to calculate those variables.
Usage
gmc(dframe, x, by, FUN = mean, suffix = c("_mn", "_dev"), fulldataframe = TRUE)
Arguments
dframe |
a data frame. |
x |
Variable names or a vector of variable names. Do NOT supply a variable like dat$x1, do supply a quoted variable name "x1" or a vector c("x1", "x2") |
by |
A grouping variable name or a vector of grouping names. Do NOT supply a variable like dat$xfactor, do supply a name "xfactor", or a vector c("xfac1", "xfac2"). |
FUN |
Defaults to the mean, have not tested alternatives |
suffix |
The suffixes to be added to column 1 and column 2 |
fulldataframe |
Default TRUE. original data frame is returned with new columna added (which I would call "Stata style"). If FALSE, this will return only newly created columns, the variables with suffix[1] and suffix[2] appended to names. TRUE is easier (maybe safer), but also wastes memory. |
Details
This was originally just for "group mean-centered" data, but now is more general, can accept functions like median to calculate center and then deviations about that center value within the group.
Similar to Stata egen, except more versatile and fun! Will create 2 new columns for each variable, with suffixes for the summary and deviations (default suffixes are "_mn" and "_dev". Rows will match the rows of the original data frame, so it will be easy to merge or cbind them back together.
Value
Depending on fulldataframe
, either a new data frame
with center and deviation columns, or or original data frame
with "x_mn" and "x_dev" variables appended (Stata style).
Author(s)
Paul Johnson
Examples
## Make a data frame out of the state data collection (see ?state)
data(state)
statenew <- as.data.frame(state.x77)
statenew$region <- state.region
statenew$state <- rownames(statenew)
head(statenew.gmc1 <- gmc(statenew, c("Income", "Population"), by = "region"))
head(statenew.gmc2 <- gmc(statenew, c("Income", "Population"), by = "region",
fulldataframe = FALSE))
## Note dangerous step: assumes row alignment is correct.
## return has rownames from original set to identify danger
head(statenew2 <- cbind(statenew, statenew.gmc2))
if(!all.equal(rownames(statenew), rownames(statenew.gmc2))){
warning("Data row-alignment probable error")
}
## The following box plots should be identical
boxplot(Income ~ region, statenew.gmc1)
boxplot((Income_mn + Income_dev) ~ region, statenew.gmc1)
## Multiple by variables
fakedat <- data.frame(i = 1:200, j = gl(4, 50), k = gl(20, 10),
y1 = rnorm(200), y2 = rnorm(200))
head(gmc(fakedat, "y1", by = "k"), 20)
head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40)
head(gmc(fakedat, c("y1", "y2"), by = c("j", "k"), fulldataframe = FALSE))
## Check missing value management
fakedat[2, "k"] <- NA
fakedat[4, "j"] <- NA##' head(gmc(fakedat, "y1", by = "k"), 20)
head(gmc(fakedat, "y1", by = c("j", "k"), fulldataframe = FALSE), 40)