R: Data Information by Group

agg_dfm {quest}

R Documentation

Data Information by Group

Description

agg_dfm evaluates a function on a set of variables in a data.frame separately for each group and combines the results. The rep and rtn.grp arguments determine how the results are combined. If rep = TRUE, the result of fun is repeated for every row of the group in data[grp.nm]; if rep = FALSE, the result of fun is returned once for each unique combination of data[grp.nm]. If rtn.grp = TRUE, the results are returned in a data.frame whose first columns are the groups from data[grp.nm]; if rtn.grp = FALSE, the results are returned in an atomic vector. Note that agg_dfm evaluates fun on all the variables in data[vrb.nm] as a whole. If you instead want to evaluate fun separately for each variable in vrb.nm, use aggs.
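The difference between the two settings of rep can be illustrated with a short base-R sketch. It does not call agg_dfm and is not meant to reflect how quest implements it; the names cor_fun, once, and repeated are hypothetical and exist only for this illustration.

```r
# Base-R illustration only; not the quest implementation.
dat <- airquality[c("Ozone", "Solar.R", "Month")]

# Hypothetical function of the whole group: one correlation per group
cor_fun <- function(d) cor(d$Ozone, d$Solar.R, use = "complete.obs")

# Analogous to rep = FALSE: one value per unique group
once <- vapply(split(dat, dat$Month), cor_fun, numeric(1))

# Analogous to rep = TRUE: the group's value repeated for every row of that group
repeated <- unname(once[as.character(dat$Month)])
names(repeated) <- row.names(dat)

length(once)      # one element per month
length(repeated)  # one element per row of airquality
```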

Usage

agg_dfm(
  data,
  vrb.nm,
  grp.nm,
  rep = FALSE,
  rtn.grp = !rep,
  sep = ".",
  rtn.result.nm = "result",
  fun,
  ...
)

Arguments

`data`	data.frame of data.
`vrb.nm`	character vector of colnames from `data` specifying the set of variables to evaluate `fun` on.
`grp.nm`	character vector of colnames from `data` specifying the groups.
`rep`	logical vector of length 1 specifying whether the result of `fun` should be repeated for every instance of the group in `data[grp.nm]` (TRUE) or returned only once for each group (FALSE).
`rtn.grp`	logical vector of length 1 specifying whether the group columns (i.e., `data[grp.nm]`) should be included in the return object as columns. The default is the opposite of `rep` as traditionally it is most important to return the group columns when `rep` = FALSE.
`sep`	character vector of length 1 specifying the string to paste the group values together with when there are multiple grouping variables (i.e., `length(grp.nm) > 1`). Only used if `rep` = FALSE and `rtn.grp` = FALSE.
`rtn.result.nm`	character vector of length 1 specifying the name for the column of results in the return object. Only used if `rtn.grp` = TRUE.
`fun`	function to evaluate each grouping of `data[vrb.nm]` by. This function must return an atomic vector of length 1. If not, then consider using `by2` or `plyr::dlply`.
`...`	additional named arguments to `fun`.
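As a rough illustration of the length-1 requirement on `fun`, the sketch below applies two hypothetical functions to a single group's rows. The names fun_ok, fun_too_long, and grp_dat are made up for this example; only base R is used.

```r
# Hypothetical example functions; the names are illustrative only.
# Each receives the data.frame of vrb.nm columns for one group.
fun_ok <- function(dat) cor(dat, use = "complete.obs")[1, 2]  # single correlation
fun_too_long <- function(dat) colMeans(dat, na.rm = TRUE)     # one mean per column

# Rows for one group (Month == 5), restricted to the variables of interest
grp_dat <- airquality[airquality$Month == 5, c("Ozone", "Solar.R")]

length(fun_ok(grp_dat))        # 1: satisfies the requirement
length(fun_too_long(grp_dat))  # 2: does not satisfy the requirement
```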

Details

If rep = TRUE, then agg_dfm calls ave_dfm; if rep = FALSE, then agg_dfm calls by. When rep = FALSE and rtn.grp = TRUE, agg_dfm is very similar to plyr::ddply; when rep = FALSE and rtn.grp = FALSE, then agg_dfm is very similar to plyr::daply.
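The relationship between a by call and the two rep = FALSE return shapes can be sketched in base R as follows. This is an assumption-laden illustration, not the code agg_dfm runs; the object names by_out, grp_values, results, as_df, and as_vec are hypothetical.

```r
# Base-R illustration only; not the quest implementation.
by_out <- by(data = airquality[c("Ozone", "Solar.R")], INDICES = airquality["Month"],
   FUN = function(dat) cor(dat, use = "complete.obs")[1, 2], simplify = FALSE)

# Group labels and per-group results pulled out of the by object
grp_values <- dimnames(by_out)[[1]]
results <- unlist(unclass(by_out), use.names = FALSE)

# Shape analogous to rtn.grp = TRUE: group column first, result column last
as_df <- data.frame(Month = grp_values, result = results, stringsAsFactors = FALSE)

# Shape analogous to rtn.grp = FALSE: atomic vector named by group
as_vec <- setNames(results, grp_values)
```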

Value

result of fun applied to each grouping of data[vrb.nm]. The structure of the return object depends on the arguments rep and rtn.grp.

If rep = TRUE and rtn.grp = TRUE: then the return object is a data.frame with nrow = nrow(data) where the first columns are data[grp.nm] and the last column is the result of fun with colname = rtn.result.nm.
If rep = TRUE and rtn.grp = FALSE: then the return object is an atomic vector with length = nrow(data) where the values are the result of fun and the names = row.names(data).
If rep = FALSE and rtn.grp = TRUE: then the return object is a data.frame with nrow = length(levels(interaction(data[grp.nm]))) where the first columns are the unique group combinations in data[grp.nm] and the last column is the result of fun with colname = rtn.result.nm.
If rep = FALSE and rtn.grp = FALSE: then the return object is an atomic vector with length = length(levels(interaction(data[grp.nm]))) where the values are the result of fun and the names are the group values, pasted together by sep if there are multiple grouping variables (i.e., length(grp.nm) > 1).
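To make the role of sep and of length(levels(interaction(data[grp.nm]))) concrete, here is a base-R sketch with two grouping variables. It only mimics the rep = FALSE, rtn.grp = FALSE shape; the ordering and exact names that agg_dfm produces may differ, and the names grp, combos, and counts are hypothetical.

```r
# Base-R illustration only; not the quest implementation.
grp <- mtcars[c("vs", "am")]

# One factor level per combination of vs and am, with values pasted by "_"
combos <- interaction(grp, sep = "_")
length(levels(combos))  # number of group combinations

# One result per combination, named by the pasted group values
counts <- vapply(split(mtcars[c("mpg", "cyl", "disp")], combos), nrow, integer(1))
names(counts)
```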

Examples


### one grouping variable

## by in base R
by(data = airquality[c("Ozone","Solar.R")], INDICES = airquality["Month"],
   simplify = FALSE, FUN = function(dat) cor(dat, use = "complete")[1,2])

## rep = TRUE

# rtn.grp = TRUE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = TRUE, rtn.grp = TRUE, fun = function(dat) cor(dat, use = "complete")[1,2])

# rtn.grp = FALSE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = TRUE, rtn.grp = FALSE, fun = function(dat) cor(dat, use = "complete")[1,2])

## rep = FALSE

# rtn.grp = TRUE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = FALSE, rtn.grp = TRUE, fun = function(dat) cor(dat, use = "complete")[1,2])
suppressWarnings(plyr::ddply(.data = airquality[c("Ozone","Solar.R","Month")],
   .variables = "Month", .fun = function(dat) cor(dat, use = "complete")[1,2]))

# rtn.grp = FALSE
agg_dfm(data = airquality, vrb.nm = c("Ozone","Solar.R"), grp.nm = "Month",
   rep = FALSE, rtn.grp = FALSE, fun = function(dat) cor(dat, use = "complete")[1,2])
suppressWarnings(plyr::daply(.data = airquality[c("Ozone","Solar.R","Month")],
   .variables = "Month", .fun = function(dat) cor(dat, use = "complete")[1,2]))

### two grouping variables

## by in base R
by(data = mtcars[c("mpg","cyl","disp")], INDICES = mtcars[c("vs","am")],
   FUN = nrow, simplify = FALSE) # with multiple group columns

## rep = TRUE

# rtn.grp = TRUE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = TRUE, rtn.grp = TRUE, fun = nrow)

# rtn.grp = FALSE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = TRUE, rtn.grp = FALSE, fun = nrow)

## rep = FALSE

# rtn.grp = TRUE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = TRUE, fun = nrow)
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = TRUE, rtn.result.nm = "value", fun = nrow)

# rtn.grp = FALSE
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = FALSE, fun = nrow)
agg_dfm(data = mtcars, vrb.nm = c("mpg","cyl","disp"), grp.nm = c("vs","am"),
   rep = FALSE, rtn.grp = FALSE, sep = "_", fun = nrow)

[Package quest version 0.2.0 Index]