dcc {distrr}

R Documentation

Data cube creation (dcc)

Description

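Creates a data cube from a data frame: `.fun` is applied to every combination of the modalities of the categorical variables in `.variables`, including a total for each variable.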

Usage

dcc(.data, .variables, .fun = jointfun_, ...)

dcc2(.data, .variables, .fun = jointfun_, order_type = extract_unique2, ...)

dcc5(
  .data,
  .variables,
  .fun = jointfun_,
  .total = "Totale",
  order_type = extract_unique4,
  .all = TRUE,
  ...
)

Arguments

`.data`	data frame to be processed
`.variables`	variables to split data frame by, as a character vector (`c("var1", "var2")`).
`.fun`	function to apply to each piece (default: `jointfun_`)
`...`	additional functions passed to `.fun`.
`order_type`	a function like `extract_unique` or `extract_unique2` (default: `extract_unique2` in `dcc2`, `extract_unique4` in `dcc5`).
`.total`	character string with the name to give to the subset of data that includes all the observations of a variable (default: `"Totale"`).
`.all`	logical, indicating whether the functions have to be evaluated on the complete dataset (default: `TRUE`).

Value

A data cube, with a column for each categorical variable used and a row for each combination of the modalities of all the categorical variables. In addition to its own modalities, each variable also has a "Total" possibility, which includes all the others. The data cube will contain marginal, conditional and joint empirical distributions...
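The structure described above can be illustrated with a minimal base R sketch. It is not how distrr computes the cube: the data frame `toy`, the label `total_label` and the helper list `keep_sets` are made up for illustration, and plain row counts stand in for whatever `.fun` would compute.

```r
# Toy data with two categorical variables (hypothetical, not from distrr).
toy <- data.frame(
  gender = c("F", "F", "M", "M", "M"),
  sector = c("A", "B", "A", "A", "B"),
  stringsAsFactors = FALSE
)

vars <- c("gender", "sector")
total_label <- "Totale"  # plays the role of the .total argument

# Every subset of the variables that keeps its own modalities;
# the variables left out are collapsed into the total label.
keep_sets <- list(character(0), "gender", "sector", vars)

pieces <- lapply(keep_sets, function(keep) {
  d <- toy
  for (v in setdiff(vars, keep)) d[[v]] <- total_label
  # Count the rows in each combination of modalities.
  as.data.frame(table(d[vars]), responseName = "n",
                stringsAsFactors = FALSE)
})

# Stack the pieces: one row per combination, totals included.
cube <- do.call(rbind, pieces)
cube
```

Rows where no variable equals `total_label` hold joint counts, rows where one variable equals it hold the marginal counts of the other variable, and the row where both equal it counts the whole data set.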

Examples

data("invented_wages")
str(invented_wages)
tmp <- dcc(.data = invented_wages, 
           .variables = c("gender", "sector"), .fun = jointfun_)
tmp
str(tmp)
tmp2 <- dcc2(.data = invented_wages, 
             .variables = c("gender", "education"), 
             .fun = jointfun_, 
             order_type = extract_unique2)
tmp2
str(tmp2)

# dcc5 works like dcc2, but has an additional optional argument, .total,
# that can be added to give a name to the groups that include all the 
# observations of a variable.
tmp5 <- dcc5(.data = invented_wages, 
             .variables = c("gender", "education"),
             .fun = jointfun_,
             .total = "TOTAL",
             order_type = extract_unique2)
tmp5

[Package distrr version 0.0.6]