model_aggregate {SSBtools} | R Documentation |
Hierarchical aggregation via model specification
Description
Internally a dummy/model matrix is created according to the model specification.
This model matrix is used in the aggregation process via matrix multiplication and/or the function aggregate_multiple_fun
.
Usage
model_aggregate(
data,
sum_vars = NULL,
fun_vars = NULL,
fun = NULL,
hierarchies = NULL,
formula = NULL,
dim_var = NULL,
remove_empty = NULL,
preagg_var = NULL,
dummy = TRUE,
pre_aggregate = dummy,
list_return = FALSE,
pre_return = FALSE,
verbose = TRUE,
mm_args = NULL,
...
)
Arguments
data |
A data frame containing data to be aggregated |
sum_vars |
Variables to be summed. This will be done via matrix multiplication. |
fun_vars |
Variables to be aggregated by supplied functions.
This will be done via |
fun |
The |
hierarchies |
The |
formula |
The |
dim_var |
The |
remove_empty |
When non-NULL, the |
preagg_var |
Extra variables to be used as grouping elements in the pre-aggregate step |
dummy |
The |
pre_aggregate |
Whether to pre-aggregate data to reduce the dimension of the model matrix.
Note that all original |
list_return |
Whether to return a list of separate components including the model matrix |
pre_return |
Whether to return the pre-aggregate data as a two-component list. Can also be combined with |
verbose |
Whether to print information during calculations. |
mm_args |
List of further arguments passed to |
... |
Further arguments passed to |
Details
With formula input, limited output can be achieved by formula_selection
(see example).
An attribute called startCol
has been added to the output data frame to make this functionality work.
Value
A data frame or a list.
Examples
z <- SSBtoolsData("sprt_emp_withEU")
z$age[z$age == "Y15-29"] <- "young"
z$age[z$age == "Y30-64"] <- "old"
names(z)[names(z) == "ths_per"] <- "ths"
z$y <- 1:18
my_range <- function(x) c(min = min(x), max = max(x))
out <- model_aggregate(z,
formula = ~age:year + geo,
sum_vars = c("y", "ths"),
fun_vars = c(sum = "ths", mean = "y", med = "y", ra = "ths"),
fun = c(sum = sum, mean = mean, med = median, ra = my_range))
out
# Limited output can be achieved by formula_selection
formula_selection(out, ~geo)
# Using the single unnamed variable feature.
model_aggregate(z, formula = ~age, fun_vars = "y",
fun = c(sum = sum, mean = mean, med = median, n = length))
# To illustrate list_return and pre_return
for (pre_return in c(FALSE, TRUE)) for (list_return in c(FALSE, TRUE)) {
cat("\n=======================================\n")
cat("list_return =", list_return, ", pre_return =", pre_return, "\n\n")
out <- model_aggregate(z, formula = ~age:year,
sum_vars = c("ths", "y"),
fun_vars = c(mean = "y", ra = "y"),
fun = c(mean = mean, ra = my_range),
list_return = list_return,
pre_return = pre_return)
cat("\n")
print(out)
}
# To illustrate preagg_var
model_aggregate(z, formula = ~age:year,
sum_vars = c("ths", "y"),
fun_vars = c(mean = "y", ra = "y"),
fun = c(mean = mean, ra = my_range),
preagg_var = "eu",
pre_return = TRUE)[["pre_data"]]
# To illustrate hierarchies
geo_hier <- SSBtoolsData("sprt_emp_geoHier")
model_aggregate(z, hierarchies = list(age = "All", geo = geo_hier),
sum_vars = "y",
fun_vars = c(sum = "y"))
#### Special non-dummy cases illustrated below ####
# Extend the hierarchy to make non-dummy model matrix
geo_hier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"),
mapsTo = "EUandSpain", sign = 1), geo_hier[, -4])
# Warning since non-dummy
# y and y_sum are different
model_aggregate(z, hierarchies = list(age = "All", geo = geo_hier2),
sum_vars = "y",
fun_vars = c(sum = "y"))
# No warning since dummy since unionComplement = TRUE (see ?HierarchyCompute)
# y and y_sum are equal
model_aggregate(z, hierarchies = list(age = "All", geo = geo_hier2),
sum_vars = "y",
fun_vars = c(sum = "y"),
mm_args = list(unionComplement = TRUE))
# Non-dummy again, but no warning since dummy = FALSE
# Then pre_aggregate is by default set to FALSE (error when TRUE)
# fun with extra argument needed (see ?dummy_aggregate)
# y and y_sum2 are equal
model_aggregate(z, hierarchies = list(age = "All", geo = geo_hier2),
sum_vars = "y",
fun_vars = c(sum2 = "y"),
fun = c(sum2 = function(x, y) sum(x * y)),
dummy = FALSE)