HierarchyCompute {SSBtools} | R Documentation |
Hierarchical Computations
Description
This function computes aggregates by crossing several hierarchical specifications and factorial variables.
Usage
HierarchyCompute(
data,
hierarchies,
valueVar,
colVar = NULL,
rowSelect = NULL,
colSelect = NULL,
select = NULL,
inputInOutput = FALSE,
output = "data.frame",
autoLevel = TRUE,
unionComplement = FALSE,
constantsInOutput = NULL,
hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign = "sign", level =
"level"),
selectionByMultiplicationLimit = 10^7,
colNotInDataWarning = TRUE,
useMatrixToDataFrame = TRUE,
handleDuplicated = "sum",
asInput = FALSE,
verbose = FALSE,
reOrder = FALSE,
reduceData = TRUE,
makeRownames = NULL
)
Arguments
data |
The input data frame |
hierarchies |
A named (names in |
valueVar |
Name of the variable(s) to be aggregated. |
colVar |
When non-NULL, the function |
rowSelect |
Data frame specifying variable combinations for output. The colFactor variable is not included.
In addition |
colSelect |
Vector specifying categories of the colFactor variable for output. |
select |
Data frame specifying variable combinations for output. The colFactor variable is included. |
inputInOutput |
Logical vector (possibly recycled) for each element of hierarchies.
TRUE means that codes from input are included in output. Values corresponding to |
output |
One of "data.frame" (default), "dummyHierarchies", "outputMatrix", "dataDummyHierarchy", "valueMatrix", "fromCrossCode",
"toCrossCode", "crossCode" (as toCrossCode), "outputMatrixWithCrossCode", "matrixComponents",
"dataDummyHierarchyWithCodeFrame", "dataDummyHierarchyQuick".
The latter two do not require |
autoLevel |
Logical vector (possibly recycled) for each element of hierarchies.
When TRUE, level is computed by automatic method as in |
unionComplement |
Logical vector (possibly recycled) for each element of hierarchies.
When TRUE, sign means union and complement instead of addition or subtraction as in |
constantsInOutput |
A single-row data frame to be combined with the other output. |
hierarchyVarNames |
Variable names in the hierarchy tables as in |
selectionByMultiplicationLimit |
With non-NULL |
colNotInDataWarning |
When TRUE, warning produced when elements of |
useMatrixToDataFrame |
When TRUE (default) special functionality for saving time and memory is used. |
handleDuplicated |
Handling of duplicated code rows in data. One of: "sum" (default), "sumByAggregate", "sumWithWarning", "stop" (error), "single" or "singleWithWarning". With no colFactor, "sum" and "sumByAggregate"/"sumWithWarning" differ (original values or aggregates in "valueMatrix"). With "single", only one of the values is used (by matrix subsetting). |
asInput |
When TRUE (FALSE is default) output matrices match input data. Thus
|
verbose |
Whether to print information during calculations. FALSE is default. |
reOrder |
When TRUE (FALSE is default) output codes are ordered differently, more similar to a usual model matrix ordering. |
reduceData |
When TRUE (default) unnecessary (for the aggregated result) rows of |
makeRownames |
When TRUE |
Details
A key element of this function is the matrix multiplication: outputMatrix = dataDummyHierarchy %*% valueMatrix.
The matrix valueMatrix is a re-organized version of the valueVar vector from input. In particular, if a variable is selected as colFactor, there is one column for each level of that variable.
The matrix dataDummyHierarchy is constructed by crossing dummy coding of hierarchies (DummyHierarchy) and factorial variables in a way that matches valueMatrix. The code combinations corresponding to rows and columns of dataDummyHierarchy can be obtained as toCrossCode and fromCrossCode.
In the default data frame output, the outputMatrix is stacked to one column and combined with the code combinations of all variables.
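The multiplication and stacking described above can be pictured with a minimal base R sketch. The codes (A, B, C, Total, AB), the years, and the values are hypothetical toy data. The matrices are built by hand for illustration and are not produced by HierarchyCompute itself.

```r
# Hypothetical toy valueMatrix: one row per input code, one column per
# level of an assumed colFactor variable (year)
valueMatrix <- matrix(c(1, 2, 3,
                        10, 20, 30),
                      nrow = 3,
                      dimnames = list(c("A", "B", "C"), c("2014", "2015")))

# Hypothetical dummy-coded hierarchy: Total = A + B + C and AB = A + B
dataDummyHierarchy <- matrix(c(1, 1, 1,
                               1, 1, 0),
                             nrow = 2, byrow = TRUE,
                             dimnames = list(c("Total", "AB"), c("A", "B", "C")))

# The key matrix multiplication
outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack the output matrix to one column and combine it with the codes,
# mimicking the shape of the default data frame output
stacked <- data.frame(
  code  = rep(rownames(outputMatrix), times = ncol(outputMatrix)),
  year  = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
  value = as.vector(outputMatrix)
)
print(stacked)
```

Here the row names of dataDummyHierarchy play the role of toCrossCode and its column names the role of fromCrossCode. The actual function can return the real components with output = "matrixComponents".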
Value
As specified by the parameter output
Author(s)
Øyvind Langsrud
See Also
Hierarchies2ModelMatrix, AutoHierarchies.
Examples
# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp") # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")
# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")
# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")
# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per",
output = "matrixComponents")
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
output = "matrixComponents")
# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
inputInOutput = c(TRUE, FALSE))
# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")
# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
colSelect = c("2014", "2016", "2018"))
# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))
# Select combinations of geo, age and year
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"), year = 2015))
# Extend the hierarchy table to illustrate the effect of unionComplement
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1),
geoHier[, -4])
# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")
# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
output = "matrixComponents")
# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
unionComplement = TRUE)
# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))
# More than one valueVar
x$y <- 10*x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))