R: Hierarchical Computations

HierarchyCompute {SSBtools}

R Documentation

Hierarchical Computations

Description

This function computes aggregates by crossing several hierarchical specifications and factorial variables.

Usage

HierarchyCompute(
  data,
  hierarchies,
  valueVar,
  colVar = NULL,
  rowSelect = NULL,
  colSelect = NULL,
  select = NULL,
  inputInOutput = FALSE,
  output = "data.frame",
  autoLevel = TRUE,
  unionComplement = FALSE,
  constantsInOutput = NULL,
  hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign = "sign",
    level = "level"),
  selectionByMultiplicationLimit = 10^7,
  colNotInDataWarning = TRUE,
  useMatrixToDataFrame = TRUE,
  handleDuplicated = "sum",
  asInput = FALSE,
  verbose = FALSE,
  reOrder = FALSE,
  reduceData = TRUE,
  makeRownames = NULL
)

Arguments

`data`	The input data frame
`hierarchies`	A named list of hierarchies, where each name is a variable in `data`. Instead of a hierarchy, a variable can be coded as `"rowFactor"` or `"colFactor"`.
`valueVar`	Name of the variable(s) to be aggregated.
`colVar`	When non-NULL, the function `HierarchyCompute2` is called. See its documentation for more information.
`rowSelect`	Data frame specifying variable combinations for output. The colFactor variable is not included. In addition `rowSelect="removeEmpty"` removes combinations corresponding to empty rows (only zeros) of `dataDummyHierarchy`.
`colSelect`	Vector specifying categories of the colFactor variable for output.
`select`	Data frame specifying variable combinations for output. The colFactor variable is included.
`inputInOutput`	Logical vector (possibly recycled) for each element of hierarchies. TRUE means that codes from input are included in output. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored.
`output`	One of "data.frame" (default), "dummyHierarchies", "outputMatrix", "dataDummyHierarchy", "valueMatrix", "fromCrossCode", "toCrossCode", "crossCode" (as toCrossCode), "outputMatrixWithCrossCode", "matrixComponents", "dataDummyHierarchyWithCodeFrame", "dataDummyHierarchyQuick". The latter two do not require `valueVar` (`reduceData` set to `FALSE`).
`autoLevel`	Logical vector (possibly recycled) for each element of hierarchies. When TRUE, level is computed by automatic method as in `HierarchyFix`. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored.
`unionComplement`	Logical vector (possibly recycled) for each element of hierarchies. When TRUE, sign means union and complement instead of addition or subtraction as in `DummyHierarchy`. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored.
`constantsInOutput`	A single-row data frame to be combined with the other output.
`hierarchyVarNames`	Variable names in the hierarchy tables as in `HierarchyFix`.
`selectionByMultiplicationLimit`	With non-NULL `rowSelect` and when the number of elements in `dataDummyHierarchy` exceeds this limit, the computation is performed by a slower but more memory efficient algorithm.
`colNotInDataWarning`	When TRUE, a warning is produced when elements of `colSelect` are not in data.
`useMatrixToDataFrame`	When TRUE (default) special functionality for saving time and memory is used.
`handleDuplicated`	Handling of duplicated code rows in data. One of: "sum" (default), "sumByAggregate", "sumWithWarning", "stop" (error), "single" or "singleWithWarning". With no colFactor, "sum" differs from "sumByAggregate" and "sumWithWarning" (original values or aggregates in `valueMatrix`). With "single", only one of the values is used (by matrix subsetting).
`asInput`	When TRUE (FALSE is default), output matrices match the input data. Thus `valueMatrix = Matrix(data[, valueVar], ncol = 1)`. Only possible when there is no colFactor.
`verbose`	Whether to print information during calculations. FALSE is default.
`reOrder`	When TRUE (FALSE is default) output codes are ordered differently, more similar to a usual model matrix ordering.
`reduceData`	When TRUE (default), rows of `valueMatrix` that are not needed for the aggregated result may be removed.
`makeRownames`	When TRUE `dataDummyHierarchy` contains rownames. By default, this is decided based on the parameter `output`.
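
The difference between additive and union semantics for `unionComplement` can be pictured with a small base R sketch. It does not call SSBtools. The codes and values are made up, and capping dummy entries at 1 is an assumption about how a union behaves, not a description of the actual implementation in `DummyHierarchy`.

```r
# Hypothetical input values per leaf code (made-up numbers)
value <- c(Spain = 5, Portugal = 3, Iceland = 1)

# Dummy rows for two overlapping groups: EU contains Spain and Portugal
EU    <- c(Spain = 1, Portugal = 1, Iceland = 0)
Spain <- c(Spain = 1, Portugal = 0, Iceland = 0)

# Addition: a code present in both groups gets weight 2
additive <- EU + Spain

# Union (assumed here to mean capping the weights at 1)
unionRow <- pmin(EU + Spain, 1)

sumAdditive <- sum(additive * value)  # Spain is counted twice
sumUnion    <- sum(unionRow * value)  # Spain is counted once
```

Here `sumAdditive` is 13 and `sumUnion` is 8, which mirrors the "Spain is counted twice" situation in the examples.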

Details

A key element of this function is the matrix multiplication `outputMatrix = dataDummyHierarchy %*% valueMatrix`. The matrix `valueMatrix` is a reorganized version of the `valueVar` vector from the input. In particular, if a variable is selected as colFactor, there is one column for each level of that variable. The matrix `dataDummyHierarchy` is constructed by crossing the dummy coding of the hierarchies (`DummyHierarchy`) with the factorial variables in a way that matches `valueMatrix`. The code combinations corresponding to the rows and columns of `dataDummyHierarchy` can be obtained as `toCrossCode` and `fromCrossCode`, respectively. In the default data frame output, `outputMatrix` is stacked into one column and combined with the code combinations of all variables.
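
The multiplication described above can be imitated by hand in base R. The following is a minimal sketch, not the package's internal code: the codes `A`, `B`, `C`, the group names and all numbers are made up, and the dummy matrix is written out directly instead of being built from a hierarchy table.

```r
# Toy valueMatrix: one row per input code, one column per colFactor level
valueMatrix <- matrix(c(10, 20, 30,
                        11, 21, 31),
                      ncol = 2,
                      dimnames = list(c("A", "B", "C"), c("2014", "2015")))

# Hand-written dummy hierarchy: rows are output codes, columns are input codes
# AB = A + B, Total = A + B + C, notA = Total - A
dataDummyHierarchy <- rbind(
  AB    = c(A = 1, B = 1, C = 0),
  Total = c(A = 1, B = 1, C = 1),
  notA  = c(A = 0, B = 1, C = 1)
)

# The key step: aggregate all output codes for all columns at once
outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack the matrix into one value column combined with the codes
stacked <- data.frame(
  code  = rep(rownames(outputMatrix), times = ncol(outputMatrix)),
  year  = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
  value = as.vector(outputMatrix)
)
```

In this sketch the row names of `dataDummyHierarchy` play the role of `toCrossCode` and its column names the role of `fromCrossCode`.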

Value

As specified by the parameter `output`.

Author(s)

Øyvind Langsrud

Examples

# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp")  # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")

# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")

# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")

# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per", 
                 output = "matrixComponents")
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 output = "matrixComponents")


# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 inputInOutput = c(TRUE, FALSE))

# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")

# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 colSelect = c("2014", "2016", "2018"))

# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))
                 
# Select combinations of geo, age and year 
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
     select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"), year = 2015))

# Extend the hierarchy table to illustrate the effect of unionComplement 
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1), 
                  geoHier[, -4])

# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")

# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per", 
                 output = "matrixComponents")

# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per", 
                 unionComplement = TRUE)

# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))
                 
# More than one valueVar
x$y <- 10*x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))

[Package SSBtools version 1.5.2 Index]