
makeProblem {sdcTable}

R Documentation

Create a problem instance

Description

Function makeProblem() is used to create sdcProblem objects.

Usage

makeProblem(
  data,
  dimList,
  dimVarInd = NULL,
  freqVarInd = NULL,
  numVarInd = NULL,
  weightInd = NULL,
  sampWeightInd = NULL
)

Arguments

`data`	a data frame featuring at least one column for each desired dimensional variable. Optionally, the input data can also contain variables holding cell counts, weights to be used during the cut-and-branch algorithm, additional numeric variables, or sampling weights.
`dimList`	a (named) list where the names refer to variable names in input `data`. If the list is not named, argument `dimVarInd` must be specified. Each list element can be one of:
  `tree`: a hierarchy generated with the `hier_*()` functions (e.g. `hier_create()`) from package `sdcHierarchies`
  `data.frame`: a two-column `data.frame` containing the full hierarchy of a dimensional variable using a top-to-bottom approach. The format of this `data.frame` is as follows:
    first column: a character vector specifying levels, each element being a string consisting only of `@` characters (of length 1 to n). If an element consists of `i` characters, the corresponding code is of level `i`. The code `@` (one character) is the grand total (level 1); the code `@@` (two characters) is of level 2 (directly below the overall total).
    second column: a character vector specifying the level codes
  `path`: an absolute or relative path to a `.csv` file that contains two columns separated by semicolons (`;`), following the same `"@;levelname"` structure described above
`dimVarInd`	if `dimList` is a named list, this argument is ignored and can be left as `NULL`. Otherwise, a numeric or character vector giving the column indices or names of the dimensional variables (which span the table) within `data` is expected.
`freqVarInd`	if not `NULL`, a numeric or character scalar giving the column index or name of the variable in `data` that holds counts.
`numVarInd`	if not `NULL`, a numeric or character vector giving the column indices or names of additional numeric variables in `data`.
`weightInd`	if not `NULL`, a numeric or character scalar giving the column index or name of the variable in `data` that holds costs to be used as objective coefficients when solving secondary cell suppression problems.
`sampWeightInd`	if not `NULL`, a numeric or character scalar giving the column index or name of the variable in `data` that holds sampling weights. This parameter is ignored if a complete table is provided.
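To illustrate the `data.frame` and `path` formats accepted in `dimList`, the base R sketch below builds the `@`-coded level column from hierarchy depths and writes the result to a semicolon-separated file. The helper `depth_to_levels()` and the sub-codes `A1`/`A2` are hypothetical, and writing the file without a header row is an assumption about what the `.csv` option expects.

```r
# Hypothetical helper (not part of sdcTable): turn hierarchy depths into the
# '@'-coded level strings; depth 1 is the grand total, depth 2 lies below it.
depth_to_levels <- function(depth) {
  strrep("@", depth)
}

# region hierarchy: Total with children A-D, where A is further split into
# A1 and A2 (this split is made up for illustration only)
codes <- c("Total", "A", "A1", "A2", "B", "C", "D")
depth <- c(1, 2, 3, 3, 2, 2, 2)

dim.region <- data.frame(
  levels = depth_to_levels(depth),
  codes = codes,
  stringsAsFactors = FALSE
)

# write the hierarchy as a semicolon-separated file without header, row names
# or quotes, giving lines such as "@;Total" and "@@;A" (assumed file layout)
f <- tempfile(fileext = ".csv")
utils::write.table(dim.region, file = f, sep = ";",
  quote = FALSE, row.names = FALSE, col.names = FALSE)

print(readLines(f))
```

If that assumed layout matches what the package reads, the file path `f` could then be supplied as a list element, e.g. `dimList = list(region = f, ...)`, instead of the `data.frame` itself.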

Value

an `sdcProblem` object

Author(s)

Bernhard Meindl

Examples

# load the package and the example micro data
library(sdcTable)
utils::data("microdata1", package = "sdcTable")

# microdata1 is a micro data set consisting of two
# spanning variables ('region' and 'gender') and one
# numeric variable ('val')

# specify structure of hierarchical variable 'region'
# levels 'A' to 'D' sum up to a Total
dim.region <- data.frame(
  levels = c("@", "@@", "@@", "@@", "@@"),
  codes = c("Total", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# specify structure of hierarchical variable 'gender'
# using hier_create() from package sdcHierarchies (see ?sdcHierarchies::hier_create)
dim.gender <- sdcHierarchies::hier_create(root = "Total", nodes = c("male", "female"))
sdcHierarchies::hier_display(dim.gender)

# create a named list with each element describing the hierarchy
# of one dimensional variable (a data.frame or a tree) and
# the names referring to variables in the input data
dimList <- list(region = dim.region, gender = dim.gender)

# the third column contains the numeric variable 'val'
numVarInd <- 3

# no variables holding counts, weights or sampling weights
# are available in the input data
# create a problem instance using numeric indices
p1 <- makeProblem(
  data = microdata1,
  dimList = dimList,
  numVarInd = numVarInd # third variable in `data`
)

# using variable names is also possible
p2 <- makeProblem(
  data = microdata1,
  dimList = dimList,
  numVarInd = "val"
)

# what do we have?
print(class(p1))

# have a look at the data
df1 <- sdcProb2df(p1, addDups = TRUE,
  addNumVars = TRUE, dimCodes = "original")
df2 <- sdcProb2df(p2, addDups = TRUE,
  addNumVars = TRUE, dimCodes = "original")
print(df1)

identical(df1, df2)
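The level column of a hierarchy `data.frame` such as `dim.region` has to follow the rules given for `dimList`. The base R sketch below checks a few properties implied by that description: every level string consists only of `@`, the first row is the single grand total `@`, and no row is more than one level deeper than the row before it. The last rule is our reading of the top-to-bottom order, not sdcTable's own validation, and `check_hierarchy_df()` is a hypothetical helper rather than a package function.

```r
# Hypothetical check (not an sdcTable function) of a hierarchy data.frame whose
# first column holds '@'-coded levels and whose second column holds the codes.
check_hierarchy_df <- function(df) {
  lev <- as.character(df[[1]])
  depth <- nchar(lev)
  all(grepl("^@+$", lev)) &&     # only '@' characters
    identical(lev[1], "@") &&    # first row is the grand total
    sum(depth == 1) == 1 &&      # exactly one grand total
    all(diff(depth) <= 1)        # never skip a level going down
}

# the 'region' hierarchy from the example above
dim.region <- data.frame(
  levels = c("@", "@@", "@@", "@@", "@@"),
  codes = c("Total", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

check_hierarchy_df(dim.region)
# TRUE

# a level-3 code directly below the total skips level 2
check_hierarchy_df(data.frame(levels = c("@", "@@@"), codes = c("Total", "A1")))
# FALSE
```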

[Package sdcTable version 0.32.6 Index]