PLSrounding {SmallCountRounding}    R Documentation
PLS inspired rounding
Description
Small count rounding of the necessary inner cells is performed so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. The publishable cells can be defined from a model formula, from hierarchies or automatically from data.
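To make "publishable cells" concrete, here is a rough base R illustration. It is not the package's own code. It builds a 0/1 dummy matrix for the formula ~geo + year plus an overall total, using a small hypothetical inner-cell data set, and computes each publishable cell as a sum of inner cells. The data frame `inner` and the helper `dummy_cols` are hypothetical. The package constructs its own model matrix (see ModelMatrix), which may differ in column naming and ordering.

```r
# Hypothetical inner cells: one row per geo x year combination
inner <- data.frame(
  geo  = c("Iceland", "Portugal", "Spain", "Iceland", "Portugal", "Spain"),
  year = c("2018", "2018", "2018", "2019", "2019", "2019"),
  freq = c(2, 0, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 indicator column per level of v
# (returns a matrix as long as v has more than one distinct level)
dummy_cols <- function(v) {
  sapply(sort(unique(v)), function(l) as.numeric(v == l))
}

# Columns: overall total, geo margins, year margins (formula ~geo + year)
x <- cbind(Total = 1, dummy_cols(inner$geo), dummy_cols(inner$year))

# Each publishable cell is a sum of inner cells: t(x) %*% freq
publish <- drop(crossprod(x, inner$freq))
print(publish)
```

Because every publishable cell is a linear combination of inner cells, rounding only the inner cells and then re-aggregating keeps the published table additive.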
Usage
PLSrounding(
data,
freqVar = NULL,
roundBase = 3,
hierarchies = NULL,
formula = NULL,
dimVar = NULL,
maxRound = roundBase - 1,
printInc = nrow(data) > 1000,
output = NULL,
preAggregate = is.null(freqVar),
...
)
PLSroundingInner(..., output = "inner")
PLSroundingPublish(..., output = "publish")
Arguments
data: Input data as a data frame (inner cells).
freqVar: Variable holding counts (inner cell frequencies). When NULL (default), microdata is assumed.
roundBase: Rounding base.
hierarchies: List of hierarchies.
formula: Model formula defining publishable cells.
dimVar: The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
maxRound: Inner cells contributing to original publishable cells equal to or less than maxRound will be rounded.
printInc: Print iteration information to the console when TRUE.
output: Possible non-NULL values are … PLSroundingInner and PLSroundingPublish use output = "inner" and output = "publish", respectively, and return a single data frame.
preAggregate: When … By default, preAggregate is TRUE exactly when freqVar is NULL.
...: Further parameters sent to RoundViaDummy.
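The maxRound argument can be read as a selection rule. The following base R sketch applies that rule literally to the same kind of hypothetical data. It assumes that an inner cell "contributes" to a publishable cell when it has a nonzero entry in the dummy matrix. It ignores refinements handled by RoundViaDummy, such as zeroCandidates and identifyNew, so it is not the package's actual candidate selection. The names `inner`, `dummy_cols`, `small` and `candidate` are hypothetical.

```r
# Hypothetical inner cells
inner <- data.frame(
  geo  = c("Iceland", "Portugal", "Spain", "Iceland", "Portugal", "Spain"),
  year = c("2018", "2018", "2018", "2019", "2019", "2019"),
  freq = c(2, 0, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)
dummy_cols <- function(v) sapply(sort(unique(v)), function(l) as.numeric(v == l))
x <- cbind(Total = 1, dummy_cols(inner$geo), dummy_cols(inner$year))
publish <- drop(crossprod(x, inner$freq))

roundBase <- 3
maxRound <- roundBase - 1  # same default as in PLSrounding

# Publishable cells whose original value is <= maxRound
small <- publish <= maxRound
# Inner cells contributing to at least one such publishable cell (assumed rule)
candidate <- rowSums(x[, small, drop = FALSE]) > 0
inner[candidate, ]
```

Here only the Portugal margin (original value 1) is small, so the two Portugal inner cells are flagged. Raising maxRound widens the set of flagged cells, which matches the maxRound = 7 example under Examples.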
Details
This function is a user-friendly wrapper for RoundViaDummy with data frame output and a computed summary of the results. See RoundViaDummy for more details.
Value
Output is a four-element list with class attribute "PLSrounded" (to ensure informative printing).
inner: Data frame corresponding to the input data, with the main dimensional variables and with cell frequencies (original, rounded, difference).
publish: Data frame of publishable data, with the main dimensional variables and with cell frequencies (original, rounded, difference).
metrics: A named character vector of various statistics calculated from the two output data frames (…
freqTable: Matrix of frequencies of cell frequencies and absolute differences. For example, row "…
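The base R sketch below shows how the original, rounded and difference columns of a publish-like data frame could fit together. It rests on two assumptions: rounded publishable cells are sums of rounded inner cells, which keeps the table additive, and difference = rounded - original. The rounded inner values are hypothetical multiples of 3 chosen for illustration. They are not output of PLSrounding. The last three lines mirror the recalculation of maxdiff, meanAbsDiff and rootMeanSquare shown in the Examples.

```r
# Hypothetical inner cells and dummy matrix for ~geo + year (with total)
inner <- data.frame(
  geo  = c("Iceland", "Portugal", "Spain", "Iceland", "Portugal", "Spain"),
  year = c("2018", "2018", "2018", "2019", "2019", "2019"),
  freq = c(2, 0, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)
dummy_cols <- function(v) sapply(sort(unique(v)), function(l) as.numeric(v == l))
x <- cbind(Total = 1, dummy_cols(inner$geo), dummy_cols(inner$year))

# Hypothetical rounded inner cells (multiples of roundBase = 3)
rounded_inner <- c(3, 0, 0, 3, 0, 3)

# Publishable cells aggregated from original and from rounded inner cells
publish <- data.frame(
  original = drop(crossprod(x, inner$freq)),
  rounded  = drop(crossprod(x, rounded_inner))
)
publish$difference <- publish$rounded - publish$original  # assumed sign convention
print(publish)

max(abs(publish$difference))      # maxdiff
mean(abs(publish$difference))     # meanAbsDiff
sqrt(mean(publish$difference^2))  # rootMeanSquare
```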
References
Langsrud, Ø. and Heldal, J. (2018): “An Algorithm for Small Count Rounding of Tabular Data”. Presented at: Privacy in statistical databases, Valencia, Spain. September 26-28, 2018. https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data
See Also
RoundViaDummy, PLS2way, ModelMatrix
Examples
# Small example data set
z <- SmallCountData("e6")
print(z)
# Publishable cells by formula interface
a <- PLSrounding(z, "freq", roundBase = 5, formula = ~geo + eu + year)
print(a)
print(a$inner)
print(a$publish)
print(a$metrics)
print(a$freqTable)
# Recalculation of maxdiff, HDutility, meanAbsDiff and rootMeanSquare
max(abs(a$publish[, "difference"]))
HDutility(a$publish[, "original"], a$publish[, "rounded"])
mean(abs(a$publish[, "difference"]))
sqrt(mean((a$publish[, "difference"])^2))
# Six lines below produce equivalent results
# Ordering of rows can be different
PLSrounding(z, "freq") # All variables except "freq" as dimVar
PLSrounding(z, "freq", dimVar = c("geo", "eu", "year"))
PLSrounding(z, "freq", formula = ~eu * year + geo * year)
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eHrc"))
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"))
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"), formula = ~geo * year)
# Define publishable cells differently by making use of formula interface
PLSrounding(z, "freq", formula = ~eu * year + geo)
# Define publishable cells differently by making use of hierarchy interface
eHrc2 <- list(geo = c("EU", "@Portugal", "@Spain", "Iceland"), year = c("2018", "2019"))
PLSrounding(z, "freq", hierarchies = eHrc2)
# Also possible to combine hierarchies and formula
PLSrounding(z, "freq", hierarchies = SmallCountData("eDimList"), formula = ~geo + year)
# Single data frame output
PLSroundingInner(z, "freq", roundBase = 5, formula = ~geo + eu + year)
PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year)
# Microdata input
PLSroundingInner(rbind(z, z), roundBase = 5, formula = ~geo + eu + year)
# Parameter avoidHierarchical (see RoundViaDummy and ModelMatrix)
PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year, avoidHierarchical = TRUE)
# Package sdcHierarchies can be used to create hierarchies.
# The small example code below works if this package is available.
if (require(sdcHierarchies)) {
  z2 <- cbind(geo = c("11", "21", "22"), z[, 3:4], stringsAsFactors = FALSE)
  h2 <- list(
    geo = hier_compute(inp = unique(z2$geo), dim_spec = c(1, 1), root = "Tot", as = "df"),
    year = hier_convert(hier_create(root = "Total", nodes = c("2018", "2019")), as = "df"))
  PLSrounding(z2, "freq", hierarchies = h2)
}
# Use PLS2way to produce tables as in Langsrud and Heldal (2018) and to demonstrate
# parameters maxRound, zeroCandidates and identifyNew (see RoundViaDummy).
# Parameter rndSeed used to ensure same output as in reference.
exPSD <- SmallCountData("exPSD")
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, rndSeed = 124)
PLS2way(a, "original") # Table 1
PLS2way(a) # Table 2
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, identifyNew = FALSE, rndSeed = 124)
PLS2way(a) # Table 3
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, maxRound = 7)
PLS2way(a) # Values in col1 rounded
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, zeroCandidates = TRUE)
PLS2way(a) # (row3, col4): original is 0 and rounded is 5