sdc_descriptives {sdcLog} | R Documentation |
Disclosure control for descriptive statistics
Description
Checks the number of distinct entities and the (n, k) dominance rule for your descriptive statistics.
By default, sdc_descriptives() checks whether there are at least 5 distinct entities and whether the largest 2 entities account for 85% or more of val_var. These thresholds can be changed using options. For details, see vignette("options", package = "sdcLog").
Usage
sdc_descriptives(
data,
id_var = getOption("sdc.id_var"),
val_var = NULL,
by = NULL,
zero_as_NA = NULL,
fill_id_var = FALSE
)
Arguments
data |
data.frame from which the descriptive statistics are calculated. |
id_var |
character The name of the id variable. Defaults to getOption("sdc.id_var"). |
val_var |
character vector of value variables on which descriptive statistics are computed. |
by |
character vector of grouping variables. |
zero_as_NA |
logical If TRUE, zeros in 'val_var' are treated as NA. |
fill_id_var |
logical Only for very specific use cases. Defaults to FALSE. |
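Because id_var falls back to getOption("sdc.id_var"), the id variable can be set once per session and then left out of each call. The following is a minimal sketch of that pattern, not an excerpt from the package. It assumes the example data set used under Examples is reachable as sdcLog::sdc_descriptives_DT, and it is guarded so it is skipped where sdcLog is not installed.

```r
# Set the id variable once; "sdc.id_var" is the option named under Usage.
# The previous option values are kept in `old` so they can be restored.
old <- options(sdc.id_var = "id")

# Guarded so the snippet also runs where sdcLog is not installed.
if (requireNamespace("sdcLog", quietly = TRUE)) {
  # id_var is omitted and picked up from getOption("sdc.id_var").
  # Assumption: the example data is available as sdcLog::sdc_descriptives_DT.
  res <- sdcLog::sdc_descriptives(
    data = sdcLog::sdc_descriptives_DT,
    val_var = "val_1"
  )
  print(res)
}
```

Calling options(old) afterwards restores the previous setting.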
Details
The general form of the \((n, k)\) dominance rule can be formulated as:
\[\sum_{i=1}^{n}x_i > \frac{k}{100} \sum_{i=1}^{N}x_i\]where \(x_1 \ge x_2 \ge \cdots \ge x_{N}\). \(n\) denotes the number of largest contributions to be considered, \(x_n\) the \(n\)-th largest contribution, \(k\) the maximal percentage these \(n\) contributions may account for, and \(N\) is the total number of observations.
If the statement above is true, the \((n, k)\) dominance rule is violated.
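The rule can be illustrated with a few lines of base R. This is a sketch under stated assumptions, not the package's implementation: check_dominance() is a hypothetical helper, it sums contributions per entity before ranking them, and it uses n = 2, k = 85 and a minimum of 5 distinct entities as described above.

```r
# Hypothetical helper (not part of sdcLog): checks one group of contributions.
check_dominance <- function(id, x, n = 2, k = 85, min_ids = 5) {
  keep <- !is.na(id) & !is.na(x)
  # Assumption: contributions are summed per entity, then sorted descending.
  totals <- sort(as.numeric(tapply(x[keep], id[keep], sum)), decreasing = TRUE)
  top_n <- sum(head(totals, n))
  list(
    distinct_ids = length(totals),
    too_few_ids = length(totals) < min_ids,
    top_share = top_n / sum(totals),
    # Left-hand side vs. right-hand side of the inequality above.
    dominance_violated = top_n > k / 100 * sum(totals)
  )
}

ids <- c("a", "b", "c", "d", "e", "f")

# The two largest entities hold 90 of 100, which exceeds 85%: rule violated.
res <- check_dominance(ids, c(50, 40, 4, 3, 2, 1))

# The two largest entities hold 55 of 100: rule not violated.
res_ok <- check_dominance(ids, c(30, 25, 20, 15, 5, 5))
```

Here res$dominance_violated is TRUE and res_ok$dominance_violated is FALSE; both groups have 6 distinct entities, so the distinct-entities criterion is met in each case.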
Value
A list of class sdc_descriptives with detailed information about options, settings, and compliance with the criteria distinct entities and dominance.
Examples
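# Check val_1 over the whole data set
sdc_descriptives(
  data = sdc_descriptives_DT,
  id_var = "id",
  val_var = "val_1"
)

# Check val_1 separately within each sector
sdc_descriptives(
  data = sdc_descriptives_DT,
  id_var = "id",
  val_var = "val_1",
  by = "sector"
)

# Check val_1 within each combination of sector and year
sdc_descriptives(
  data = sdc_descriptives_DT,
  id_var = "id",
  val_var = "val_1",
  by = c("sector", "year")
)

# Same grouping for val_2, with zero_as_NA left at its default
sdc_descriptives(
  data = sdc_descriptives_DT,
  id_var = "id",
  val_var = "val_2",
  by = c("sector", "year")
)

# Same call, but zeros in val_2 are explicitly not treated as NA
sdc_descriptives(
  data = sdc_descriptives_DT,
  id_var = "id",
  val_var = "val_2",
  by = c("sector", "year"),
  zero_as_NA = FALSE
)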