getBreaks {simPop} | R Documentation |
Compute break points for categorizing (semi-)continuous variables
Description
Compute break points for categorizing continuous or semi-continuous
variables using (weighted) quantiles. This is a utility function that is
useful for writing custom wrapper functions such as simEUSILC.
Usage
getBreaks(
  x,
  weights = NULL,
  zeros = TRUE,
  lower = NULL,
  upper = NULL,
  equidist = TRUE,
  probs = NULL,
  strata = NULL
)
Arguments
x |
a numeric vector to be categorized. |
weights |
an optional numeric vector containing sample weights. |
zeros |
a logical indicating whether x is semi-continuous, so that 0 is used as a break point (see “Details”). |
lower, upper |
optional numeric values specifying lower and upper bounds
other than minimum and maximum of x, respectively. |
equidist |
a logical indicating whether the (positive) break points should be equidistant or whether there should be refinements in the lower and upper tail (see “Details”). |
probs |
a numeric vector of probabilities with values in [0, 1]. |
strata |
an optional vector specifying a strata variable (e.g., household IDs).
If specified, the mean of |
Details
If equidist is TRUE, the behavior is as follows. If zeros is TRUE as
well, the 0%, 10%, ..., 90% quantiles of the negative values and the
10%, 20%, ..., 100% of the positive values are computed. These quantiles
are then used as break points together with 0. If zeros is not TRUE, on
the other hand, the 0%, 10%, ..., 100% quantiles of all values are used.
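The equidistant rule can be illustrated with a small base R sketch. It is not the simPop implementation: equidist_breaks is a hypothetical helper, it ignores weights, and it uses stats::quantile in place of quantileWt, so the numbers need not match getBreaks exactly.

```r
# Hypothetical helper (not part of simPop): unweighted sketch of the
# equidistant rule, using stats::quantile instead of quantileWt.
equidist_breaks <- function(x, zeros = TRUE) {
  if (zeros) {
    neg <- x[x < 0]
    pos <- x[x > 0]
    # 0%, 10%, ..., 90% quantiles of the negative values
    b_neg <- if (length(neg) > 0) {
      quantile(neg, probs = (0:9) / 10, names = FALSE)
    } else {
      numeric(0)
    }
    # 10%, 20%, ..., 100% quantiles of the positive values
    b_pos <- if (length(pos) > 0) {
      quantile(pos, probs = (1:10) / 10, names = FALSE)
    } else {
      numeric(0)
    }
    # 0 separates the negative and the positive break points
    breaks <- c(b_neg, 0, b_pos)
  } else {
    # 0%, 10%, ..., 100% quantiles of all values
    breaks <- quantile(x, probs = (0:10) / 10, names = FALSE)
  }
  unique(breaks)
}

# Toy semi-continuous data: a few negatives, many zeros, positives
x <- c(-20, -5, rep(0, 10), 1:100)
b_zeros <- equidist_breaks(x, zeros = TRUE)
b_all <- equidist_breaks(x, zeros = FALSE)
```

With zeros = TRUE the zeros themselves do not enter any quantile computation; they only contribute the single break point 0.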
If equidist is not TRUE, the behavior is as follows. If zeros is not
TRUE, the 1%, 5%, 10%, 20%, 40%, 60%, 80%, 90%, 95% and 99% quantiles of
all values are used for the inner part of the data (instead of the
equidistant 10%, ..., 90% quantiles). If zeros is TRUE, these quantiles
are only used for the positive values while the quantiles of the negative
values remain equidistant.
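To make the difference between the two probability grids concrete, the following base R sketch compares them for the case where zeros is not TRUE. It is only an illustration: the variable names are made up, weights are ignored, and stats::quantile stands in for quantileWt.

```r
# Inner probabilities quoted above for the refined (non-equidistant) case
inner_probs <- c(0.01, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.99)

# Full grids including the 0% and 100% quantiles (minimum and maximum)
probs_refined <- c(0, inner_probs, 1)
probs_equidist <- (0:10) / 10

# Toy right-skewed data without zeros or negative values
x <- (1:1000)^2

b_equidist <- quantile(x, probs = probs_equidist, names = FALSE)
b_refined <- quantile(x, probs = probs_refined, names = FALSE)

# The refined grid places extra break points in both tails
b_equidist
b_refined
```

The refined grid trades resolution in the middle of the distribution (steps of 20% instead of 10%) for additional break points at 1%, 5%, 95% and 99%.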
Note that duplicated values among the quantiles are discarded and that the
minimum and maximum are replaced with lower and upper, respectively, if
these are specified.
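A minimal sketch of this post-processing step is shown below. finalize_breaks is a hypothetical helper, and the order of operations (dropping duplicates first, then replacing the end points) is an assumption rather than something stated above.

```r
# Hypothetical helper (not part of simPop). Assumption: duplicates are
# dropped before the end points are replaced.
finalize_breaks <- function(breaks, lower = NULL, upper = NULL) {
  breaks <- unique(breaks)
  # replace the minimum with 'lower' if it is specified
  if (!is.null(lower)) breaks[1] <- lower
  # replace the maximum with 'upper' if it is specified
  if (!is.null(upper)) breaks[length(breaks)] <- upper
  breaks
}

# Quantiles with ties, e.g. from data with many repeated values
raw_breaks <- c(0, 0, 0, 2, 5, 5, 9)
open_breaks <- finalize_breaks(raw_breaks, lower = -Inf, upper = Inf)
open_breaks
```

Setting lower = -Inf and upper = Inf in this way yields open-ended outer categories, which can be convenient when the break points are later applied to values outside the observed range.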
The (weighted) quantiles are computed with the function quantileWt.
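For readers unfamiliar with weighted quantiles, the idea can be sketched in base R as follows. This is one simple definition (the inverse of the weighted empirical distribution function) and is not claimed to be the algorithm used by quantileWt; weighted_quantile is a hypothetical helper.

```r
# Hypothetical helper (not part of simPop): inverse of the weighted
# empirical distribution function, without interpolation.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # cumulative share of the total weight up to each sorted observation
  cum_w <- cumsum(w) / sum(w)
  # smallest observation whose cumulative weight share reaches p
  vapply(probs, function(p) x[which(cum_w >= p)[1]], numeric(1))
}

x <- c(10, 20, 30, 40)
w <- c(1, 1, 1, 7)  # the largest observation carries 70% of the weight
wq <- weighted_quantile(x, w, probs = c(0.25, 0.5, 1))
wq
```

Because the last observation holds most of the weight, the weighted median is 40 here, whereas the unweighted median of the same values is 25.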
Value
A numeric vector of break points.
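A vector of break points like this is typically passed on to a discretization step. The sketch below uses base R's cut with made-up break points; it only illustrates how such a vector can be consumed and does not call getBreaks.

```r
# Made-up break points standing in for the return value of getBreaks()
breaks <- c(0, 5, 10, 50)
x <- c(0, 3, 8, 15, 42)

# include.lowest = TRUE keeps values equal to the lowest break point
x_cat <- cut(x, breaks = breaks, include.lowest = TRUE)
table(x_cat)
```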
Author(s)
Andreas Alfons and Bernhard Meindl
See Also
Examples
library(simPop)
data(eusilcS)

# semi-continuous variable, positive break points equidistant
getBreaks(eusilcS$netIncome, weights = eusilcS$rb050)

# semi-continuous variable, positive break points not equidistant
getBreaks(eusilcS$netIncome, weights = eusilcS$rb050,
          equidist = FALSE)