rakesvy {svyweight}    R Documentation
Flexibly Calculate Rake Weights
Description
Calculate rake weights on a data frame, or on a survey.design object from survey::svydesign(). Targets may be counts or percentages, in vector, matrix, data frame, or w8margin form. Before weighting, targets are converted to w8margins, checked for validity, and matched to variables in the observed data. rakesvy() returns a weighted survey.design object, while rakew8() returns a vector of weights.
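The conversion step can be pictured with a short base-R sketch. It assumes, without confirmation from this page, that a w8margin resembles the population-margin layout that survey::rake() accepts: a data frame with one column of levels and a Freq column of counts. The helper to_margin() is hypothetical and is not part of svyweight.

```r
# Hypothetical helper (not svyweight code): turn a named target vector of
# proportions or counts into a two-column margin data frame whose Freq
# column sums to `samplesize`.
to_margin <- function(target, varname, samplesize) {
  if (is.null(names(target))) stop("target must be a named vector")
  freq <- target / sum(target) * samplesize  # rescale to the sample size
  out <- data.frame(
    level = factor(names(target), levels = names(target)),
    Freq = unname(freq)
  )
  names(out)[1] <- varname  # first column is named after the variable
  out
}

# Proportions and counts both end up as counts summing to 1000
m_prop <- to_margin(c(Male = .495, Female = .505), "gender", samplesize = 1000)
m_count <- to_margin(c(Male = 99, Female = 101), "gender", samplesize = 1000)
m_prop
```

Rescaling both proportions and counts to a common total is what lets targets supplied in different forms be compared against the same observed sample.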
Usage
rakesvy(
design,
...,
samplesize = "from.data",
match.levels.by = "name",
na.targets = "fail",
rebase.tol = 0.01,
control = list(maxit = 10, epsilon = 1, verbose = FALSE)
)
rakew8(
design,
...,
samplesize = "from.data",
match.levels.by = "name",
na.targets = "fail",
rebase.tol = 0.01,
control = list(maxit = 10, epsilon = 1, verbose = FALSE)
)
Arguments
Argument | Description
design | A data frame, or a survey.design object from survey::svydesign(), containing the observed data to be weighted.
... | Formulas specifying weight targets, with an object that can be coerced to class w8margin (see …) on the right-hand side, and the matching observed variable, or a transformation of it, on the left-hand side.
samplesize | Either a number specifying the desired post-raking sample size, or a character string, "from.data" or "from.targets", specifying how to calculate the desired sample size (see details).
match.levels.by | A character string that specifies how to match levels in the target with the observed data, either "name" (the default) or "order" (see details).
na.targets | A character string that specifies how to handle NAs in targets, either "fail" (the default) or "observed" (see details).
rebase.tol | Numeric between 0 and 1. If targets are rebased, and the rebased sample size differs from the original sample size by more than this proportion (the default, 0.01, is 1%), a warning is generated.
control | Parameters passed to the control argument of survey::rake(): maxit (maximum number of iterations), epsilon (convergence tolerance), and verbose (whether to print progress).
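To make the formula interface concrete, the base-R sketch below splits one target formula into its two halves: the left-hand side is evaluated inside the observed data (so an expression such as a recode works as well as a bare variable name), and the right-hand side is evaluated as the target. split_target_formula() is a hypothetical helper for illustration; svyweight's own parsing may differ.

```r
# Hypothetical helper: evaluate the LHS of a target formula inside `data`
# and the RHS in the environment where the formula was created.
split_target_formula <- function(f, data) {
  stopifnot(inherits(f, "formula"), length(f) == 3)  # must be two-sided
  list(
    observed = eval(f[[2]], envir = data, enclos = environment(f)),
    target = eval(f[[3]], envir = environment(f))
  )
}

df <- data.frame(gender = factor(c("Male", "Female", "Female")))
parts <- split_target_formula(gender ~ c("Male" = .495, "Female" = .505), df)
str(parts)
```

Evaluating the left-hand side with the data as the environment is what allows a call such as dplyr::recode(agecat, ...) to refer to columns by bare name.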
Details
rakesvy() and rakew8() wrangle observed data and targets into compatible formats, then use survey::rake() to perform the underlying weighting calculations. They match weight targets to observed variables, clean both targets and observed variables, and then check the validity of weight targets (partly by calling w8margin_matched()) before raking. They also allow a weight target of zero, and assign an automatic weight of zero to cases at that target level.
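survey::rake() rakes by iterative proportional fitting: weights are repeatedly rescaled so that each variable's weighted distribution matches its target in turn, until the adjustments stop changing. The base-R sketch below shows that technique, including the zero-target behavior described above. It is a simplified illustration (no design weights, targets given as proportions, an arbitrary tolerance), not the code that survey::rake() or svyweight actually runs; ipf_rake() is a hypothetical name.

```r
# Minimal iterative proportional fitting (raking) sketch in base R.
# data:    data frame of factor variables
# targets: named list; each element is a named vector of target proportions
#          for the column of the same name (a zero is allowed)
ipf_rake <- function(data, targets, maxit = 1000, tol = 1e-8) {
  n <- nrow(data)
  w <- rep(1, n)
  for (iter in seq_len(maxit)) {
    max_change <- 0
    for (v in names(targets)) {
      x <- as.character(data[[v]])
      goal <- targets[[v]] / sum(targets[[v]]) * n     # target counts
      current <- vapply(split(w, x), sum, numeric(1))  # weighted counts
      stopifnot(all(names(current) %in% names(goal)))
      live <- current > 0
      adj <- setNames(numeric(length(current)), names(current))
      # A zero target gives adj = 0, so those cases drop to weight 0;
      # levels whose weighted count is already 0 are left at 0.
      adj[live] <- goal[names(current)[live]] / current[live]
      w <- w * unname(adj[x])
      max_change <- max(max_change, abs(adj[live] - 1))
    }
    if (max_change < tol) break  # every adjustment factor is ~1
  }
  w
}

df <- data.frame(
  gender = factor(c("Male", "Male", "Female", "Female", "Female", "Male")),
  region = factor(c("East", "West", "West", "East", "West", "Other"))
)
w <- ipf_rake(df, list(
  gender = c(Male = .5, Female = .5),
  region = c(East = .2, West = .8, Other = 0)
))
round(w, 3)
```

The sixth case sits in the "Other" region, whose target is zero, so its weight becomes zero on the first pass and stays there while the remaining cases are raked to the gender and region targets.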
Weight target levels can be matched with observed variable levels in two ways, specified via the match.levels.by parameter. "name" (the default) matches based on name, disregarding order (so a "male" level in the weight target will be matched with a "male" level in the observed data). "order" matches based on order, disregarding name (so the first level or row of the target will match with the first level of the observed factor variable).
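The two matching rules can be illustrated with plain R indexing. This is a hedged sketch of the idea, not svyweight's implementation; the variable names are made up for the example, and the factor's level order (Male before Female) deliberately differs from the order of the named target.

```r
# Illustration only: two ways of lining up target values with factor levels.
observed <- factor(c("Female", "Male", "Male"), levels = c("Male", "Female"))

# match.levels.by = "name": look each observed level up by name,
# so the order of the target vector does not matter
named_target <- c(Female = .505, Male = .495)
by_name <- named_target[levels(observed)]
stopifnot(!anyNA(by_name))  # every observed level found a named target

# match.levels.by = "order": ignore names and pair values with levels
# in order (first value with first level, and so on)
unnamed_target <- c(.495, .505)
stopifnot(length(unnamed_target) == nlevels(observed))
by_order <- setNames(unnamed_target, levels(observed))

by_name
by_order
```

Both approaches produce the same aligned targets here; matching by order only gives the intended result when the target is written in the factor's level order.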
By default (na.targets = "fail"), an NA in weight targets will cause an error. With na.targets = "observed", rakesvy() and rakew8() will instead compute a target that matches the observed data, so the category with an NA target will have a similar incidence rate in the pre-raking and post-raking data. This is accomplished by calling impute_w8margin() before raking; see the impute_w8margin() documentation for more details. Note that NAs in observed data (as opposed to targets) always cause failure and are not affected by this parameter.
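One plausible way to fill an NA target from the observed data is sketched below: the NA level receives its observed share of the sample, and the non-missing targets are rescaled to share what remains. This is an assumption about the general approach, not a description of impute_w8margin(), which may use design weights or a different rescaling rule; impute_na_target() is a hypothetical helper.

```r
# Hypothetical sketch, not impute_w8margin(): fill NA proportions in a named
# target vector with the level's unweighted share of the observed data.
impute_na_target <- function(target, observed) {
  tab <- table(observed)
  obs_prop <- setNames(as.numeric(tab) / sum(tab), names(tab))
  na_levels <- names(target)[is.na(target)]
  known <- !is.na(target)
  out <- target
  # NA levels keep the share they already have in the sample
  out[na_levels] <- obs_prop[na_levels]
  # Known targets are rescaled to fill the remaining share, so the total is 1
  out[known] <- target[known] / sum(target[known]) * (1 - sum(obs_prop[na_levels]))
  out
}

observed <- factor(c("East", "West", "West", "West", "Other"))
impute_na_target(c(East = .2, West = .8, Other = NA), observed)
```

"Other" makes up one case in five, so it receives a target of 0.2, and East and West are scaled from 0.2 and 0.8 down to 0.16 and 0.64.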
The desired sample size (in other words, the desired sum of weights after raking) is specified via the samplesize parameter. This can be a numeric value. Alternatively, "from.data" specifies that the observed sample size before weighting should be used (taken from sum(weights(design)) if applicable, or nrow() if not); "from.targets" specifies that the total sample size in the target objects should be used, and should only be chosen if all targets specify the same sample size.
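The sample-size options, together with the rebase.tol check listed under Arguments, can be sketched in base R as follows. Both helpers are hypothetical: the function names, the error messages, and the choice to measure the rebasing gap relative to the target's own total are assumptions made for illustration, not details confirmed by this page.

```r
# Hypothetical helper: resolve the desired post-raking sum of weights.
desired_total <- function(samplesize, data, weights = NULL, target_totals = NULL) {
  if (is.numeric(samplesize)) return(samplesize)
  switch(samplesize,
    from.data = if (is.null(weights)) nrow(data) else sum(weights),
    from.targets = {
      if (length(unique(target_totals)) != 1) {
        stop("\"from.targets\" requires all targets to have the same total")
      }
      target_totals[[1]]
    },
    stop("samplesize must be a number, \"from.data\", or \"from.targets\"")
  )
}

# Hypothetical helper: rescale count targets to `total`, warning when the
# relative change exceeds rebase.tol (measured against the target's own sum).
rebase_target <- function(counts, total, rebase.tol = 0.01) {
  original <- sum(counts)
  if (abs(total - original) / original > rebase.tol) {
    warning(sprintf("rebasing changes the target total from %g to %g (more than %g%%)",
                    original, total, 100 * rebase.tol))
  }
  counts / original * total
}

n <- desired_total("from.data", data.frame(id = 1:1005))  # unweighted: nrow()
counts <- c(Male = 495, Female = 505)                      # targets sum to 1000
rebased <- rebase_target(counts, n)                        # 0.5% change: no warning
```

Rebasing the same counts to 1100 would be a 10% change, which exceeds the default tolerance of 0.01 and triggers the warning.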
Value
rakesvy() returns a survey.design object with rake weights applied. Any changes made to the variables in design in order to call rake(), such as dropping empty factor levels, are temporary and are not returned in the output object.
rakew8() returns a vector of weights. This avoids creating duplicated survey.design objects, which can be useful when calculating multiple sets of weights for the same data.
Examples
library(svyweight)  # provides rakew8(), rakesvy(), and the gles17 example data
# Computing only rake weights
# EG, for a survey conducted with simple random sampling
gles17$simple_weight <- rakew8(design = gles17,
gender ~ c("Male" = .495, "Female" = .505),
eastwest ~ c("East Germany" = .195, "West Germany" = .805)
)
# Specifying a recode of a variable in the observed dataset
library(dplyr)
gles17_raked <- rakesvy(design = gles17,
gender ~ c("Male" = .495, "Female" = .505),
dplyr::recode(agecat, `<=29` = "<=39", `30-39` = "<=39") ~
c("<=39" = .31, "40-49" = .15, "50-59" = .19, "60-69" = .15, ">=70" = .21)
)
# Computing rake weights after design weights
# EG, for a survey with complex sampling design
library(survey)
gles17_dweighted <- svydesign(ids = ~vpoint, weights = ~dweight,
strata = ~eastwest, data = gles17, nest = TRUE)
gles17_raked <- rakesvy(design = gles17_dweighted,
gender ~ c("Male" = .495, "Female" = .505),
agecat ~ c("<=29" = .16, "30-39" = .15, "40-49" = .15,
"50-59" = .19, "60-69" = .15, ">=70" = .21)
)
# With unnamed target levels, using match.levels.by = "order"
rakew8(design = gles17,
gender ~ c(.495, .505),
eastwest ~ c(.195, .805),
match.levels.by = "order"
)