R: Flexibly Calculate Rake Weights

rakesvy {svyweight}

R Documentation

Flexibly Calculate Rake Weights

Description

Calculate rake weights on a data frame, or on a survey.design object from survey::svydesign(). Targets may be counts or percentages, in vector, matrix, data frame, or w8margin form. Before weighting, targets are converted to w8margins, checked for validity, and matched to variables in observed data. rakesvy returns a weighted svydesign object, while rakew8 returns a vector of weights.

Usage

rakesvy(
  design,
  ...,
  samplesize = "from.data",
  match.levels.by = "name",
  na.targets = "fail",
  rebase.tol = 0.01,
  control = list(maxit = 10, epsilon = 1, verbose = FALSE)
)

rakew8(
  design,
  ...,
  samplesize = "from.data",
  match.levels.by = "name",
  na.targets = "fail",
  rebase.tol = 0.01,
  control = list(maxit = 10, epsilon = 1, verbose = FALSE)
)

Arguments

`design`	A `survey.design` object from `survey::svydesign()`, or a data frame that can be coerced to one. When a data frame is coerced, the coercion assumes no clustering or design weighting.
`...`	Formulas specifying weight targets, with an object that can be coerced to class w8margin (see `as.w8margin()`) on the right-hand side, and (optionally) a matching variable or transformation of it on the left-hand side. Objects that can be coerced to w8margin include named numeric vectors and matrices, and data frames in the format accepted by `rake`.
`samplesize`	Either a number specifying the desired post-raking sample size, or a character string "from.data" or "from.targets" specifying how to calculate the desired sample size (see details).
`match.levels.by`	A character string that specifies how to match levels in the target with the observed data, either "name" (the default) or "order" (see details).
`na.targets`	A character string that specifies how to handle NAs in targets, either "fail" (the default) or "observed" (see details).
`rebase.tol`	Numeric between 0 and 1. If targets are rebased, and the rebased sample size differs from the original sample size by more than this percentage, a warning is generated.
`control`	Parameters passed to the `control` argument of `survey::rake()`, to control the details of convergence in weighting.

Details

rakesvy() and rakew8() wrangle observed data and targets into compatible formats before using survey::rake() to make the underlying weighting calculations. The functions match weight targets to observed variables, clean both targets and observed variables, and then check the validity of weight targets (partially by calling w8margin_matched()) before raking. They also allow a weight target of zero, and automatically assign a weight of zero to cases on that target level.
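For readers unfamiliar with raking, the underlying idea (iterative proportional fitting) can be sketched in base R. This is only an illustration, not the code used by svyweight or survey::rake(): the data frame `toy`, the function `rake_sketch`, and its `maxit` and `tol` arguments are hypothetical names, the convergence rule differs from the `control` settings above, and the sketch does not handle zero targets or empty factor levels.

```r
# Hypothetical toy data; every gender x region cell is non-empty
toy <- data.frame(
  gender   = factor(c("Male", "Male", "Male", "Female", "Female", "Male", "Female", "Male")),
  eastwest = factor(c("East", "West", "West", "West", "East", "West", "West", "East"))
)

# Targets as proportions, named by factor level
targets <- list(
  gender   = c(Male = 0.495, Female = 0.505),
  eastwest = c(East = 0.195, West = 0.805)
)

# Illustrative helper (not part of any package)
rake_sketch <- function(data, targets, maxit = 50, tol = 1e-8) {
  w <- rep(1, nrow(data))   # start from equal weights
  n <- sum(w)               # the sum of weights is held at the sample size
  for (iter in seq_len(maxit)) {
    w_old <- w
    for (v in names(targets)) {
      lev      <- as.character(data[[v]])
      observed <- tapply(w, lev, sum)   # current weighted total per level
      # Ratio of desired to current total, with target levels matched by name
      adjust   <- targets[[v]][names(observed)] * n / as.vector(observed)
      w <- w * unname(adjust[lev])
    }
    # Stop once a full pass over all margins no longer changes the weights
    if (max(abs(w - w_old)) < tol) break
  }
  w
}

w <- rake_sketch(toy, targets)
gender_share <- tapply(w, toy$gender, sum) / sum(w)
region_share <- tapply(w, toy$eastwest, sum) / sum(w)
```

After the loop, `gender_share` and `region_share` should agree with the targets to within the tolerance, while `sum(w)` stays at the original number of rows.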

Weight target levels can be matched with observed variable levels in two ways, specified via the match.levels.by parameter. "name" (the default) matches based on name, disregarding order (so a "male" level in the weight target will be matched with a "male" level in the observed data). "order" matches based on order, disregarding name (so the first level or row of the target will match with the first level of the observed factor variable).
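The difference between the two matching rules can be illustrated with a small base-R sketch. It only mimics the idea described above and is not the package's matching code; the objects `observed`, `target`, `by_name`, and `by_order` are hypothetical.

```r
# Observed factor; levels sort alphabetically as "Female", "Male"
observed <- factor(c("Male", "Female", "Female", "Male", "Male"))

# Target written with "Male" first
target <- c(Male = 0.495, Female = 0.505)

# "name": look up each observed level by its label, whatever the target order
by_name <- target[levels(observed)]

# "order": ignore target names and pair positions with the observed levels
by_order <- setNames(unname(target), levels(observed))

by_name
by_order
```

Here `by_name` assigns 0.495 to "Male", while `by_order` assigns 0.495 to "Female" because "Female" is the first observed level. Matching by order is therefore only safe when the target is written in the same order as the factor levels.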

By default (with parameter na.targets = "fail"), an NA in weight targets will cause an error. With na.targets = "observed", rakesvy() and rakew8() will instead compute a target that matches the observed data. The category with an NA target will therefore have a similar incidence rate in the pre-raking and post-raking dataset. This is accomplished by calling impute_w8margin() before raking; see the impute_w8margin documentation for more details. Note that NAs in observed data (as opposed to targets) will always cause failure, and are not affected by this parameter.
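The effect of na.targets = "observed" can be pictured with the base-R sketch below. It does not call impute_w8margin() and may not match its exact arithmetic: the objects are hypothetical, and the rescaling of the known targets in the last step is an assumption made for this illustration.

```r
# Hypothetical observed data: half of the cases are in the "60+" category
observed <- factor(c("18-39", "18-39", "40-59", "40-59", "40-59",
                     "60+", "60+", "60+", "60+", "60+"))

# Target proportions, with the "60+" target unknown
target <- c("18-39" = 0.2, "40-59" = 0.3, "60+" = NA)

obs_share <- prop.table(table(observed))   # observed incidence per level
missing   <- is.na(target)

imputed <- target
# Fill the NA target with the observed share of that category
imputed[missing] <- as.vector(obs_share[names(target)[missing]])
# Assumption: known targets keep their relative sizes and fill the remainder
imputed[!missing] <- target[!missing] / sum(target[!missing]) *
  (1 - sum(imputed[missing]))

imputed
```

In this example the known targets already sum to the share left over by the imputed category, so they are unchanged and "60+" keeps its observed incidence of 0.5.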

The desired sample size (in other words, the desired sum of weights after raking) is specified via the samplesize parameter. This can be a numeric value. Alternatively, "from.data" specifies that the observed sample size before weighting should be used (taken from sum(weights(design)) if applicable, or nrow() if not); "from.targets" specifies that the total sample sizes in the target objects should be followed, and should only be used if all targets specify the same sample size.
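The three samplesize options can be sketched in base R as follows. The helper `resolve_samplesize` is hypothetical and is not a function in svyweight; it only restates the rules above, then rebases count targets to the chosen total.

```r
# Hypothetical helper: pick the desired sum of weights
resolve_samplesize <- function(samplesize, weights, targets) {
  if (is.numeric(samplesize)) return(samplesize)            # explicit number
  if (samplesize == "from.data") return(sum(weights))       # observed size
  if (samplesize == "from.targets") {
    totals <- vapply(targets, sum, numeric(1))
    # Only meaningful when every target implies the same total
    stopifnot(length(unique(round(totals, 8))) == 1)
    return(unname(totals[1]))
  }
  stop("unknown samplesize option")
}

# Targets given as counts that each sum to 1000
targets <- list(
  gender   = c(Male = 495, Female = 505),
  eastwest = c(East = 195, West = 805)
)
weights <- rep(1, 200)   # 200 unweighted cases

n_data    <- resolve_samplesize("from.data", weights, targets)
n_targets <- resolve_samplesize("from.targets", weights, targets)

# Rebase each target so that it sums to the observed sample size
rebased <- lapply(targets, function(tg) tg / sum(tg) * n_data)
```

With "from.data" the gender target is rebased from 495 and 505 to 99 and 101, so the raked weights sum to the 200 observed cases instead of 1000.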

Value

rakesvy() returns a survey.design object with rake weights applied. Any changes made to the variables in design in order to call rake, such as dropping empty factor levels, are temporary and not returned in the output object.

rakew8() returns a vector of weights. This avoids creating duplicated survey.design objects, which can be useful when calculating multiple sets of weights for the same data.

Examples

# Computing only rake weights
# e.g., for a survey conducted with simple random sampling
gles17$simple_weight <- rakew8(design = gles17, 
    gender ~ c("Male" = .495, "Female" = .505),
    eastwest ~ c("East Germany" = .195, "West Germany" = .805)
)

# Specifying a recode of variable in observed dataset
require(dplyr)
gles17_raked <- rakesvy(design = gles17, 
    gender ~ c("Male" = .495, "Female" = .505),
    dplyr::recode(agecat, `<=29` = "<=39", `30-39` = "<=39") ~ 
        c("<=39" = .31, "40-49" = .15, "50-59" = .19, "60-69" = .15, ">=70" = .21)
)

# Computing rake weights after design weights
# e.g., for a survey with a complex sampling design
require(survey)
gles17_dweighted <- svydesign(ids = gles17$vpoint, weights = gles17$dweight, 
    strata = gles17$eastwest, data = gles17, nest = TRUE)
gles17_raked <- rakesvy(design = gles17_dweighted, 
    gender ~ c("Male" = .495, "Female" = .505),
    agecat ~ c("<=29" = .16, "30-39" = .15, "40-49" = .15, 
        "50-59" = .19, "60-69" = .15, ">=70" = .21)
)

# With unnamed target levels, using match.levels.by = "order"
rakew8(design = gles17, 
    gender ~ c(.495, .505),
    eastwest ~ c(.195, .805),
    match.levels.by = "order"
)

[Package svyweight version 0.1.0 Index]