R: svyweight: Quick and Flexible Rake Weighting

svyweight {svyweight}

R Documentation

svyweight: Quick and Flexible Rake Weighting

Description

svyweight is a package for quickly and flexibly calculating rake weights (also known as rim weights). It is designed to interact with survey.design objects generated via survey::svydesign(), and otherwise to build on functionality from Thomas Lumley's 'survey' package.

Rake weighting concepts

Post-stratification weights are commonly used in survey research to ensure that a sample is representative of the population it is drawn from, in cases where some people selected for inclusion in the sample might decline to participate. To calculate post-stratification weights, observed categorical variables in a survey dataset (usually demographic variables) must be matched with "targets" (typically known population demographics from census data). Survey respondents from underrepresented categories are upweighted, while respondents from overrepresented categories are downweighted.
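The upweighting and downweighting described above can be sketched for a single variable in base R. This is a minimal illustration and does not use svyweight itself; the sample data and the target proportions are hypothetical.

```r
# Hypothetical sample of 10 respondents in which women are overrepresented
sample_sex <- factor(c("F", "F", "F", "F", "F", "F", "F", "M", "M", "M"))

# Hypothetical population targets (for example, from census data)
target_prop <- c(F = 0.5, M = 0.5)

# Observed share of each category in the sample
sample_prop <- prop.table(table(sample_sex))

# Category weight = target share / observed share
cat_weight <- target_prop[names(sample_prop)] / as.numeric(sample_prop)

# Give each respondent the weight of their category
w <- unname(cat_weight[as.character(sample_sex)])

# Weighted shares now match the targets
weighted_prop <- tapply(w, sample_sex, sum) / sum(w)
print(weighted_prop)
```

Respondents in the overrepresented category receive a weight below 1 and those in the underrepresented category a weight above 1, while the weights still sum to the sample size.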

svyweight implements "rake" or "rim" weighting (sometimes known as iterative proportional fitting). This is a widely used method for simultaneously calculating weights on multiple variables when no joint distribution for these variables is known. For example, population data on past vote (from election results) and age (from the census) are generally known. However, as the joint distribution of past vote and age is not generally known, a technique such as rake weighting must be used to apply weights on both variables simultaneously.
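The idea of iterative proportional fitting can be sketched in a few lines of base R. This is an illustrative sketch under assumed data, not the code svyweight runs; the variable names, the targets, the iteration cap, and the convergence tolerance are all hypothetical choices.

```r
# Hypothetical sample with two categorical variables
dat <- data.frame(
  vote = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  age  = c("young", "young", "old", "old", "young",
           "old", "old", "old", "old", "old"),
  stringsAsFactors = FALSE
)

# Hypothetical marginal targets, as counts that sum to the sample size.
# Only the margins are known, not the joint distribution of vote and age.
targets <- list(
  vote = c(A = 5, B = 5),
  age  = c(young = 4, old = 6)
)

w <- rep(1, nrow(dat))  # start from equal weights

for (iter in 1:100) {
  w_prev <- w
  for (v in names(targets)) {
    # Current weighted total in each category of variable v
    current <- tapply(w, dat[[v]], sum)
    # Adjustment factor per category: target total / current total
    ratio <- targets[[v]] / as.numeric(current[names(targets[[v]])])
    # Rescale each respondent's weight by the factor for their category
    w <- w * unname(ratio[as.character(dat[[v]])])
  }
  # Stop once a full pass over all variables no longer changes the weights
  if (max(abs(w - w_prev)) < 1e-8) break
}

print(tapply(w, dat$vote, sum))
print(tapply(w, dat$age, sum))
```

Each pass matches one margin at a time, which slightly disturbs the margins already adjusted; repeating the passes brings all margins close to their targets at once.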

Package features

The core function in svyweight is rakesvy() (and the related rakew8()). It calculates post-stratification weights given A) a data frame or a survey.design object generated by svydesign(), and B) a set of weighting targets. The function is designed to make weighting as simple as possible, with the following features:

Weighting to either counts or percentage targets
Allowing specification of targets as vectors, matrices, or data frames
Accepting targets of 0 (equivalent to dropping cases from analysis)
Allowing targets to be quickly rebased to a specified sample size
Flexibly matching targets to the correct variables in a dataset
Dynamically specifying weight targets based on recodes of variables in observed data

The package does this in part by introducing the w8margin object class. A w8margin holds the desired raw count for each category of a variable, in the format required for actually computing weights. However, this format is somewhat cumbersome to specify manually. The package includes methods for converting named vectors, matrices, and data frames to w8margin objects; rakesvy() and rakew8() call these methods automatically.
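To give a sense of the kind of conversion involved, the sketch below turns a named vector of percentage targets into a data frame of raw counts, rebased to a chosen sample size. The helper targets_to_margin() is hypothetical and is not part of svyweight; the layout it produces (one column of category labels plus a Freq column) is the one survey::rake() accepts for population margins, and is only assumed here to resemble what a w8margin contains.

```r
# Hypothetical helper (not a svyweight function): convert a named vector of
# target shares into a data frame of counts with a Freq column
targets_to_margin <- function(target, varname, samplesize) {
  # Rebase the targets so that they sum to the requested sample size
  counts <- target / sum(target) * samplesize
  out <- data.frame(names(counts), unname(counts), stringsAsFactors = FALSE)
  names(out) <- c(varname, "Freq")
  out
}

# Percentage targets for a hypothetical "age" variable, rebased to n = 500
age_margin <- targets_to_margin(c(young = 40, old = 60),
                                varname = "age", samplesize = 500)
print(age_margin)
```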

At present, the core weighting calculations are actually performed via the 'survey' package's survey::rake() function. This might change with future releases, although the basic approach to iterative weighting is not expected to change.
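For reference, a direct call to survey::rake() looks roughly like the sketch below. It uses only functions from the 'survey' package, not svyweight; the data, the targets, and the control settings are hypothetical. The example is wrapped in a check so that it only runs when 'survey' is installed.

```r
if (requireNamespace("survey", quietly = TRUE)) {
  # Hypothetical respondent-level data with equal starting weights
  dat <- data.frame(
    vote = factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B")),
    age  = factor(c("young", "young", "old", "old", "young",
                    "old", "old", "old", "old", "old")),
    start_wt = 1
  )
  des <- survey::svydesign(ids = ~1, weights = ~start_wt, data = dat)

  # Population margins: one data frame per variable, with a Freq column
  vote_margin <- data.frame(vote = c("A", "B"), Freq = c(5, 5))
  age_margin  <- data.frame(age = c("young", "old"), Freq = c(4, 6))

  # Rake to both margins; a tighter tolerance than the default is assumed here
  raked <- survey::rake(
    des,
    sample.margins = list(~vote, ~age),
    population.margins = list(vote_margin, age_margin),
    control = list(maxit = 50, epsilon = 1e-6, verbose = FALSE)
  )

  # Weighted margins after raking
  print(survey::svytable(~vote, raked))
  print(survey::svytable(~age, raked))
}
```

Each margin has to be supplied as a separate data frame of counts, which is the manual specification step that the conversion methods described above are meant to spare the user.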

The package is under development. Contributions to the package, or suggestions for additional features, are gratefully accepted via email or GitHub.

Author(s)

Ben Mainwaring (mainwaringb@gmail.com, https://www.linkedin.com/in/mainwaringb)

References