rake_to_benchmarks {nrba} | R Documentation
Re-weight data to match population benchmarks, using raking or post-stratification
Description
Adjusts weights in the data to ensure that estimated population totals for grouping variables match known population benchmarks. If there is only one grouping variable, simple post-stratification is used. If there are multiple grouping variables, raking (also known as iterative post-stratification) is used.
Usage
rake_to_benchmarks(
survey_design,
group_vars,
group_benchmark_vars,
max_iterations = 100,
epsilon = 5e-06
)
Arguments
survey_design |
A survey design object created with the |
group_vars |
Names of grouping variables in the data dividing the sample into groups for which benchmark data are available. These variables cannot have any missing values |
group_benchmark_vars |
Names of group benchmark variables in the data corresponding to |
max_iterations |
If there are multiple grouping variables,
then raking is used rather than post-stratification.
The parameter |
epsilon |
If raking is used, convergence for a given margin is declared
if the maximum change in a re-weighted total is less than |
Details
Raking adjusts the weight assigned to each sample member
so that, after reweighting, the weighted sample percentages for population subgroups
match their known population percentages. In a sense, raking causes
the sample to more closely resemble the population in terms of variables
for which population sizes are known.
Raking can be useful to reduce nonresponse bias caused by
having groups which are overrepresented in the responding sample
relative to their population size.
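As a rough illustration of the adjustment for a single grouping variable, the sketch below computes one factor per group (benchmark divided by the current weighted total) and applies it to the base weights. The data, the column names `group` and `base_weight`, and the benchmark values are all made up for illustration; this is not the package's internal code.

```r
# Hypothetical respondent data: group "A" is overrepresented among respondents
respondents <- data.frame(
  group = c("A", "A", "A", "B"),
  base_weight = c(10, 10, 10, 10)
)

# Assumed known population sizes for each group
benchmarks <- c(A = 20, B = 20)

# Weighted total currently estimated for each group (plain named vector)
estimated_totals <- vapply(
  split(respondents$base_weight, as.character(respondents$group)),
  sum, numeric(1)
)

# Adjustment factor: benchmark divided by the estimated total
adjustment <- benchmarks[names(estimated_totals)] / estimated_totals

# Apply each respondent's group factor to its base weight
respondents$adjusted_weight <- respondents$base_weight *
  as.numeric(adjustment[as.character(respondents$group)])

# Weighted totals by group after the adjustment
adjusted_totals <- vapply(
  split(respondents$adjusted_weight, as.character(respondents$group)),
  sum, numeric(1)
)
```

In this toy case the overrepresented group "A" receives a factor below 1 and the underrepresented group "B" a factor above 1, so the adjusted totals line up with the assumed benchmarks.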
If the population subgroups systematically differ in terms of outcome variables of interest,
then raking can also help reduce sampling variances. However,
when population subgroups do not differ in terms of outcome variables of interest,
raking may increase sampling variances.
There are two basic requirements for raking.
Basic Requirement 1 - Values of the grouping variable(s) must be known for all respondents.
Basic Requirement 2 - The population size of each group must be known (or precisely estimated).
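Both requirements can be checked before reweighting. The following is a minimal sketch in base R, assuming a hypothetical data frame `respondents` with grouping variables `age_group` and `sex` and a made-up set of age benchmarks; none of these names come from the package.

```r
# Hypothetical respondent data with one missing grouping value
respondents <- data.frame(
  age_group = c("18-34", "35-64", NA, "65+"),
  sex = c("F", "M", "F", "M")
)
group_vars <- c("age_group", "sex")

# Requirement 1: count missing values in each grouping variable
missing_counts <- colSums(is.na(respondents[, group_vars, drop = FALSE]))
vars_with_missing <- names(missing_counts)[missing_counts > 0]

# Requirement 2: every observed category needs a known, positive benchmark
# (benchmark values here are assumed for illustration)
age_benchmarks <- c("18-34" = 300, "35-64" = 500, "65+" = 200)
observed <- unique(na.omit(respondents$age_group))
all_covered <- all(observed %in% names(age_benchmarks)) &&
  all(age_benchmarks > 0)
```

Here `vars_with_missing` flags `age_group`, so the affected record would need to be resolved (for example, by imputation or exclusion) before raking.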
When there is effectively only one grouping variable (which may itself be defined as a combination of other variables), raking amounts to simple post-stratification. For example, simple post-stratification would be used if the grouping variable is "Age x Sex x Race" and the population size of each combination of age, sex, and race is known. The method of "iterative post-stratification" (also known as "iterative proportional fitting") is used when there are multiple grouping variables and population sizes are known for each grouping variable but not for combinations of grouping variables. For example, iterative proportional fitting would be necessary if population sizes are known for age groups and for gender categories but not for combinations of age groups and gender categories.
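The iterative procedure can be sketched in a few lines of base R. The helper `rake_weights()` below is hypothetical and is not the implementation used by this package; its stopping rule (largest gap between a current weighted total and its benchmark, compared with `epsilon`) is a simplified assumption, and the data and margins are made up.

```r
# Minimal iterative proportional fitting sketch (hypothetical helper).
# `margins` is a named list: one named vector of population sizes per variable.
rake_weights <- function(data, weights, margins,
                         max_iterations = 100, epsilon = 5e-06) {
  w <- weights
  for (iteration in seq_len(max_iterations)) {
    max_gap <- 0
    for (var in names(margins)) {
      groups <- as.character(data[[var]])
      # Current weighted total for each category of this variable
      totals <- vapply(split(w, groups), sum, numeric(1))
      target <- margins[[var]][names(totals)]
      # Record how far this margin is from its benchmarks before adjusting
      max_gap <- max(max_gap, abs(target - totals))
      # Post-stratify on this variable alone
      w <- w * as.numeric((target / totals)[groups])
    }
    # Assumed stopping rule: every margin was already within epsilon
    if (max_gap < epsilon) break
  }
  w
}

# Made-up sample and margins (both margins sum to the same population size)
sample_data <- data.frame(
  age = c("young", "young", "old", "old"),
  sex = c("F", "M", "F", "M")
)
margins <- list(
  age = c(young = 60, old = 40),
  sex = c(F = 55, M = 45)
)
raked_weights <- rake_weights(sample_data, c(10, 40, 30, 20), margins)

# With a single margin, the same loop reduces to simple post-stratification
poststratified_weights <- rake_weights(
  sample_data, c(10, 40, 30, 20), margins["age"]
)
```

Each pass post-stratifies on one variable at a time, which generally disturbs the previously fitted margins slightly; repeating the passes shrinks those disturbances until all margins agree with their benchmarks to within the tolerance.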
Value
A survey design object with raked or post-stratified weights
Examples
# Load the survey data
data(involvement_survey_srs, package = "nrba")
# Calculate population benchmarks
population_benchmarks <- list(
  "PARENT_HAS_EMAIL" = data.frame(
    PARENT_HAS_EMAIL = c("Has Email", "No Email"),
    PARENT_HAS_EMAIL_POP_BENCHMARK = c(17036, 2964)
  ),
  "STUDENT_RACE" = data.frame(
    STUDENT_RACE = c(
      "AM7 (American Indian or Alaska Native)", "AS7 (Asian)",
      "BL7 (Black or African American)",
      "HI7 (Hispanic or Latino Ethnicity)", "MU7 (Two or More Races)",
      "PI7 (Native Hawaiian or Other Pacific Islander)",
      "WH7 (White)"
    ),
    STUDENT_RACE_POP_BENCHMARK = c(206, 258, 3227, 1097, 595, 153, 14464)
  )
)
# Add the population benchmarks as variables in the data
involvement_survey_srs <- merge(
  x = involvement_survey_srs,
  y = population_benchmarks$PARENT_HAS_EMAIL,
  by = "PARENT_HAS_EMAIL"
)
involvement_survey_srs <- merge(
  x = involvement_survey_srs,
  y = population_benchmarks$STUDENT_RACE,
  by = "STUDENT_RACE"
)
# Create a survey design object
library(survey)
survey_design <- svydesign(
  weights = ~BASE_WEIGHT,
  id = ~UNIQUE_ID,
  fpc = ~N_STUDENTS,
  data = involvement_survey_srs
)
# Subset data to only include respondents
survey_respondents <- subset(
  survey_design,
  RESPONSE_STATUS == "Respondent"
)
# Rake to the benchmarks
raked_survey_design <- rake_to_benchmarks(
  survey_design = survey_respondents,
  group_vars = c("PARENT_HAS_EMAIL", "STUDENT_RACE"),
  group_benchmark_vars = c(
    "PARENT_HAS_EMAIL_POP_BENCHMARK",
    "STUDENT_RACE_POP_BENCHMARK"
  )
)
# Inspect estimates from respondents, before and after raking
svymean(
  x = ~PARENT_HAS_EMAIL,
  design = survey_respondents
)
svymean(
  x = ~PARENT_HAS_EMAIL,
  design = raked_survey_design
)
svymean(
  x = ~WHETHER_PARENT_AGREES,
  design = survey_respondents
)
svymean(
  x = ~WHETHER_PARENT_AGREES,
  design = raked_survey_design
)