t_test_of_weight_adjustment {nrba}        R Documentation

t-test of differences in estimated means/percentages from two different sets of replicate weights.

Description

Tests whether estimates of means/percentages differ systematically between two sets of replicate weights: an original set of weights, and the same weights after adjustment (e.g. post-stratification or nonresponse adjustments) and possibly subsetting (e.g. to include only respondents).

Usage

t_test_of_weight_adjustment(
  orig_design,
  updated_design,
  y_vars,
  na.rm = TRUE,
  null_difference = 0,
  alternative = "unequal",
  degrees_of_freedom = NULL
)

Arguments

orig_design

A replicate design object created with the survey package.

updated_design

A potentially updated version of orig_design, for example where weights have been adjusted for nonresponse or updated using post-stratification. The type and number of sets of replicate weights must match that of orig_design. The number of rows may differ (e.g. if orig_design includes the full sample but updated_design only includes respondents).

y_vars

Names of dependent variables for tests. For categorical variables, percentages of each category are tested.

na.rm

Whether to drop cases with missing values for a given dependent variable.

null_difference

The difference between the two means/percentages under the null hypothesis. Default is 0.

alternative

Can be one of the following:

  • 'unequal' (the default): two-sided test of whether the difference in means/percentages differs from null_difference

  • 'less': one-sided test of whether the difference is less than null_difference

  • 'greater': one-sided test of whether the difference is greater than null_difference
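The mapping from alternative to a p-value follows the usual t-test conventions. A minimal base-R sketch (the function name p_value and its signature are illustrative, not part of the package):

```r
# Sketch: how 'alternative' typically maps to a p-value, given a
# t-statistic and degrees of freedom.
p_value <- function(t_stat, df, alternative = "unequal") {
  switch(alternative,
    unequal = 2 * pt(-abs(t_stat), df),           # two-sided
    less    = pt(t_stat, df),                     # H1: difference < null
    greater = pt(t_stat, df, lower.tail = FALSE)  # H1: difference > null
  )
}

p_value(2.5, df = 30)                          # two-sided p-value
p_value(2.5, df = 30, alternative = "greater") # one-sided p-value
```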

degrees_of_freedom

The degrees of freedom to use for the test's reference distribution. If not specified, this defaults to the design degrees of freedom minus one, where the design degrees of freedom are estimated by applying the survey package's degf method to the 'stacked' design formed by combining orig_design and updated_design.

Value

A data frame describing the results of the t-tests, one row per dependent variable.

Statistical Details

The t-statistic used for the test has as its numerator the difference in means/percentages between the two samples, minus the null_difference. The denominator for the t-statistic is the estimated standard error of the difference in means. Because the two means are based on overlapping groups and thus have correlated sampling errors, special care is taken to estimate the covariance of the two estimates.

For designs which use sets of replicate weights for variance estimation, the two means and their difference are estimated using each set of replicate weights; the estimated differences from the sets of replicate weights are then used to estimate sampling error with a formula appropriate to the replication method (JKn, BRR, etc.).
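The replication idea can be sketched in base R with a toy delete-one jackknife (JK1) example. Because each replicate recomputes both means on the same units, the correlation between the two estimates is reflected automatically. All data, weights, and names here are made up for illustration and are not the package's internals:

```r
# Toy data: one outcome, an original weight, and an "adjusted" weight
set.seed(1)
n      <- 8
y      <- rnorm(n, mean = 10)
w_orig <- rep(1, n)
w_adj  <- w_orig * rep(c(1.5, 0.5), each = n / 2)  # pretend adjustment

# Difference in weighted means, computed on a given subset of units
diff_in_means <- function(keep) {
  weighted.mean(y[keep], w_adj[keep]) - weighted.mean(y[keep], w_orig[keep])
}

theta_full <- diff_in_means(seq_len(n))

# JK1 replicates: drop one unit at a time and recompute the *difference*,
# so both means share each replicate's sampling perturbation
theta_reps <- vapply(seq_len(n), function(i) diff_in_means(-i), numeric(1))

# JK1 variance formula applied to the replicate differences
v_jk   <- ((n - 1) / n) * sum((theta_reps - theta_full)^2)
t_stat <- (theta_full - 0) / sqrt(v_jk)  # null_difference = 0
```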

This analysis is not implemented for designs which use linearization methods for variance estimation.
Unless specified otherwise using the degrees_of_freedom parameter, the degrees of freedom for the test are set to the design degrees of freedom minus one. Design degrees of freedom are estimated using the survey package's degf method.
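A minimal sketch of where the default degrees of freedom come from, using the survey package's degf method on a single replicate design (nrba applies degf to the stacked design internally; this example only illustrates the degf call, on the survey package's built-in 'api' data):

```r
library(survey)

# Illustrative replicate design from a simple random sample
data(api)
srs_design <- svydesign(id = ~1, fpc = ~fpc, data = apisrs)
rep_design <- as.svrepdesign(srs_design, type = "JK1")

degf(rep_design)      # design degrees of freedom for the replicate design
degf(rep_design) - 1  # default df used by the t-test, per this documentation
```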

See Van de Kerckhove et al. (2009) for an example of this type of nonresponse bias analysis (among others). See Lohr and Riddles (2016) for the statistical details of this test.

Examples


library(survey)

# Create a survey design ----

data(involvement_survey_srs, package = 'nrba')

survey_design <- svydesign(weights = ~ BASE_WEIGHT,
                           id = ~ UNIQUE_ID,
                           fpc = ~ N_STUDENTS,
                           data = involvement_survey_srs)

# Create replicate weights for the design ----
rep_svy_design <- as.svrepdesign(survey_design, type = "subbootstrap",
                                 replicates = 500)

# Subset to only respondents (always subset *after* creating replicate weights)

rep_svy_respondents <- subset(rep_svy_design,
                              RESPONSE_STATUS == "Respondent")

# Apply raking adjustment ----

raked_rep_svy_respondents <- rake_to_benchmarks(
  survey_design = rep_svy_respondents,
  group_vars = c("PARENT_HAS_EMAIL", "STUDENT_RACE"),
  group_benchmark_vars = c("PARENT_HAS_EMAIL_BENCHMARK",
                           "STUDENT_RACE_BENCHMARK")
)

# Compare estimates from respondents in original vs. adjusted design ----

t_test_of_weight_adjustment(orig_design = rep_svy_respondents,
                            updated_design = raked_rep_svy_respondents,
                            y_vars = c('STUDENT_AGE', 'STUDENT_SEX'))

t_test_of_weight_adjustment(orig_design = rep_svy_respondents,
                            updated_design = raked_rep_svy_respondents,
                            y_vars = c('WHETHER_PARENT_AGREES'))

# Compare estimates to true population values ----

data('involvement_survey_pop', package = 'nrba')

mean(involvement_survey_pop$STUDENT_AGE)

prop.table(table(involvement_survey_pop$STUDENT_SEX))


[Package nrba version 0.3.1 Index]