t_test_by_response_status {nrba}                R Documentation

t-test of differences in means/percentages between responding sample and full sample, or between responding sample and eligible sample

Description

The function t_test_resp_vs_full tests whether means of auxiliary variables differ between respondents and the full selected sample, where the full sample consists of all cases regardless of response status or eligibility status.
The function t_test_resp_vs_elig tests whether means differ between the responding sample and the eligible sample, where the eligible sample consists of all cases known to be eligible, regardless of response status.

See Lohr and Riddles (2016) for the statistical theory of this test.

Usage

t_test_resp_vs_full(
  survey_design,
  y_vars,
  na.rm = TRUE,
  status,
  status_codes = c("ER", "EN", "IE", "UE"),
  null_difference = 0,
  alternative = "unequal",
  degrees_of_freedom = survey::degf(survey_design) - 1
)

t_test_resp_vs_elig(
  survey_design,
  y_vars,
  na.rm = TRUE,
  status,
  status_codes = c("ER", "EN", "IE", "UE"),
  null_difference = 0,
  alternative = "unequal",
  degrees_of_freedom = survey::degf(survey_design) - 1
)

Arguments

survey_design

A survey design object created with the survey package.

y_vars

Names of dependent variables for tests. For categorical variables, percentages of each category are tested.

na.rm

Whether to drop cases with missing values for a given dependent variable.

status

The name of the variable representing response/eligibility status.
The status variable should have at most four categories, representing eligible respondents (ER), eligible nonrespondents (EN), known ineligible cases (IE), and cases whose eligibility is unknown (UE).

status_codes

A named vector, with four entries named 'ER', 'EN', 'IE', and 'UE'.
status_codes indicates how the values of the status variable are to be interpreted.
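
As an illustration of how a named status_codes vector can be read, the base R sketch below maps hypothetical values of a status variable to the four categories and derives the groups being compared. The data values and object names are assumptions made for this example, and the package's internal handling may differ.

```r
# Hypothetical status values as they might appear in a dataset
response_status <- c("Respondent", "Nonrespondent", "Ineligible",
                     "Unknown", "Respondent")

# Names are the four categories; values are the codes used in the data
status_codes <- c('ER' = "Respondent",
                  'EN' = "Nonrespondent",
                  'IE' = "Ineligible",
                  'UE' = "Unknown")

# Look up the category (ER/EN/IE/UE) for each observed value
status_category <- names(status_codes)[match(response_status, status_codes)]

# Respondents are ER; the eligible sample is ER plus EN;
# the full sample is every case regardless of category
is_respondent <- status_category == "ER"
is_eligible   <- status_category %in% c("ER", "EN")
```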

null_difference

The difference between the two means under the null hypothesis. Default is 0.

alternative

Can be one of the following:

  • 'unequal' (the default): two-sided test of whether difference in means is equal to null_difference

  • 'less': one-sided test of whether difference is less than null_difference

  • 'greater': one-sided test of whether difference is greater than null_difference
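
The three alternatives correspond to the usual tail areas of a t distribution. The base R sketch below shows the conventional calculation. The helper name p_value_for_alternative is hypothetical and not part of nrba, and the package's internal code may differ.

```r
# Hypothetical helper: p-value for a t-statistic under each alternative
p_value_for_alternative <- function(t_stat, df, alternative = "unequal") {
  switch(alternative,
    # Two-sided: both tails beyond |t|
    unequal = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE),
    # One-sided: evidence that the difference is below null_difference
    less    = pt(t_stat, df = df, lower.tail = TRUE),
    # One-sided: evidence that the difference is above null_difference
    greater = pt(t_stat, df = df, lower.tail = FALSE),
    stop("alternative must be 'unequal', 'less', or 'greater'")
  )
}

# Example call with made-up inputs
p_value_for_alternative(t_stat = 1.69, df = 48, alternative = "greater")
```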

degrees_of_freedom

The degrees of freedom to use for the test's reference distribution. Unless specified otherwise, the default is the design degrees of freedom minus one, where the design degrees of freedom are estimated using the survey package's degf method.

Value

A data frame describing the results of the t-tests, one row per dependent variable.

Statistical Details

The t-statistic used for the test has as its numerator the difference in means between the two samples, minus the null_difference. The denominator for the t-statistic is the estimated standard error of the difference in means. Because the two means are based on overlapping groups and thus have correlated sampling errors, special care is taken to estimate the covariance of the two estimates.

For designs which use sets of replicate weights for variance estimation, the two means and their difference are estimated using each set of replicate weights; the estimated differences from the sets of replicate weights are then used to estimate sampling error with a formula appropriate to the replication method (JKn, BRR, etc.). For designs which use linearization methods for variance estimation, the covariance between the two means is estimated using the method of linearization based on influence functions implemented in the survey package. See Osier (2009) for an overview of the method of linearization based on influence functions.

Eckman et al. (2023) showed in a simulation study that linearization and replication performed similarly in estimating the variance of a difference in means for overlapping samples.
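
The role of the covariance term in the t-statistic can be sketched in base R. This is a minimal illustration, not the package's implementation: the means, variances, and covariance below are made-up numbers, and the object names are assumptions chosen for this example.

```r
# Hypothetical estimates (not from any real survey)
mean_resp <- 15.2    # mean among respondents
mean_full <- 15.0    # mean in the full sample
var_resp  <- 0.040   # estimated variance of mean_resp
var_full  <- 0.030   # estimated variance of mean_full
cov_means <- 0.028   # estimated covariance of the two means

null_difference <- 0

# Variance of a difference of correlated estimates:
# Var(a - b) = Var(a) + Var(b) - 2 * Cov(a, b)
se_diff <- sqrt(var_resp + var_full - 2 * cov_means)

# t-statistic: (difference in means - null difference) / standard error
t_stat <- (mean_resp - mean_full - null_difference) / se_diff
```

Ignoring the covariance here would give a standard error of sqrt(var_resp + var_full), which is larger than se_diff whenever the covariance is positive, as is typical for overlapping samples.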

Unless specified otherwise using the degrees_of_freedom parameter, the degrees of freedom for the test are set to the design degrees of freedom minus one. Design degrees of freedom are estimated using the survey package's degf method.

See Lohr and Riddles (2016) for the statistical details of this test. See Van de Kerckhove et al. (2009) and Amaya and Presser (2017) for examples of a nonresponse bias analysis which uses t-tests to compare responding samples to eligible samples.

References

Examples

library(survey)

# Create a survey design ----
data(involvement_survey_srs, package = 'nrba')

survey_design <- svydesign(weights = ~ BASE_WEIGHT,
                           id = ~ UNIQUE_ID,
                           fpc = ~ N_STUDENTS,
                           data = involvement_survey_srs)

# Compare respondents' mean to the full sample mean ----

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"))

# Compare respondents' mean to the mean of all eligible cases ----

t_test_resp_vs_elig(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"))

# One-sided tests ----

  ## Null Hypothesis: Y_bar_resp - Y_bar_full <= 0.1
  ## Alt. Hypothesis: Y_bar_resp - Y_bar_full >  0.1

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"),
                    null_difference = 0.1, alternative = 'greater')

  ## Null Hypothesis: Y_bar_resp - Y_bar_full >= 0.1
  ## Alt. Hypothesis: Y_bar_resp - Y_bar_full <  0.1

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"),
                    null_difference = 0.1, alternative = 'less')

[Package nrba version 0.3.1 Index]