t_test_by_response_status {nrba}

R Documentation

t-test of differences in means/percentages between responding sample and full sample, or between responding sample and eligible sample

Description

The function t_test_resp_vs_full tests whether means of auxiliary variables differ between respondents and the full selected sample, where the full sample consists of all cases regardless of response status or eligibility status.
The function t_test_resp_vs_elig tests whether means differ between the responding sample and the eligible sample, where the eligible sample consists of all cases known to be eligible, regardless of response status.
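As a rough illustration of which cases enter each comparison, the base-R sketch below computes weighted means for the respondent, full, and eligible samples from a small made-up data set. The data, weights, status labels, and the helper `wmean` are hypothetical. The sketch reproduces only the point estimates that the two functions compare. It is not how nrba computes them internally, and it ignores standard errors.

```r
# Hypothetical data: one auxiliary variable y, a base weight w,
# and a response/eligibility status for each sampled case
y      <- c(10, 12, 9, 15, 11, 14, 8, 13)
w      <- c(2, 2, 2, 2, 3, 3, 3, 3)
status <- c("Respondent", "Respondent", "Nonrespondent", "Respondent",
            "Ineligible", "Nonrespondent", "Unknown", "Respondent")

# Named vector mapping each category code to a value of `status`,
# in the same form as the `status_codes` argument
status_codes <- c(ER = "Respondent", EN = "Nonrespondent",
                  IE = "Ineligible", UE = "Unknown")

is_resp <- status == status_codes[["ER"]]               # eligible respondents
is_elig <- status %in% status_codes[c("ER", "EN")]      # known eligible cases
is_full <- rep(TRUE, length(status))                    # every sampled case

# Hypothetical helper: weighted mean over the cases flagged by `keep`
wmean <- function(x, wt, keep) sum(x[keep] * wt[keep]) / sum(wt[keep])

mean_resp <- wmean(y, w, is_resp)
mean_full <- wmean(y, w, is_full)
mean_elig <- wmean(y, w, is_elig)

# Differences of the kind compared by t_test_resp_vs_full()
# and t_test_resp_vs_elig(), respectively
diff_resp_vs_full <- mean_resp - mean_full
diff_resp_vs_elig <- mean_resp - mean_elig
```

Note that cases with unknown eligibility ("Unknown") count toward the full sample but not toward the eligible sample.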

See Lohr and Riddles (2016) for the statistical theory of this test.

Usage

t_test_resp_vs_full(
  survey_design,
  y_vars,
  na.rm = TRUE,
  status,
  status_codes = c("ER", "EN", "IE", "UE"),
  null_difference = 0,
  alternative = "unequal",
  degrees_of_freedom = survey::degf(survey_design) - 1
)

t_test_resp_vs_elig(
  survey_design,
  y_vars,
  na.rm = TRUE,
  status,
  status_codes = c("ER", "EN", "IE", "UE"),
  null_difference = 0,
  alternative = "unequal",
  degrees_of_freedom = survey::degf(survey_design) - 1
)

Arguments

`survey_design`	A survey design object created with the `survey` package.
`y_vars`	A character vector of names of dependent variables to test. For categorical variables, the percentage of cases in each category is tested.
`na.rm`	Whether to drop cases with missing values for a given dependent variable.
`status`	The name of the variable representing response/eligibility status. The `status` variable should have at most four categories, representing eligible respondents (ER), eligible nonrespondents (EN), known ineligible cases (IE), and cases whose eligibility is unknown (UE).
`status_codes`	A named vector with four entries named 'ER', 'EN', 'IE', and 'UE'. Each entry gives the value of the `status` variable that corresponds to that category, so `status_codes` indicates how the values of the `status` variable are to be interpreted.
`null_difference`	The difference between the two means under the null hypothesis. Default is `0`.
`alternative`	Can be one of the following:
	`'unequal'` (the default): two-sided test of whether the difference in means is equal to `null_difference`
	`'less'`: one-sided test of whether the difference is less than `null_difference`
	`'greater'`: one-sided test of whether the difference is greater than `null_difference`
`degrees_of_freedom`	The degrees of freedom to use for the test's reference distribution. Unless specified otherwise, the default is the design degrees of freedom minus one, where the design degrees of freedom are estimated using the `survey` package's `degf` method.
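To show how `null_difference`, `alternative`, and `degrees_of_freedom` fit together, the sketch below computes a p-value from a t-statistic for each `alternative` option using the t distribution in base R. It assumes the conventional t-test p-value calculations; the helper `t_test_p_value` and the numbers in the usage lines are hypothetical and not part of nrba.

```r
# Hypothetical helper: p-value for a t-statistic under each `alternative`
# option, using a t reference distribution with `df` degrees of freedom
t_test_p_value <- function(t_stat, df, alternative = "unequal") {
  switch(alternative,
    unequal = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE),  # two-sided
    less    = pt(t_stat, df = df, lower.tail = TRUE),            # lower tail
    greater = pt(t_stat, df = df, lower.tail = FALSE),           # upper tail
    stop("`alternative` must be 'unequal', 'less', or 'greater'")
  )
}

# Estimated difference of 0.25 with standard error 0.1,
# tested against null_difference = 0.1
t_stat <- (0.25 - 0.1) / 0.1
t_test_p_value(t_stat, df = 30, alternative = "unequal")
t_test_p_value(t_stat, df = 30, alternative = "greater")
```

For a positive t-statistic, the two-sided p-value is twice the `'greater'` p-value, and the `'less'` and `'greater'` p-values sum to one.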

Value

A data frame describing the results of the t-tests, one row per dependent variable.

Statistical Details

The numerator of the t-statistic is the difference in means between the two samples minus the null_difference, and the denominator is the estimated standard error of that difference. Because the responding sample is a subset of the sample it is compared with, the two means are based on overlapping groups and have correlated sampling errors, so the standard error must account for the covariance between the two estimates.

For designs that use sets of replicate weights for variance estimation, the two means and their difference are estimated with each set of replicate weights. The replicate estimates of the difference are then used to estimate its sampling variance with the formula appropriate to the replication method (JKn, BRR, etc.).

For designs that use linearization for variance estimation, the covariance between the two means is estimated by linearization based on influence functions, as implemented in the survey package. See Osier (2009) for an overview of this method. Eckman et al. (2023) showed in a simulation study that linearization and replication performed similarly when estimating the variance of a difference in means between overlapping samples.
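The statistic described above can be summarized as follows, writing ȳ_R for the respondent mean, ȳ_C for the mean of the comparison sample (full or eligible), and δ0 for null_difference; this notation is introduced here only for illustration. The last line is the generic replicate-variance form for replication methods with a single scale factor C (for example, (R - 1)/R for the unstratified delete-one jackknife and 1/R for BRR), where d̃ is either the full-sample difference or the mean of the replicate differences, depending on how the replicate design is set up. Methods such as JKn use stratum-specific factors instead of a single C.

```latex
\begin{aligned}
\hat{d} &= \hat{\bar{y}}_R - \hat{\bar{y}}_C, \qquad
t = \frac{\hat{d} - \delta_0}{\sqrt{\widehat{V}(\hat{d})}} \\
\widehat{V}(\hat{d}) &= \widehat{V}(\hat{\bar{y}}_R) + \widehat{V}(\hat{\bar{y}}_C)
  - 2\,\widehat{\mathrm{Cov}}(\hat{\bar{y}}_R, \hat{\bar{y}}_C) \\
\widehat{V}_{\text{rep}}(\hat{d}) &= C \sum_{r=1}^{R} \bigl(\hat{d}_r - \tilde{d}\bigr)^2
\end{aligned}
```

With replication, the covariance term never has to be computed separately: each replicate difference d̂_r already reflects how both means move together.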

Unless specified otherwise using the degrees_of_freedom parameter, the degrees of freedom for the test are set to the design degrees of freedom minus one. Design degrees of freedom are estimated using the survey package's degf method.
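The replication approach and the default degrees of freedom can be illustrated together with a small base-R sketch. It assumes simple random sampling with equal weights and uses a delete-one jackknife (JK1): each replicate drops one unit and recomputes both means and their difference, and the spread of those replicate differences gives the standard error. Because dropping a unit affects both means at once, their covariance is captured automatically. With n units and no strata, the design degrees of freedom are n - 1, so the default test uses n - 2. The data and the helper `diff_means` are hypothetical; nrba relies on the survey package for these computations rather than on code like this.

```r
# Hypothetical SRS data: auxiliary variable y known for every sampled unit,
# plus an indicator of which units responded
y    <- c(4, 7, 5, 9, 6, 8, 3, 7, 5, 6)
resp <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
n    <- length(y)

# Hypothetical helper: respondent mean minus full-sample mean,
# computed over the units flagged by `keep`
diff_means <- function(keep) mean(y[keep & resp]) - mean(y[keep])

all_units <- rep(TRUE, n)
d_hat <- diff_means(all_units)

# Delete-one jackknife: replicate r drops unit r and recomputes both means
d_reps <- vapply(seq_len(n), function(r) {
  keep <- all_units
  keep[r] <- FALSE
  diff_means(keep)
}, numeric(1))

# JK1 variance: (n - 1) / n times the sum of squared deviations
# of the replicate differences around their mean
se_hat <- sqrt((n - 1) / n * sum((d_reps - mean(d_reps))^2))

# Default degrees of freedom: design df (n - 1 for an unstratified
# design with n sampling units) minus one
df_test <- (n - 1) - 1
t_stat  <- (d_hat - 0) / se_hat   # null_difference = 0
p_value <- 2 * pt(abs(t_stat), df = df_test, lower.tail = FALSE)
```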

See Lohr and Riddles (2016) for the statistical details of this test. See Van de Kerckhove et al. (2009) and Amaya and Presser (2017) for examples of a nonresponse bias analysis which uses t-tests to compare responding samples to eligible samples.

References

Amaya, A., Presser, S. (2017). Nonresponse Bias for Univariate and Multivariate Estimates of Social Activities and Roles. Public Opinion Quarterly, 81(1), 1-36. https://doi.org/10.1093/poq/nfw037
Eckman, S., Unangst, J., Dever, J., Antoun, A. (2023). The Precision of Estimates of Nonresponse Bias in Means. Journal of Survey Statistics and Methodology, 11(4), 758-783. https://doi.org/10.1093/jssam/smac019
Lohr, S., Riddles, M. (2016). Tests for Evaluating Nonresponse Bias in Surveys. Survey Methodology, 42(2), 195-218. https://www150.statcan.gc.ca/n1/pub/12-001-x/2016002/article/14677-eng.pdf
Osier, G. (2009). Variance estimation for complex indicators of poverty and inequality using linearization techniques. Survey Research Methods, 3(3), 167-195. https://doi.org/10.18148/srm/2009.v3i3.369
Van de Kerckhove, W., Krenzke, T., Mohadjer, L. (2009). Adult Literacy and Lifeskills Survey (ALL) 2003: U.S. Nonresponse Bias Analysis (NCES 2009-063). National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Washington, DC.

Examples

library(survey)
library(nrba)

# Create a survey design ----
data(involvement_survey_srs, package = 'nrba')

survey_design <- svydesign(weights = ~ BASE_WEIGHT,
                           id = ~ UNIQUE_ID,
                           fpc = ~ N_STUDENTS,
                           data = involvement_survey_srs)

# Compare respondents' mean to the full sample mean ----

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"))

# Compare respondents' mean to the mean of all eligible cases ----

t_test_resp_vs_elig(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"))

# One-sided tests ----

  ## Null Hypothesis: Y_bar_resp - Y_bar_full <= 0.1
  ## Alt. Hypothesis: Y_bar_resp - Y_bar_full >  0.1

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"),
                    null_difference = 0.1, alternative = 'greater')

  ## Null Hypothesis: Y_bar_resp - Y_bar_full >= 0.1
  ## Alt. Hypothesis: Y_bar_resp - Y_bar_full <  0.1

t_test_resp_vs_full(survey_design = survey_design,
                    y_vars = c("STUDENT_AGE", "WHETHER_PARENT_AGREES"),
                    status = 'RESPONSE_STATUS',
                    status_codes = c('ER' = "Respondent",
                                     'EN' = "Nonrespondent",
                                     'IE' = "Ineligible",
                                     'UE' = "Unknown"),
                    null_difference = 0.1, alternative = 'less')

[Package nrba version 0.3.1 Index]