R: Proportion Difference for a Single Variable across Two...

prop_diff {quest}

R Documentation

Proportion Difference for a Single Variable across Two Independent Groups (Chi-square Test of Independence)

Description

prop_diff tests for proportion differences across two independent groups with a chi-square test of independence. The function also calculates descriptive statistics for each group and various standardized effect sizes (e.g., Cramer's V), and it can return the 2x2 contingency tables. prop_diff is simply a wrapper for prop.test plus some extra calculations.

Usage

prop_diff(
  x,
  bin,
  lvl = levels(as.factor(bin)),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

`x`	numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable.
`bin`	atomic vector that only takes on two values (or missing values), otherwise known as a binary variable.
`lvl`	character vector with length 2 specifying the unique values for the two groups. If `bin` is a factor, then `lvl` should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the proportion difference. `prop_diff` calculates the proportion difference as `x[ bin == lvl[2] ]` - `x[ bin == lvl[1] ]` such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the proportion difference can be changed.
`yates`	logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See `chisq.test` for details.
`zero.cell`	numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlation. See `tetrachoric` for details.
`smooth`	logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlation. See `tetrachoric` for details.
`ci.level`	numeric vector of length 1 specifying the confidence level. `ci.level` must range from 0 to 1.
`rtn.table`	logical vector of length 1 specifying whether the return object should include the 2x2 contingency table of counts with totals and the 2x2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages.
`check`	logical vector of length 1 specifying whether the input arguments should be checked for errors. For example, the checks test whether `bin` has more than 2 unique values (other than missing values) and whether `bin` has a different length than `x`. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
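
The direction set by `lvl` can be illustrated with a minimal base-R sketch. This is not the package's internal code; the object names (est, lvl_rev, est_rev) are hypothetical and chosen only for illustration.

```r
# Base-R sketch of the "group 2 - group 1" direction (not prop_diff's source code)
x   <- mtcars$am                               # dummy variable (0/1)
bin <- ifelse(mtcars$vs == 1, "yes", "no")     # binary grouping variable
lvl <- levels(as.factor(bin))                  # default order: c("no", "yes")

# proportion difference computed as group 2 - group 1
est <- mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]])

# swapping the two levels flips the sign of the estimate
lvl_rev <- rev(lvl)
est_rev <- mean(x[bin == lvl_rev[2]]) - mean(x[bin == lvl_rev[1]])

print(c(est = est, est_rev = est_rev))
```

Only the sign changes when the levels are swapped; the magnitude of the difference is the same.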

Value

list of numeric vectors (and, if rtn.table = TRUE, numeric matrices) containing statistical information about the proportion difference: 1) nhst = chi-square test of independence stat info in a numeric vector, 2) desc = descriptive statistics stat info in a numeric vector, 3) std = various standardized effect sizes in a numeric vector, 4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE), 5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE)

1) nhst = chi-square test of independence stat info in a numeric vector

est: proportion difference estimate (i.e., group 2 - group 1)
se: NA (to remind the user there is no standard error for the test)
X2: chi-square value
df: degrees of freedom (will always be 1)
p: two-sided p-value
lwr: lower bound of the confidence interval
upr: upper bound of the confidence interval
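
As a rough guide to where these values could come from, the sketch below calls stats::prop.test directly and collects the analogous pieces. The mapping from the prop.test output to the element names is an assumption based on the description above, not a copy of the function's source; the object names (succ, n, pt) are hypothetical.

```r
# Sketch: assembling an nhst-like vector from stats::prop.test
# (assumed mapping; not taken from the prop_diff source code)
x   <- mtcars$am   # dummy variable (0/1)
bin <- mtcars$vs   # binary grouping variable; group 1 = 0, group 2 = 1

# list group 2 first so that the estimate is group 2 - group 1
succ <- c(sum(x[bin == 1]), sum(x[bin == 0]))   # number of 1s per group
n    <- c(sum(bin == 1), sum(bin == 0))         # group sizes

pt <- prop.test(x = succ, n = n, correct = TRUE, conf.level = 0.95)

nhst <- c(
  est = unname(pt$estimate[1] - pt$estimate[2]),  # proportion difference
  se  = NA,                                       # no standard error for this test
  X2  = unname(pt$statistic),                     # chi-square value
  df  = unname(pt$parameter),                     # degrees of freedom
  p   = pt$p.value,                               # two-sided p-value
  lwr = pt$conf.int[1],                           # CI lower bound
  upr = pt$conf.int[2]                            # CI upper bound
)
print(nhst)
```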

2) desc = descriptive statistics stat info in a numeric vector

prop_'lvl[2]': proportion of group 2
prop_'lvl[1]': proportion of group 1
sd_'lvl[2]': standard deviation of group 2
sd_'lvl[1]': standard deviation of group 1
n_'lvl[2]': sample size of group 2
n_'lvl[1]': sample size of group 1

3) std = various standardized effect sizes in a numeric vector

cramer: Cramer's V estimate
h: Cohen's h estimate
phi: Phi coefficient estimate
yule: Yule coefficient estimate
tetra: Tetrachoric correlation estimate
OR: odds ratio estimate
RR: risk ratio estimate calculated as group 2 / group 1. Note this value will often differ when recoding variables (as it should).
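
For orientation, most of these effect sizes can be computed by hand from the four cell counts. The sketch below uses common textbook formulas in base R; it assumes Yule's Q for the Yule coefficient and an uncorrected chi-square for Cramer's V, either of which may differ from what prop_diff does internally. The tetrachoric correlation is omitted because it requires psych::tetrachoric. All object names are hypothetical.

```r
# Textbook effect sizes for a 2x2 table; a sketch, not the package's source code
x   <- mtcars$am   # dummy variable (0/1)
bin <- mtcars$vs   # binary grouping variable; group 1 = 0, group 2 = 1

# cell counts
n21 <- sum(x == 1 & bin == 1)   # group 2, x = 1
n20 <- sum(x == 0 & bin == 1)   # group 2, x = 0
n11 <- sum(x == 1 & bin == 0)   # group 1, x = 1
n10 <- sum(x == 0 & bin == 0)   # group 1, x = 0

p2 <- n21 / (n21 + n20)         # proportion of 1s in group 2
p1 <- n11 / (n11 + n10)         # proportion of 1s in group 1

# phi coefficient (equals the Pearson correlation of two binary variables)
phi <- (n21 * n10 - n20 * n11) /
  sqrt((n21 + n20) * (n11 + n10) * (n21 + n11) * (n20 + n10))

# Cramer's V from the uncorrected chi-square (assumption); equals |phi| for 2x2
X2     <- unname(chisq.test(table(x, bin), correct = FALSE)$statistic)
cramer <- sqrt(X2 / length(x))

h    <- 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))   # Cohen's h
OR   <- (n21 * n10) / (n20 * n11)                 # odds ratio
RR   <- p2 / p1                                   # risk ratio, group 2 / group 1
yule <- (OR - 1) / (OR + 1)                       # Yule's Q (assumption)

print(c(cramer = cramer, h = h, phi = phi, yule = yule, OR = OR, RR = RR))
```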

4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE).

The two unique observed values of x (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of bin - plus the total - are the columns. The dimlabels are "x" for the rows and "bin" for the columns. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total"

5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE).
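
Tables of this shape can be approximated in base R with table and addmargins. This is only a sketch of the layout described above, not the function's own code; whether prop_diff scales the percentages to 0-100 (as assumed here) or to 0-1 is not stated in this documentation.

```r
# Sketch of 3x3 count and overall-percentage tables (layout assumed from the text)
x   <- mtcars$am                               # dummy variable (0/1)
bin <- ifelse(mtcars$vs == 1, "yes", "no")     # binary grouping variable

tab   <- table(x, bin)        # rows: values of x; columns: values of bin
count <- addmargins(tab)      # appends a "Sum" row and a "Sum" column

# relabel the margins as "total"
rownames(count)[3] <- "total"
colnames(count)[3] <- "total"

# overall percentages: every cell divided by the grand total
percent <- 100 * count / count["total", "total"]

print(count)
print(round(percent, 1))
```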

Examples


# chi-square test of independence
# x = "am", bin = "vs"
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
agg(mtcars2$"am", grp = mtcars2$"vs_bin", rep = FALSE, fun = mean)
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs")

# using the lvl argument
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin",
   lvl = c("yes","no")) # reverses the direction of the effect
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs",
   lvl = c(1, 0)) # levels don't have to be character

# recoding the variables
prop_diff(x = mtcars2$"am", bin = ifelse(mtcars2$"vs_bin" == "yes",
   yes = "no", no = "yes")) # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = mtcars2$"vs") # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = ifelse(mtcars2$"vs_bin" == "yes",
      yes = "no", no = "yes")) # double reverse means same direction of the effect

# compare to stats::prop.test
# x = "am", bin = "vs_bin" (binary as the rows; dummy as the columns)
tmp <- c("vs_bin","am") # b/c Roxygen2 will cause problems
table_obj <- table(mtcars2[tmp])
row_order <- nrow(table_obj):1
col_order <- ncol(table_obj):1
table_obj4prop <- table_obj[row_order, col_order]
prop.test(table_obj4prop)

# compare to stats::chisq.test
chisq.test(x = mtcars2$"am", y = mtcars2$"vs_bin")

# compare to psych::phi
cor(mtcars2$"am", mtcars2$"vs")
psych::phi(table_obj, digits = 7)

# compare to psych::Yule
psych::Yule(table_obj)

# compare to psych::tetrachoric
psych::tetrachoric(table_obj)
# Note, I couldn't find a case where psych::tetrachoric() failed to compute
psych::tetrachoric(table_obj4prop)

# different from a single-predictor logistic regression
summary(glm(am ~ vs, data = mtcars, family = binomial(link = "logit")))