prop_diff {quest}    R Documentation

Proportion Difference for a Single Variable across Two Independent Groups (Chi-square Test of Independence)

Description

prop_diff tests for proportion differences across two independent groups with a chi-square test of independence. The function also calculates the descriptive statistics for each group, various standardized effect sizes (e.g., Cramer's V), and can provide the 2x2 contingency tables. prop_diff is simply a wrapper for prop.test plus some extra calculations.

Usage

prop_diff(
  x,
  bin,
  lvl = levels(as.factor(bin)),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

x

numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable.

bin

atomic vector that only takes on two values (or missing values), otherwise known as a binary variable.

lvl

character vector with length 2 specifying the unique values for the two groups. If bin is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the proportion difference. prop_diff calculates the proportion difference as mean(x[ bin == lvl[2] ]) - mean(x[ bin == lvl[1] ]) such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the proportion difference can be changed. See the Examples.
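The direction rule can be sketched in base R, assuming the proportion of a dummy variable is taken as its mean within each group. This only illustrates the arithmetic described for lvl; it is not the internal code of prop_diff, and the helper name diff_by_lvl is hypothetical.

```r
# Hypothetical helper (not part of quest): proportion difference as group 2 - group 1
diff_by_lvl <- function(x, bin, lvl = levels(as.factor(bin))) {
  p1 <- mean(x[bin == lvl[1]], na.rm = TRUE) # proportion of 1s in group 1
  p2 <- mean(x[bin == lvl[2]], na.rm = TRUE) # proportion of 1s in group 2
  p2 - p1
}

vs_bin <- ifelse(mtcars$vs == 1, yes = "yes", no = "no")
d_default <- diff_by_lvl(mtcars$am, vs_bin)                       # lvl = c("no", "yes")
d_reverse <- diff_by_lvl(mtcars$am, vs_bin, lvl = c("yes", "no")) # groups swapped
```

Swapping the two elements of lvl flips the sign of the difference and leaves its magnitude unchanged.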

yates

logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details.
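As a rough illustration of what the correction does, the sketch below calls stats::chisq.test directly with and without its correct argument. It assumes yates plays the same role as correct; how prop_diff passes the value along is not shown here.

```r
# 2x2 table of engine shape (vs) by transmission (am) from mtcars
tab <- table(vs = mtcars$vs, am = mtcars$am)

with_cc    <- chisq.test(tab, correct = TRUE)  # Yates' continuity correction applied
without_cc <- chisq.test(tab, correct = FALSE) # plain Pearson chi-square

# The corrected statistic is never larger than the uncorrected one
x2_with    <- unname(with_cc$statistic)
x2_without <- unname(without_cc$statistic)
```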

zero.cell

numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlation. See tetrachoric for details.

smooth

logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlation. See tetrachoric for details.

ci.level

numeric vector of length 1 specifying the confidence level. ci.level must range from 0 to 1.

rtn.table

logical vector of length 1 specifying whether the return object should include the 2x2 contingency table of counts with totals and the 2x2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages.

check

logical vector of length 1 specifying whether the input arguments should be checked for errors, for example, whether bin has more than 2 unique values (other than missing values) or has a length different from that of x. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).

Value

list of numeric vectors containing statistical information about the proportion difference: 1) nhst = chi-square test of independence stat info in a numeric vector, 2) desc = descriptive statistics stat info in a numeric vector, 3) std = various standardized effect sizes in a numeric vector, 4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE), 5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE).

1) nhst = chi-square test of independence stat info in a numeric vector

est

proportion difference estimate (i.e., group 2 - group 1)

se

NA (to remind the user there is no standard error for the test)

X2

chi-square value

df

degrees of freedom (will always be 1)

p

two-sided p-value

lwr

lower bound of the confidence interval

upr

upper bound of the confidence interval

2) desc = descriptive statistics stat info in a numeric vector

prop_'lvl[2]'

proportion of group 2

prop_'lvl[1]'

proportion of group 1

sd_'lvl[2]'

standard deviation of group 2

sd_'lvl[1]'

standard deviation of group 1

n_'lvl[2]'

sample size of group 2

n_'lvl[1]'

sample size of group 1

3) std = various standardized effect sizes in a numeric vector

cramer

Cramer's V estimate

h

Cohen's h estimate

phi

Phi coefficient estimate

yule

Yule coefficient estimate

tetra

Tetrachoric correlation estimate

OR

odds ratio estimate

RR

risk ratio estimate (i.e., group 2 / group 1). Note this value will often differ when recoding variables (as it should).
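For orientation, several of these effect sizes can be reproduced from a 2x2 table with textbook formulas. The sketch below is base R only and rests on assumptions: it uses the uncorrected chi-square for Cramer's V and Yule's Q for the Yule coefficient, and prop_diff may differ in such details. The tetrachoric correlation is omitted because it requires an iterative estimate (see tetrachoric).

```r
# Rows = bin groups (group 1 = vs 0, group 2 = vs 1), columns = x (am = 0, am = 1)
tab <- table(bin = mtcars$vs, x = mtcars$am)
n10 <- as.numeric(tab[1, 1]); n11 <- as.numeric(tab[1, 2]) # group 1: x = 0, x = 1
n20 <- as.numeric(tab[2, 1]); n21 <- as.numeric(tab[2, 2]) # group 2: x = 0, x = 1

p1 <- n11 / (n10 + n11) # proportion of x = 1 in group 1
p2 <- n21 / (n20 + n21) # proportion of x = 1 in group 2

phi <- (n10 * n21 - n11 * n20) /
  sqrt((n10 + n11) * (n20 + n21) * (n10 + n20) * (n11 + n21))
x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
cramer <- sqrt(x2 / sum(tab))                      # min(rows, cols) - 1 = 1 for a 2x2 table
h  <- 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))      # Cohen's h
OR <- (p2 / (1 - p2)) / (p1 / (1 - p1))            # odds ratio, group 2 vs. group 1
RR <- p2 / p1                                      # risk ratio, group 2 / group 1
yule_q <- (OR - 1) / (OR + 1)                      # Yule's Q (assumed variant)
```

For a 2x2 table the phi coefficient equals the Pearson correlation between the two 0/1 variables, and Cramer's V is its absolute value.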

4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE).

The two unique observed values of x (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of bin - plus the total - are the columns. The dimlabels are "x" for the rows and "bin" for the columns. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total".

5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE).

The two unique observed values of x (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of bin - plus the total - are the columns. The dimlabels are "x" for the rows and "bin" for the columns. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total".
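Tables with this layout can be approximated in base R with table, prop.table, and addmargins. This is a sketch, not the prop_diff implementation: addmargins labels the margins "Sum" by default, so they are renamed to "total" here, and the 0-100 scale for percentages is an assumption.

```r
# x (0/1) as rows and the bin groups as columns
tab <- table(x = mtcars$am, bin = mtcars$vs)

count   <- addmargins(tab)                    # counts plus row/column totals
percent <- addmargins(prop.table(tab)) * 100  # overall percentages (assumed 0-100 scale)

# Rename the default "Sum" margins to match the labels described above
rownames(count)[3] <- colnames(count)[3] <- "total"
rownames(percent)[3] <- colnames(percent)[3] <- "total"
```

The bottom-right cell of count holds the total sample size, and the same cell of percent holds 100.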

See Also

prop.test the workhorse for prop_diff,
props_diff for multiple dummy variables,
phi for another phi coefficient function,
Yule for another Yule coefficient function,
tetrachoric for another tetrachoric coefficient function

Examples


# chi-square test of independence
# x = "am", bin = "vs"
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
agg(mtcars2$"am", grp = mtcars2$"vs_bin", rep = FALSE, fun = mean)
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs")

# using the lvl argument
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin",
   lvl = c("yes","no")) # reverses the direction of the effect
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs",
   lvl = c(1, 0)) # levels don't have to be character

# recoding the variables
prop_diff(x = mtcars2$"am", bin = ifelse(mtcars2$"vs_bin" == "yes",
   yes = "no", no = "yes")) # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = mtcars2$"vs") # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = ifelse(mtcars2$"vs_bin" == "yes",
      yes = "no", no = "yes")) # double reverse means same direction of the effect

# compare to stats::prop.test
# x = "am", bin = "vs_bin" (binary as the rows; dummy as the columns)
tmp <- c("vs_bin","am") # b/c Roxygen2 will cause problems
table_obj <- table(mtcars2[tmp])
row_order <- nrow(table_obj):1
col_order <- ncol(table_obj):1
table_obj4prop <- table_obj[row_order, col_order]
prop.test(table_obj4prop)

# compare to stats::chisq.test
chisq.test(x = mtcars2$"am", y = mtcars2$"vs_bin")

# compare to psych::phi
cor(mtcars2$"am", mtcars2$"vs")
psych::phi(table_obj, digits = 7)

# compare to psych::Yule()
psych::Yule(table_obj)

# compare to psych::tetrachoric
psych::tetrachoric(table_obj)
# Note, I couldn't find a case where psych::tetrachoric() failed to compute
psych::tetrachoric(table_obj4prop)

# different than single logistic regression
summary(glm(am ~ vs, data = mtcars, family = binomial(link = "logit")))


[Package quest version 0.2.0 Index]