
props_diff {quest}

R Documentation

Proportion Difference of Multiple Variables Across Two Independent Groups (Chi-square Tests of Independence)

Description

props_diff tests the proportion difference of multiple variables across two independent groups with chi-square tests of independence (one test per variable). The function also calculates descriptive statistics for each group and various standardized effect sizes (e.g., Cramer's V), and it can return the 2x2 contingency tables. props_diff is simply a wrapper for prop.test plus some extra calculations.
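For two groups, the chi-square test that prop.test runs is the same chi-square test of independence that chisq.test runs on the 2x2 table. The base-R sketch below checks this on one dummy variable from mtcars, with and without the Yates continuity correction. It only illustrates the statistics and is not quest's internal code.

```r
# Base-R sketch (not quest's code): prop.test() on two groups vs.
# chisq.test() on the corresponding 2x2 contingency table.
x   <- mtcars$am                            # dummy variable (0/1)
bin <- ifelse(mtcars$vs == 1, "yes", "no")  # two independent groups

tab <- table(x = x, bin = bin)              # rows: x = 0/1; columns: no/yes
successes <- as.vector(tab["1", ])          # number of 1s in each group
totals    <- as.vector(colSums(tab))        # size of each group

# Chi-square statistic from each function, with and without Yates' correction
x2 <- sapply(c(yates = TRUE, no_yates = FALSE), function(yates) c(
  prop.test  = unname(prop.test(x = successes, n = totals, correct = yates)$statistic),
  chisq.test = unname(chisq.test(tab, correct = yates)$statistic)
))
x2  # the two rows agree; the corrected statistic is the smaller one
```

The agreement holds because, with two groups, the success/failure counts that prop.test tabulates are exactly the cells of the 2x2 table.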

Usage

props_diff(
  data,
  vrb.nm,
  bin.nm,
  lvl = levels(as.factor(data[[bin.nm]])),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

`data`	data.frame of data.
`vrb.nm`	character vector specifying the colnames in `data` for the variables. Since we are testing proportions, the variables must be dummy codes such that they only have values of 0 or 1 (or missing values).
`bin.nm`	character vector of length 1 specifying the colname in `data` for the binary variable that identifies the two independent groups. It must take on exactly two unique values (other than missing values).
`lvl`	character vector with length 2 specifying the unique values for the two groups. If `data[[bin.nm]]` is a factor, then `lvl` should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the prop difference. `props_diff` calculates the prop differences as `x[ bin == lvl[2] ]` - `x[ bin == lvl[1] ]` such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the prop differences can be changed. See details of `prop_diff`.
`yates`	logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See `chisq.test` for details.
`zero.cell`	numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlations. See `tetrachoric` for details.
`smooth`	logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlations. See `tetrachoric` for details.
`ci.level`	numeric vector of length 1 specifying the confidence level. `ci.level` must be between 0 and 1.
`rtn.table`	logical vector of length 1 specifying whether the return object should include the 2x2 contingency table of counts with totals and the 2x2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a 3D array of counts and "percent" containing a 3D array of overall percentages.
`check`	logical vector of length 1 specifying whether the input arguments should be checked for errors (e.g., whether `data[[bin.nm]]` has more than 2 unique values other than missing values). This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
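The direction set by `lvl`, and the kind of validation that `check = TRUE` describes, can be sketched in a few lines of base R. `prop_diff_sketch` is a hypothetical helper written only for illustration; it is not a quest function, and quest's actual checks and error messages may differ.

```r
# Hypothetical helper (not part of quest): validate inputs in the spirit of
# check = TRUE, then return the proportion difference group 2 - group 1.
prop_diff_sketch <- function(x, bin, lvl = levels(as.factor(bin))) {
  groups <- unique(bin[!is.na(bin)])
  if (length(groups) != 2)
    stop("bin must have exactly 2 unique non-missing values")
  if (!all(x[!is.na(x)] %in% c(0, 1)))
    stop("x must contain only 0, 1, or NA")
  if (length(lvl) != 2 || !all(lvl %in% groups))
    stop("lvl must be 2 values observed in bin")
  # proportion of 1s in group 2 (lvl[2]) minus proportion of 1s in group 1 (lvl[1])
  mean(x[bin %in% lvl[2]], na.rm = TRUE) - mean(x[bin %in% lvl[1]], na.rm = TRUE)
}

bin <- ifelse(mtcars$vs == 1, "yes", "no")
prop_diff_sketch(mtcars$am, bin)                        # default lvl: "yes" - "no"
prop_diff_sketch(mtcars$am, bin, lvl = c("yes", "no"))  # reversed: "no" - "yes"
```

Because the default `lvl` comes from `levels(as.factor(...))`, character groups are ordered alphabetically, so here "yes" is group 2 unless `lvl` is supplied.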

Value

list of data.frames (plus two 3D arrays if rtn.table = TRUE) containing statistical information about the prop differences (the rownames of each data.frame are vrb.nm):
1) chisqtest = chi-square tests of independence stat info in a data.frame
2) describes = descriptive statistics stat info in a data.frame
3) effects = various standardized effect sizes in a data.frame
4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE)
5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE)

1) chisqtest = chi-square tests of independence stat info in a data.frame

est: proportion difference estimate (i.e., group 2 - group 1)
se: NA (to remind the user there is no standard error for the test)
X2: chi-square value
df: degrees of freedom (will always be 1)
p: two-sided p-value
lwr: lower bound of the confidence interval
upr: upper bound of the confidence interval
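One plausible mapping from prop.test output onto these columns is sketched below for a single variable. Passing group 2 first makes prop.test's estimate and confidence interval refer to group 2 - group 1; that ordering and the column construction are assumptions about how props_diff works internally, not its actual code.

```r
# Sketch: one row of a chisqtest-style data.frame built from prop.test()
x   <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")
lvl <- c("no", "yes")  # lvl[1] = group 1, lvl[2] = group 2

# group 2 is passed first so that the difference is group 2 - group 1
successes <- c(sum(x[bin == lvl[2]]), sum(x[bin == lvl[1]]))
totals    <- c(sum(bin == lvl[2]), sum(bin == lvl[1]))
pt <- prop.test(x = successes, n = totals, conf.level = 0.95, correct = TRUE)

chisqtest <- data.frame(
  est = unname(pt$estimate[1] - pt$estimate[2]),  # group 2 - group 1
  se  = NA_real_,                                 # no standard error for the test
  X2  = unname(pt$statistic),
  df  = unname(pt$parameter),                     # 1 for two groups
  p   = pt$p.value,
  lwr = pt$conf.int[1],
  upr = pt$conf.int[2],
  row.names = "am"
)
chisqtest
```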

2) describes = descriptive statistics stat info in a data.frame

prop_'lvl[2]': proportion of group 2
prop_'lvl[1]': proportion of group 1
sd_'lvl[2]': standard deviation of group 2
sd_'lvl[1]': standard deviation of group 1
n_'lvl[2]': sample size of group 2
n_'lvl[1]': sample size of group 1
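The describes columns for one variable can be reproduced roughly as follows. The sketch assumes the standard deviations are ordinary sample SDs from sd() (n - 1 denominator) and that the column names are formed by pasting each prefix onto the `lvl` values; both are assumptions rather than documented behavior.

```r
# Sketch: one row of a describes-style data.frame for a single dummy variable
x   <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")
lvl <- c("no", "yes")  # group 1, group 2

g2 <- x[bin == lvl[2]]
g1 <- x[bin == lvl[1]]
stats <- c(mean(g2), mean(g1), sd(g2), sd(g1), length(g2), length(g1))
# names: prop_yes, prop_no, sd_yes, sd_no, n_yes, n_no (group 2 before group 1)
names(stats) <- paste0(rep(c("prop_", "sd_", "n_"), each = 2), lvl[c(2, 1)])
describes <- as.data.frame(t(stats), row.names = "am")
describes
```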

3) effects = various standardized effect sizes in a data.frame

cramer: Cramer's V estimate
h: Cohen's h estimate
phi: Phi coefficient estimate
yule: Yule coefficient estimate
tetra: Tetrachoric correlation estimate
OR: odds ratio estimate
RR: risk ratio estimate calculated as group 2 / group 1. Note that this value will often change when the variables are recoded (e.g., swapping which value is coded 1), as it should.
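Textbook versions of most of these effect sizes can be computed directly from the 2x2 counts, as in the sketch below. Its conventions are assumptions about how props_diff reports them: signs are positive when group 2 has the higher proportion, Cramer's V is taken as |phi| (their 2x2 relationship), and the Yule coefficient is taken to be Yule's Q. The tetrachoric correlation is omitted because it requires psych::tetrachoric.

```r
# Sketch: textbook effect sizes from a 2x2 table (tetrachoric omitted)
x   <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")
lvl <- c("no", "yes")  # group 1, group 2

g2 <- x[bin == lvl[2]]
g1 <- x[bin == lvl[1]]
one2 <- sum(g2 == 1); zero2 <- sum(g2 == 0)  # group 2 cell counts
one1 <- sum(g1 == 1); zero1 <- sum(g1 == 0)  # group 1 cell counts
p2 <- one2 / length(g2)
p1 <- one1 / length(g1)

cross <- one2 * zero1 - zero2 * one1         # ad - bc
phi <- cross / sqrt(length(g2) * length(g1) * (one2 + one1) * (zero2 + zero1))

effects <- data.frame(
  cramer = abs(phi),                               # Cramer's V = |phi| for 2x2
  h      = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)),  # Cohen's h
  phi    = phi,
  yule   = cross / (one2 * zero1 + zero2 * one1),  # Yule's Q
  OR     = (one2 / zero2) / (one1 / zero1),        # odds ratio, group 2 / group 1
  RR     = p2 / p1,                                # risk ratio, group 2 / group 1
  row.names = "am"
)
effects
```

Swapping the 0/1 coding of `x` leaves |phi| and |Q| unchanged but changes RR, which is why RR is sensitive to recoding.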

4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1), plus the total, are the rows, and the two unique observed values of data[[bin.nm]], plus the total, are the columns. The variables themselves are the layers (i.e., the 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total". The laynames are vrb.nm.
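A base-R sketch of a counts array with this shape is shown below, with the rows and columns following the rownames and colnames listed above. It builds each layer with table() and addmargins() and stacks the layers with vapply(); it is an illustration, not quest's internal code, and it leaves the dimension labels unset.

```r
# Sketch: 3 x 3 x length(vrb_nm) array of counts with totals
dat <- mtcars
dat$vs_bin   <- ifelse(dat$vs == 1, "yes", "no")
dat$gear_dum <- ifelse(dat$gear > 3, 1L, 0L)
vrb_nm <- c("am", "gear_dum")
lvl    <- c("no", "yes")

count <- vapply(vrb_nm, FUN.VALUE = matrix(0, nrow = 3, ncol = 3), FUN = function(nm) {
  tab <- table(factor(dat[[nm]], levels = c(0, 1)),  # rows: 0, 1
               factor(dat$vs_bin, levels = lvl))     # columns: lvl[1], lvl[2]
  matrix(addmargins(tab), nrow = 3)                  # append total row and column
})
dimnames(count) <- list(c("0", "1", "total"), c(lvl, "total"), vrb_nm)
count[, , "am"]
```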

5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1), plus the total, are the rows, and the two unique observed values of data[[bin.nm]], plus the total, are the columns. The variables themselves are the layers (i.e., the 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total". The laynames are vrb.nm.
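Overall percentages divide every cell, totals included, by the grand total, so the bottom-right cell is always 100. A base-R sketch for one variable follows; stacking one such table per variable would give the 3D array. This is an illustration, not quest's internal code.

```r
# Sketch: 3 x 3 table of overall percentages with totals for one variable
x   <- factor(mtcars$am, levels = c(0, 1))
bin <- factor(ifelse(mtcars$vs == 1, "yes", "no"), levels = c("no", "yes"))

tab     <- table(x, bin)
percent <- addmargins(prop.table(tab)) * 100   # each cell / grand total, in %
dimnames(percent) <- list(c("0", "1", "total"), c("no", "yes", "total"))
round(percent, 2)
```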

Examples


# rtn.table = TRUE (default)

# multiple variables
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("vs_bin", nm)
   table(mtcars2[tmp])
})
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs_bin")

# single variable
props_diff(mtcars2, vrb.nm = "am", bin.nm = "vs_bin")

# rtn.table = FALSE (no "count" or "percent" list elements)

# multiple variables
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs",
   rtn.table = FALSE)

# single variable
props_diff(mtcars, vrb.nm = "am", bin.nm = "vs",
   rtn.table = FALSE)