prop_compare {quest}

R Documentation

Proportion Comparisons for a Single Variable across 3+ Independent Groups (Chi-square Test of Independence)

Description

prop_compare tests for proportion differences across 3+ independent groups with a chi-square test of independence. The function also calculates the descriptive statistics for each group, Cramer's V and its confidence interval as a standardized effect size, and can provide the X by 2 contingency tables. prop_compare is simply a wrapper for prop.test plus some extra calculations.

Usage

prop_compare(
  x,
  nom,
  lvl = levels(as.factor(nom)),
  yates = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

`x`	numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable.
`nom`	atomic vector that takes on three or more unordered values (or missing values), otherwise known as a nominal variable.
`lvl`	character vector with length 3+ specifying the unique values for the groups. If `nom` is a factor, then `lvl` should be the factor levels rather than the underlying integer codes. This argument allows you to specify the order of the proportions in the return object.
`yates`	logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See `chisq.test` for details.
`ci.level`	numeric vector of length 1 specifying the confidence level. `ci.level` must be between 0 and 1.
`rtn.table`	logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals and the X by 2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages.
`check`	logical vector of length 1 specifying whether the input arguments should be checked for errors, for example, whether `nom` has a different length than `x`. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).

Details

The confidence interval for Cramer's V is calculated with Fisher's r-to-z transformation, as Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and then the lower and upper bounds are back-transformed to Cramer's V units.
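
The steps above can be sketched in base R. This is only an illustration, not the package's source code: the helper name cramer_ci_sketch is hypothetical, and the standard error 1 / sqrt(n - 3) is the usual large-sample value for Fisher's z, which is assumed here because this page does not state which standard error prop_compare uses.

```r
# Hypothetical helper: confidence interval for Cramer's V via Fisher's r-to-z.
# Assumes the usual standard error for Fisher's z, 1 / sqrt(n - 3).
cramer_ci_sketch <- function(v, n, ci.level = 0.95) {
  z <- atanh(v)                          # transform Cramer's V to Fisher's z units
  se <- 1 / sqrt(n - 3)                  # assumed standard error of z
  crit <- qnorm(1 - (1 - ci.level) / 2)  # two-sided critical value
  bounds <- tanh(c(z - crit * se, z + crit * se))  # back-transform the bounds
  c(cramer = v, lwr = bounds[1], upr = bounds[2])
}

ci <- cramer_ci_sketch(v = 0.5, n = 320)
ci
```

Because the interval is symmetric only in z units, the back-transformed bounds are generally not symmetric around the Cramer's V estimate.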

Value

list containing statistical information about the proportion comparisons. The first three elements (nhst, desc, std) are numeric vectors; the last two elements (count, percent) are numeric matrices that are included only if rtn.table = TRUE. The elements are:

1) nhst = chi-square test of independence stat info in a numeric vector

est: average absolute value of the proportion differences between groups (i.e., |group j - group i|)
se: NA (to remind the user there is no standard error for the test)
X2: chi-square value
df: degrees of freedom (of the nominal variable)
p: two-sided p-value
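
As a rough cross-check of the X2, df, and p elements, the same kind of test can be run with base R alone. The sketch below is an illustration rather than the function's internals: it stacks 10 copies of mtcars with do.call(rbind, ...) instead of str2str::ld2d, and the object names (mtcars10, tab, chi, pt) are chosen here for the example.

```r
# Stack 10 copies of mtcars so the expected cell counts are not small
mtcars10 <- do.call(rbind, replicate(10, mtcars, simplify = FALSE))
x <- mtcars10$am            # dummy variable (0/1)
nom <- factor(mtcars10$cyl) # nominal variable with 3 groups

# X by 2 contingency table of counts
tab <- table(nom = nom, x = x)

# Chi-square test of independence on the table
chi <- chisq.test(tab)

# Same test phrased as a comparison of proportions:
# number of 1s per group out of the group totals
pt <- prop.test(x = tab[, "1"], n = rowSums(tab))

nhst <- c(X2 = unname(chi$statistic), df = unname(chi$parameter), p = chi$p.value)
nhst
```

With three or more groups, the two calls should report the same chi-square statistic; the degrees of freedom equal the number of groups minus 1.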

2) desc = descriptive statistics stat info in a numeric vector (note there could be more than 3 groups - groups i, j, and k are just provided as an example):

prop_'lvl[k]': proportion of group k
prop_'lvl[j]': proportion of group j
prop_'lvl[i]': proportion of group i
sd_'lvl[k]': standard deviation of group k
sd_'lvl[j]': standard deviation of group j
sd_'lvl[i]': standard deviation of group i
n_'lvl[k]': sample size of group k
n_'lvl[j]': sample size of group j
n_'lvl[i]': sample size of group i
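
The per-group descriptives can be approximated in base R as follows. This is a sketch under assumptions: it uses sd() (the sample standard deviation of the 0/1 values), which may not match the formula prop_compare uses, and the element order shown here is not claimed to match the function's output.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl, levels = c(4, 6, 8),
              labels = c("four", "six", "eight"))

# Split the dummy variable by group, then summarize each group
groups <- split(x, nom)
desc <- c(
  sapply(groups, mean),   # proportion of 1s in each group
  sapply(groups, sd),     # assumed: sample standard deviation of the 0/1 values
  sapply(groups, length)  # sample size of each group
)
names(desc) <- paste0(rep(c("prop_", "sd_", "n_"), each = length(groups)),
                      names(groups))
desc
```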

3) std = standardized effect size and its confidence interval in a numeric vector

cramer: Cramer's V estimate
lwr: lower bound of Cramer's V confidence interval
upr: upper bound of Cramer's V confidence interval
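
For reference, Cramer's V is conventionally computed from the chi-square statistic as sqrt(X2 / (n * min(rows - 1, columns - 1))). The sketch below applies that textbook formula in base R; whether prop_compare bases it on a continuity-corrected statistic is not stated on this page, so treat the numbers as illustrative.

```r
tab <- table(nom = factor(mtcars$cyl), x = mtcars$am)

# Some expected counts are small in the unreplicated mtcars data, so
# chisq.test() warns about the approximation; the warning is suppressed
# here only to keep the illustration quiet.
X2 <- unname(suppressWarnings(chisq.test(tab))$statistic)

n <- sum(tab)            # total sample size
k <- min(dim(tab)) - 1   # equals 1 for an X by 2 table
cramer <- sqrt(X2 / (n * k))
cramer
```

For an X by 2 table the formula reduces to sqrt(X2 / n), since the smaller dimension minus 1 is always 1.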

4) count = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of nom - plus the total - are the rows and the two unique observed values of x (i.e., 0 and 1) - plus the total - are the columns. The dimlabels are "nom" for the rows and "x" for the columns. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total".

5) percent = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE).
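
Tables with the same layout as count and percent can be built in base R with table and addmargins. This is a sketch, not the function's implementation; in particular, it scales the overall percentages to 0-100, which is an assumption about how prop_compare reports them.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl)

# X by 2 table of counts with an extra row and column of totals
count <- addmargins(table(nom = nom, x = x))
dimnames(count) <- list(nom = c(levels(nom), "total"),
                        x = c("0", "1", "total"))

# Overall percentages: every cell divided by the grand total
# (assumed scale: 0 to 100)
percent <- count / count["total", "total"] * 100

count
percent
```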

Examples


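# combine 10 copies of mtcars into one data.frame to get a larger sample
tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
# recode the number of cylinders as a factor with 3 levels
mtcars2$"cyl_fct" <- car::recode(mtcars2$"cyl",
   recodes = "4='four'; 6='six'; 8='eight'", as.factor = TRUE)
# compare the proportion of manual transmissions (am = 1) across cylinder groups
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct")
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct",
   lvl = c("four","six","eight")) # specify order of levels in return object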

# more than 3 groups
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month")
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month",
   rtn.table = FALSE) # no contingency tables

[Package quest version 0.2.0 Index]