R: Proportion Comparisons for Multiple Variables across 3+...

props_compare {quest}

R Documentation

Proportion Comparisons for Multiple Variables across 3+ Independent Groups (Chi-square Tests of Independence)

Description

props_compare tests for proportion differences across 3+ independent groups with chi-square tests of independence, one test per dummy variable. The function also calculates descriptive statistics for each group and Cramer's V with its confidence interval as a standardized effect size, and it can return the X by 2 contingency tables. props_compare is simply a wrapper for prop.test plus some extra calculations.

Usage

props_compare(
  data,
  vrb.nm,
  nom.nm,
  lvl = levels(as.factor(data[[nom.nm]])),
  yates = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

`data`	data.frame of data.
`vrb.nm`	character vector of colnames from `data` specifying the dummy variables, in other words, variables that only have values of 0 or 1 (or missing values).
`nom.nm`	character vector of length 1 specifying the colname in `data` containing a nominal variable that takes on three or more unordered values (or missing values).
`lvl`	character vector with length 3+ specifying the unique values for the 3+ independent groups. If `data[[nom.nm]]` is a factor, then `lvl` should be the factor levels rather than the underlying integer codes. This argument allows you to specify the order of the proportions in the return object.
`yates`	logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See `chisq.test` for details.
`ci.level`	numeric vector of length 1 specifying the confidence level. `ci.level` must range from 0 to 1.
`rtn.table`	logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals for each dummy variable and the X by 2 overall percentages table with totals for each dummy variable. If TRUE, then the last two elements of the return object are "count" containing an array of counts and "percent" containing an array of overall percentages.
`check`	logical vector of length 1 specifying whether the input arguments should be checked for errors, for example, whether `lvl` has values that are not present in `data[[nom.nm]]`. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
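
The argument requirements above can be verified by hand before calling the function. The sketch below is a hypothetical helper (check_inputs is not part of quest and is not the package's internal checking code); it only illustrates the kind of conditions the argument descriptions imply, using base R.

```r
# Hypothetical pre-flight checks written in base R; NOT quest's internal code.
check_inputs <- function(data, vrb.nm, nom.nm,
                         lvl = levels(as.factor(data[[nom.nm]]))) {
  stopifnot(is.data.frame(data), length(nom.nm) == 1L)
  stopifnot(all(c(vrb.nm, nom.nm) %in% names(data)))

  # every variable in vrb.nm should only hold 0, 1, or missing values
  is_dummy <- vapply(data[vrb.nm],
                     function(x) all(x %in% c(0, 1, NA)),
                     logical(1))
  if (!all(is_dummy)) {
    stop("not dummy variables: ", paste(vrb.nm[!is_dummy], collapse = ", "))
  }

  # lvl should name 3+ groups, all of which occur in the nominal variable
  if (length(lvl) < 3L) stop("`lvl` must have 3+ values")
  absent <- setdiff(lvl, as.character(unique(data[[nom.nm]])))
  if (length(absent) > 0L) {
    stop("`lvl` values not in data: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

check_inputs(mtcars, vrb.nm = c("am", "vs"), nom.nm = "cyl")
```

Running checks like these once up front is one way to keep the more useful error messages while still calling the function with check = FALSE inside a loop.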

Details

The confidence interval for Cramer's V is calculated with Fisher's r to z transformation, as Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and then the lower and upper bounds are back-transformed to Cramer's V units.
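
The transformation described above can be sketched in base R as follows. This is an illustration only: cramer_ci is a hypothetical helper, not a quest function, and the standard error 1 / sqrt(n - 3) is the usual one for Fisher's z and is assumed here, since this page does not state which standard error the package uses.

```r
# Hypothetical helper illustrating a Fisher r-to-z interval for Cramer's V.
# ASSUMPTION: se = 1 / sqrt(n - 3); the package may use a different value.
cramer_ci <- function(x, g, ci.level = 0.95) {
  tab <- table(g, x)                        # X by 2 contingency table
  n <- sum(tab)                             # complete cases only
  x2 <- unname(suppressWarnings(
    chisq.test(tab, correct = FALSE))$statistic)
  v <- sqrt(x2 / (n * (min(dim(tab)) - 1))) # Cramer's V
  z <- atanh(v)                             # Fisher's z units
  se <- 1 / sqrt(n - 3)
  crit <- qnorm(1 - (1 - ci.level) / 2)
  # symmetric interval in z units, back-transformed to V units
  c(cramer = v, lwr = tanh(z - crit * se), upr = tanh(z + crit * se))
}

cramer_ci(x = mtcars$am, g = mtcars$cyl)
```

Because the interval is symmetric only in z units, the back-transformed bounds are generally not symmetric around the estimate, and with this approach the lower bound can fall below 0 when V is small.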

Value

list containing statistical information about the proportion comparisons. The first three elements (nhst, desc, std) are data.frames; the last two elements (count, percent) are numeric arrays with dim = [X+1, 3, length(vrb.nm)] and are only returned if rtn.table = TRUE. Each element is described below.

1) nhst = chi-square test of independence stat info in a data.frame

est: average absolute value of the proportion differences between groups (i.e., |group j - group i|)
se: NA (to remind the user there is no standard error for the test)
X2: chi-square value
df: degrees of freedom (of the nominal variable)
p: two-sided p-value
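
For a single dummy variable, the test statistics listed above can be reproduced approximately with prop.test from base R, which this page names as the function being wrapped. The sketch below is a minimal illustration with mtcars; the object names (tab, fit, nhst_sketch) are made up for the example, and the est column is left out because its exact computation is not spelled out here.

```r
# Chi-square test of independence for one dummy variable via prop.test.
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# x = number of 1s per group, n = group sizes; warnings about small
# expected counts are suppressed for this small example dataset
fit <- suppressWarnings(
  prop.test(x = tab[, "1"], n = rowSums(tab), correct = TRUE))

nhst_sketch <- data.frame(
  se = NA,                       # no standard error for this test
  X2 = unname(fit$statistic),    # chi-square value
  df = unname(fit$parameter),    # number of groups minus 1
  p  = fit$p.value
)
nhst_sketch
```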

2) desc = descriptive statistics stat info in a data.frame (note there could be more than 3 groups - groups i, j, and k are just provided as an example):

prop_'lvl[k]': proportion of group k
prop_'lvl[j]': proportion of group j
prop_'lvl[i]': proportion of group i
sd_'lvl[k]': standard deviation of group k
sd_'lvl[j]': standard deviation of group j
sd_'lvl[i]': standard deviation of group i
n_'lvl[k]': sample size of group k
n_'lvl[j]': sample size of group j
n_'lvl[i]': sample size of group i
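
The per-group descriptives can be approximated in base R as shown below. This is a sketch under assumptions: it treats the standard deviation as the sample standard deviation of the 0/1 values and the sample size as the number of non-missing values, neither of which is confirmed on this page, and the element order may differ from the package's output.

```r
# Per-group proportion, standard deviation, and n for one dummy variable.
# ASSUMPTIONS: sd() of the 0/1 values; n counts non-missing values only.
x   <- mtcars$am
lvl <- levels(as.factor(mtcars$cyl))
grp <- factor(mtcars$cyl, levels = lvl)

prop <- tapply(x, grp, mean, na.rm = TRUE)  # proportion of 1s per group
sdv  <- tapply(x, grp, sd, na.rm = TRUE)
n    <- tapply(!is.na(x), grp, sum)

desc_sketch <- c(
  setNames(as.vector(prop), paste0("prop_", lvl)),
  setNames(as.vector(sdv),  paste0("sd_", lvl)),
  setNames(as.vector(n),    paste0("n_", lvl))
)
desc_sketch
```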

3) std = standardized effect size and its confidence interval in a data.frame

cramer: Cramer's V estimate
lwr: lower bound of Cramer's V confidence interval
upr: upper bound of Cramer's V confidence interval

4) count = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of counts for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of data[[nom.nm]] - plus the total - are the rows and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1) - plus the total - are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.

5) percent = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of overall percentages for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of data[[nom.nm]] - plus the total - are the rows and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1) - plus the total - are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.
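
Arrays with the layout described above can be built in base R with table and addmargins. The following is a sketch of the structure only, not the package's own code; the object names are made up, and the third dimension is left without a dimlabel here.

```r
# Build [X+1, 3, length(vrb_nm)] arrays of counts and overall percentages.
vrb_nm <- c("am", "vs")  # dummy variables in mtcars

count_sketch <- sapply(vrb_nm, function(nm) {
  tab <- addmargins(table(nom = mtcars$cyl, x = mtcars[[nm]]))
  # addmargins labels the margins "Sum"; relabel to match this page
  rownames(tab)[nrow(tab)] <- "total"
  colnames(tab)[ncol(tab)] <- "total"
  tab
}, simplify = "array")

# overall percentages: each layer divided by its grand total
percent_sketch <- 100 * sweep(count_sketch, 3,
                              count_sketch["total", "total", ], "/")

dim(count_sketch)
count_sketch[, , "am"]
```

Each layer of the percentage array sums to 100 in its "total" by "total" cell, since the percentages are taken over the whole table rather than within rows or columns.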

Examples


# rtn.table = TRUE (default)

# multiple variables
tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("cyl", nm)
   table(mtcars2[tmp])
})
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl")

# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl")

# rtn.table = FALSE (no "count" or "percent" list elements)

# multiple variables
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl",
   rtn.table = FALSE)

# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl",
   rtn.table = FALSE)

# more than 3 groups
airquality2 <- airquality
airquality2$"Wind_dum" <- ifelse(airquality$"Wind" >= 10, yes = 1, no = 0)
airquality2$"Solar.R_dum" <- ifelse(airquality$"Solar.R" >= 100, yes = 1, no = 0)
props_compare(airquality2, vrb.nm = c("Wind_dum","Solar.R_dum"), nom.nm = "Month")
props_compare(airquality2, vrb.nm = "Wind_dum", nom.nm = "Month")

[Package quest version 0.2.0 Index]