props_compare {quest}    R Documentation

Proportion Comparisons for Multiple Variables across 3+ Independent Groups (Chi-square Tests of Independence)

Description

props_compare tests for proportion differences across 3+ independent groups with chi-square tests of independence, one test per dummy variable. The function also calculates the descriptive statistics for each group, Cramer's V and its confidence interval as a standardized effect size, and can provide the X by 2 contingency tables. props_compare is simply a wrapper for prop.test plus some extra calculations.
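
As a rough illustration of the kind of test being wrapped (a minimal sketch in base R only, not the package's internal code), the omnibus test for one dummy variable can be reproduced with prop.test. The object names (d, tab, x, n, pt, ct) are illustrative, and mtcars is stacked 10 times only to keep the expected counts large, as in the Examples.

```r
# Illustrative data: mtcars stacked 10 times so expected cell counts are large
d <- mtcars[rep(seq_len(nrow(mtcars)), times = 10), ]

# 3 by 2 contingency table: rows = cyl groups, columns = dummy variable am
tab <- table(d$cyl, d$am)
x <- tab[, "1"]     # number of 1s in each group
n <- rowSums(tab)   # sample size of each group

# Test of equal proportions across the three groups
pt <- prop.test(x = x, n = n)

# The same Pearson chi-square statistic from a test of independence
ct <- chisq.test(tab)

c(X2 = unname(pt$statistic), df = unname(pt$parameter), p = pt$p.value)
```

With three or more groups the two calls return the same chi-square statistic, which is why the test can be described either as a comparison of proportions or as a test of independence.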

Usage

props_compare(
  data,
  vrb.nm,
  nom.nm,
  lvl = levels(as.factor(data[[nom.nm]])),
  yates = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

data

data.frame of data.

vrb.nm

character vector of colnames from data specifying the dummy variables, in other words, variables that only have values of 0 or 1 (or missing values).

nom.nm

character vector of length 1 specifying the colname in data containing a nominal variable that takes on three or more unordered values (or missing values).

lvl

character vector with length 3+ specifying the unique values for the 3+ independent groups. If data[[nom.nm]] is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the order of the proportions in the return object.
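
The distinction between factor levels and integer codes can be seen with a small base R sketch. The factor cyl_fct and the names lvl_default and lvl_custom are made up for illustration; only the default expression levels(as.factor(...)) comes from the Usage section.

```r
# Illustrative factor version of mtcars$cyl with text labels
cyl_fct <- factor(mtcars$cyl, levels = c(4, 6, 8),
                  labels = c("four", "six", "eight"))

# What the default for lvl evaluates to: the factor levels
lvl_default <- levels(as.factor(cyl_fct))
lvl_default                        # "four" "six" "eight"

# The underlying integer codes, which are NOT what lvl expects
sort(unique(as.integer(cyl_fct)))  # 1 2 3

# A custom ordering that could be supplied as lvl instead
lvl_custom <- rev(lvl_default)
lvl_custom                         # "eight" "six" "four"
```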

yates

logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details.

ci.level

numeric vector of length 1 specifying the confidence level. ci.level must range from 0 to 1.

rtn.table

logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals for each dummy variable and the X by 2 overall percentages table with totals for each dummy variable. If TRUE, then the last two elements of the return object are "count" containing an array of counts and "percent" containing an array of overall percentages.

check

logical vector of length 1 specifying whether the input arguments should be checked for errors, for example, whether lvl has values that are not present in data[[nom.nm]]. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
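
The kind of check described above might look something like the following sketch. check_lvl is a hypothetical helper written for illustration; it is not a function in quest, and the actual checks and error messages may differ.

```r
# Hypothetical helper (not part of quest) mimicking one possible input check
check_lvl <- function(data, nom.nm, lvl) {
  observed <- unique(as.character(data[[nom.nm]]))
  absent <- setdiff(lvl, observed)   # values in lvl never seen in the data
  if (length(absent) > 0L) {
    stop("lvl values not present in data[['", nom.nm, "']]: ",
         paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently: all three values occur in mtcars$cyl
ok <- check_lvl(mtcars, nom.nm = "cyl", lvl = c("4", "6", "8"))

# Fails with an informative message: "10" is not a value of mtcars$cyl
res <- tryCatch(check_lvl(mtcars, nom.nm = "cyl", lvl = c("4", "6", "10")),
                error = function(e) conditionMessage(e))
res
```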

Details

The confidence interval for Cramer's V is calculated with Fisher's r to z transformation, as Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and then the lower and upper bounds are back-transformed to Cramer's V units.
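
The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: in particular, the standard error 1 / sqrt(n - 3) is the usual one for Fisher's z of a correlation and is assumed here, since this page does not state which standard error is used. All object names are illustrative.

```r
# Illustrative data: mtcars stacked 10 times, as in the Examples
d <- mtcars[rep(seq_len(nrow(mtcars)), times = 10), ]
tab <- table(d$cyl, d$am)   # X by 2 contingency table
n <- sum(tab)
ci.level <- 0.95

# Cramer's V from the chi-square statistic; min(dim) - 1 is 1 for X by 2
X2 <- unname(chisq.test(tab)$statistic)
v <- sqrt(X2 / (n * (min(dim(tab)) - 1)))

# Step 1: transform Cramer's V to Fisher's z units
z <- atanh(v)

# Step 2: symmetric interval in z units
se_z <- 1 / sqrt(n - 3)     # ASSUMPTION: usual standard error of Fisher's z
crit <- qnorm(1 - (1 - ci.level) / 2)

# Step 3: back-transform the bounds to Cramer's V units
ci <- tanh(z + c(-1, 1) * crit * se_z)

c(cramer = v, lwr = ci[1], upr = ci[2])
```

Because the interval is symmetric only in z units, the back-transformed bounds are generally not equidistant from the Cramer's V estimate.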

Value

list of data.frames containing statistical information about the proportion comparisons: 1) nhst = chi-square test of independence stat info in a data.frame, 2) desc = descriptive statistics stat info in a data.frame (note there could be more than 3 groups - groups i, j, and k are just provided as an example), 3) std = standardized effect size and its confidence interval in a data.frame, 4) count = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of counts for each dummy variable with an additional row and column for totals (if rtn.table = TRUE), 5) percent = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of overall percentages for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).

1) nhst = chi-square test of independence stat info in a data.frame

est

average absolute value of the proportion differences between groups (i.e., |group j - group i|)

se

NA (to remind the user there is no standard error for the test)

X2

chi-square value

df

degrees of freedom (of the nominal variable)

p

two-sided p-value
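
One plausible reading of est is the mean of |group j - group i| over all unique pairs of groups. The sketch below computes that quantity in base R; the pairing scheme is an assumption made for illustration and is not confirmed by this page.

```r
# Proportion of 1s on the dummy variable am within each cyl group
p <- sapply(split(mtcars$am, mtcars$cyl), mean)

# ASSUMPTION: average over every unique pair of groups (i, j)
pairs <- combn(names(p), 2)                 # 2 x (number of pairs) matrix
abs_diff <- abs(p[pairs[2, ]] - p[pairs[1, ]])
est <- mean(abs_diff)
est
```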

2) desc = descriptive statistics stat info in a data.frame (note there could be more than 3 groups - groups i, j, and k are just provided as an example):

prop_'lvl[k]'

proportion of group k

prop_'lvl[j]'

proportion of group j

prop_'lvl[i]'

proportion of group i

sd_'lvl[k]'

standard deviation of group k

sd_'lvl[j]'

standard deviation of group j

sd_'lvl[i]'

standard deviation of group i

n_'lvl[k]'

sample size of group k

n_'lvl[j]'

sample size of group j

n_'lvl[i]'

sample size of group i
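
For orientation, descriptive statistics of this shape can be computed by hand in base R. This is a sketch only: the column order, the handling of missing values, and the use of the sample standard deviation (sd) are assumptions, and the names grp, x, lvl, stat and desc are illustrative.

```r
grp <- as.factor(mtcars$cyl)   # nominal grouping variable
x <- mtcars$am                 # dummy variable (0/1)
lvl <- levels(grp)

prop <- tapply(x, grp, mean, na.rm = TRUE)   # proportion of 1s per group
sdev <- tapply(x, grp, sd, na.rm = TRUE)     # ASSUMPTION: sample sd per group
n <- tapply(!is.na(x), grp, sum)             # non-missing sample size per group

# One-row data.frame with names following the prop_/sd_/n_ pattern
stat <- c(as.vector(prop), as.vector(sdev), as.vector(n))
names(stat) <- c(paste0("prop_", lvl), paste0("sd_", lvl), paste0("n_", lvl))
desc <- as.data.frame(as.list(stat))
desc
```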

3) std = standardized effect size and its confidence interval in a data.frame

cramer

Cramer's V estimate

lwr

lower bound of Cramer's V confidence interval

upr

upper bound of Cramer's V confidence interval

4) count = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of counts for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of data[[nom.nm]] - plus the total - are the rows and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1) - plus the total - are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.

5) percent = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of overall percentages for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of data[[nom.nm]] - plus the total - are the rows and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1) - plus the total - are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.
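
A single layer of tables like these can be approximated with table, addmargins, and prop.table. The sketch below builds one layer for the dummy variable am; add_totals is a hypothetical helper, not a quest function, and the package may construct its arrays differently.

```r
# X by 2 table for one dummy variable, with the dimlabels "nom" and "x"
tab <- table(nom = mtcars$cyl, x = mtcars$am)

# Hypothetical helper: append totals and label them "total"
add_totals <- function(tb) {
  out <- addmargins(tb)                  # appends a "Sum" row and column
  rownames(out)[nrow(out)] <- "total"
  colnames(out)[ncol(out)] <- "total"
  out
}

count <- add_totals(tab)                      # counts with totals
percent <- add_totals(prop.table(tab) * 100)  # overall percentages with totals

count
percent
```

Here the percentages are relative to the overall sample size, so the bottom-right "total" cell of percent is 100 and the bottom-right cell of count is the number of rows in the data.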

See Also

prop.test the workhorse for prop_compare, prop_compare for a single dummy variable, props_diff for only 2 independent groups (aka binary variable).

Examples


# rtn.table = TRUE (default)

# multiple variables
tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("cyl", nm)
   table(mtcars2[tmp])
})
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl")

# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl")

# rtn.table = FALSE (no "count" or "percent" list elements)

# multiple variables
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl",
   rtn.table = FALSE)

# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl",
   rtn.table = FALSE)

# more than 3 groups
airquality2 <- airquality
airquality2$"Wind_dum" <- ifelse(airquality$"Wind" >= 10, yes = 1, no = 0)
airquality2$"Solar.R_dum" <- ifelse(airquality$"Solar.R" >= 100, yes = 1, no = 0)
props_compare(airquality2, vrb.nm = c("Wind_dum","Solar.R_dum"), nom.nm = "Month")
props_compare(airquality2, vrb.nm = "Wind_dum", nom.nm = "Month")


[Package quest version 0.2.0 Index]