props_diff {quest}    R Documentation

Proportion Difference of Multiple Variables Across Two Independent Groups (Chi-square Tests of Independence)

Description

props_diff tests the difference in proportions for multiple variables across two independent groups using chi-square tests of independence. The function also calculates descriptive statistics for each group and various standardized effect sizes (e.g., Cramer's V), and can return the 2x2 contingency tables. props_diff is simply a wrapper for prop.test plus some extra calculations.

Usage

props_diff(
  data,
  vrb.nm,
  bin.nm,
  lvl = levels(as.factor(data[[bin.nm]])),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Arguments

data

data.frame of data.

vrb.nm

character vector specifying the colnames in data for the variables. Since we are testing proportions, the variables must be dummy codes such that they only have values of 0 or 1 (or missing values).

bin.nm

character vector of length 1 specifying the colname in data for the binary variable that only takes on two values (or missing values), specifying the two independent groups.

lvl

character vector of length 2 specifying the unique values for the two groups. If data[[bin.nm]] is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the prop differences. For each variable x in vrb.nm, with bin = data[[bin.nm]], the prop difference is calculated as the proportion of x[ bin == lvl[2] ] minus the proportion of x[ bin == lvl[1] ], such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the prop differences can be changed. See details of prop_diff.

yates

logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details.

zero.cell

numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlations. See tetrachoric for details.

smooth

logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlations. See tetrachoric for details.

ci.level

numeric vector of length 1 specifying the confidence level. ci.level must be between 0 and 1.

rtn.table

logical vector of length 1 specifying whether the return object should include the 2x2 contingency tables of counts with totals and the 2x2 overall percentages tables. If TRUE, then the last two elements of the return object are "count" containing a 3D array of counts and "percent" containing a 3D array of overall percentages.

check

logical vector of length 1 specifying whether the input arguments should be checked for errors. For example, if data[[bin.nm]] has more than 2 unique values (other than missing values). This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
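
As a minimal sketch of how lvl sets the direction of the prop difference, the base R code below computes group 2 - group 1 by hand for one dummy variable. The helper prop_by_order is hypothetical (it is not part of quest) and only assumes the group 2 - group 1 convention described for lvl above.

```r
# Grouping variable and dummy variable taken from the Examples section
dat <- mtcars
dat$vs_bin <- ifelse(dat$vs == 1, yes = "yes", no = "no")

# Hypothetical helper (not part of quest): proportion of x in group lvl[2]
# minus proportion of x in group lvl[1]
prop_by_order <- function(x, bin, lvl) {
  p1 <- mean(x[bin == lvl[1]], na.rm = TRUE)
  p2 <- mean(x[bin == lvl[2]], na.rm = TRUE)
  p2 - p1
}

# Default ordering, i.e., levels(as.factor(dat$vs_bin))
d_default <- prop_by_order(dat$am, dat$vs_bin, lvl = c("no", "yes"))

# Reversed ordering: "no" is now group 2
d_flipped <- prop_by_order(dat$am, dat$vs_bin, lvl = c("yes", "no"))

print(c(default = d_default, flipped = d_flipped))
```

Reversing lvl changes the sign of the difference but not its magnitude.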

Value

list containing statistical information about the prop differences. The first three elements are data.frames (the rownames of each data.frame are vrb.nm) and the last two elements are arrays that are only returned if rtn.table = TRUE: 1) chisqtest = chi-square tests of independence stat info in a data.frame, 2) describes = descriptive statistics stat info in a data.frame, 3) effects = various standardized effect sizes in a data.frame, 4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE), 5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).

1) chisqtest = chi-square tests of independence stat info in a data.frame

est

proportion difference estimate (i.e., group 2 - group 1)

se

NA (to remind the user there is no standard error for the test)

X2

chi-square value

df

degrees of freedom (will always be 1)

p

two-sided p-value

lwr

lower bound of the confidence interval

upr

upper bound of the confidence interval
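
Because props_diff is described as a wrapper for prop.test, the columns above can be approximated from a direct prop.test call. The sketch below is an assumption about that mapping rather than the package's actual code. In particular, it assumes that yates corresponds to the correct argument of prop.test and ci.level to conf.level.

```r
x <- mtcars$am                              # dummy variable (0/1)
bin <- ifelse(mtcars$vs == 1, "yes", "no")  # two independent groups
lvl <- c("no", "yes")                       # group 1, group 2

# Successes and sample sizes, with group 2 first so that the estimate and
# confidence interval are for group 2 - group 1
successes <- c(sum(x[bin == lvl[2]]), sum(x[bin == lvl[1]]))
trials <- c(sum(bin == lvl[2]), sum(bin == lvl[1]))

fit <- prop.test(x = successes, n = trials, correct = TRUE, conf.level = 0.95)
fit_raw <- prop.test(x = successes, n = trials, correct = FALSE, conf.level = 0.95)

# Assumed mapping onto the chisqtest columns
res <- data.frame(
  est = unname(fit$estimate[1] - fit$estimate[2]),
  se = NA_real_,
  X2 = unname(fit$statistic),
  df = unname(fit$parameter),
  p = fit$p.value,
  lwr = fit$conf.int[1],
  upr = fit$conf.int[2],
  row.names = "am"
)
print(res)

# Without the continuity correction the chi-square value is larger
print(unname(fit_raw$statistic))
```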

2) describes = descriptive statistics stat info in a data.frame

prop_'lvl[2]'

proportion of group 2

prop_'lvl[1]'

proportion of group 1

sd_'lvl[2]'

standard deviation of group 2

sd_'lvl[1]'

standard deviation of group 1

n_'lvl[2]'

sample size of group 2

n_'lvl[1]'

sample size of group 1

3) effects = various standardized effect sizes in a data.frame

cramer

Cramer's V estimate

h

Cohen's h estimate

phi

Phi coefficient estimate

yule

Yule coefficient estimate

tetra

Tetrachoric correlation estimate

OR

odds ratio estimate

RR

risk ratio estimate (i.e., group 2 / group 1). Note this value will often differ when the variables are recoded (as it should).
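
For orientation, several of these effect sizes can be computed by hand from the four cell counts of the 2x2 table. The sketch below uses textbook formulas in base R. The sign conventions, the use of Yule's Q for the Yule coefficient, and the uncorrected Cramer's V are assumptions and may differ from what props_diff returns. The tetrachoric correlation is omitted because it requires an iterative estimate (see tetrachoric).

```r
x <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")
lvl <- c("no", "yes")  # group 1, group 2

# Cell counts (as doubles to avoid integer overflow in the products below)
n21 <- as.numeric(sum(x == 1 & bin == lvl[2]))  # group 2, x = 1
n20 <- as.numeric(sum(x == 0 & bin == lvl[2]))  # group 2, x = 0
n11 <- as.numeric(sum(x == 1 & bin == lvl[1]))  # group 1, x = 1
n10 <- as.numeric(sum(x == 0 & bin == lvl[1]))  # group 1, x = 0

p2 <- n21 / (n21 + n20)  # proportion of group 2
p1 <- n11 / (n11 + n10)  # proportion of group 1

phi <- (n21 * n10 - n20 * n11) /
  sqrt((n21 + n20) * (n11 + n10) * (n21 + n11) * (n20 + n10))

effects <- c(
  cramer = abs(phi),                              # equals |phi| for a 2x2 table
  h = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)),    # Cohen's h
  phi = phi,
  yule = (n21 * n10 - n20 * n11) / (n21 * n10 + n20 * n11),  # Yule's Q
  OR = (n21 * n10) / (n20 * n11),
  RR = p2 / p1
)
print(round(effects, 3))

# Recoding the dummy variable (0 <-> 1) inverts the odds ratio exactly,
# but the risk ratio changes to a value that is not simply its inverse
rr_recoded <- (1 - p2) / (1 - p1)
or_recoded <- (n20 * n11) / (n21 * n10)
print(c(RR_recoded = rr_recoded, OR_recoded = or_recoded))
```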

4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of data[[bin.nm]] - plus the total - are the columns. The variables themselves are the layers (i.e., 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total". The laynames are vrb.nm.

5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of data[[bin.nm]] - plus the total - are the columns. The variables themselves are the layers (i.e., 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total". The laynames are vrb.nm.
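
The layout of the count and percent arrays can be sketched in base R as follows. This is an illustration of the [3, 3, length(vrb.nm)] shape with the rownames and colnames listed above, not the package's implementation. The helper add_totals is hypothetical, and the sketch does not set the dimlabels.

```r
dat <- mtcars
dat$vs_bin <- ifelse(dat$vs == 1, yes = "yes", no = "no")
dat$gear_dum <- ifelse(dat$gear > 3, yes = 1L, no = 0L)
vrb_nm <- c("am", "gear_dum")

# Hypothetical helper (not part of quest): append a "total" row and column
add_totals <- function(m) {
  m <- cbind(m, total = rowSums(m))
  rbind(m, total = colSums(m))
}

# 2x2 table for one variable: rows are 0/1, columns are the two groups
one_table <- function(nm) {
  table(factor(dat[[nm]], levels = 0:1), dat$vs_bin)
}

# One 3x3 layer per variable, stacked into a 3D array
count <- sapply(vrb_nm, function(nm) add_totals(one_table(nm)),
  simplify = "array")
percent <- sapply(vrb_nm, function(nm) add_totals(prop.table(one_table(nm)) * 100),
  simplify = "array")

dim(count)
count[, , "am"]
round(percent[, , "am"], 1)
```

Each layer of percent sums to 100 in its "total" by "total" cell because the percentages are taken over the whole table rather than within rows or columns.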

See Also

prop.test the workhorse for props_diff,
prop_diff for a single dummy variable,
phi for another phi coefficient function,
Yule for another yule coefficient function,
tetrachoric for another tetrachoric coefficient function

Examples


# rtn.table = TRUE (default)

# multiple variables
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
# inspect the 2x2 table of counts for each dummy variable by group
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("vs_bin", nm)
   table(mtcars2[tmp])
})
props_diff(data = mtcars2, vrb.nm = vrb_nm, bin.nm = "vs_bin")

# single variable
props_diff(mtcars2, vrb.nm = "am", bin.nm = "vs_bin")

# rtn.table = FALSE (no "count" or "percent" list elements)

# multiple variables
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs",
   rtn.table = FALSE)

# single variable
props_diff(mtcars, vrb.nm = "am", bin.nm = "vs",
   rtn.table = FALSE)


[Package quest version 0.2.0 Index]