props_compare {quest} | R Documentation |
Proportion Comparisons for Multiple Variables across 3+ Independent Groups (Chi-square Tests of Independence)
Description
props_compare tests for proportion differences across 3+ independent groups with chi-square tests of independence. The function also calculates the descriptive statistics for each group, Cramer's V and its confidence interval as a standardized effect size, and can provide the X by 2 contingency tables. props_compare is simply a wrapper for prop.test plus some extra calculations.
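As a rough illustration of the underlying test for a single dummy variable, the sketch below calls prop.test directly on an X by 2 table built from mtcars. It is not the package's internal code; the choice of cyl as the grouping variable and am as the dummy variable is only for illustration.

```r
# X by 2 table: rows are the groups (cyl); columns are the counts of
# successes (am == 1) and failures (am == 0), in the order prop.test expects
tab <- table(mtcars$cyl, mtcars$am)[, c("1", "0")]

# mtcars is small, so R warns that the chi-square approximation may be
# inaccurate; the warning is suppressed here to keep the output short
res <- suppressWarnings(prop.test(tab))

res$statistic  # chi-square value
res$parameter  # degrees of freedom (number of groups minus 1)
res$p.value    # two-sided p-value
res$estimate   # proportion of 1s within each group
```

With 3+ groups, prop.test returns the same statistic as a Pearson chi-square test of independence on the same table.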
Usage
props_compare(
  data,
  vrb.nm,
  nom.nm,
  lvl = levels(as.factor(data[[nom.nm]])),
  yates = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)
Arguments
data |
data.frame of data. |
vrb.nm |
character vector of colnames from |
nom.nm |
character vector of length 1 specifying the colname in
|
lvl |
character vector with length 3+ specifying the unique values for
the 3+ independent groups. If |
yates |
logical vector of length 1 specifying whether Yates'
continuity correction should be applied for small samples. See
|
ci.level |
numeric vector of length 1 specifying the confidence level.
|
rtn.table |
logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals for each dummy variable and the X by 2 overall percentages table with totals for each dummy variable. If TRUE, then the last two elements of the return object are "count", containing an array of counts, and "percent", containing an array of overall percentages. |
check |
logical vector of length 1 specifying whether the input
arguments should be checked for errors. For example, if |
Details
The confidence interval for Cramer's V is calculated with Fisher's r to z transformation, as Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and then the lower and upper bounds are back-transformed to Cramer's V units.
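The three steps above can be sketched as follows. This is a minimal illustration and not the package's code: the helper name cramer_ci is hypothetical, and the standard error of 1 / sqrt(n - 3) is the usual one for Fisher's z of a correlation, assumed here because the documentation does not state which standard error is used.

```r
# Hypothetical helper: Cramer's V with a Fisher z based confidence interval
cramer_ci <- function(tab, ci.level = 0.95) {
  # Pearson chi-square without continuity correction
  x2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1            # equals 1 for an X by 2 table
  v <- sqrt(x2 / (n * k))           # Cramer's V
  z <- atanh(v)                     # Fisher's r to z transformation
  se <- 1 / sqrt(n - 3)             # assumed standard error in z units
  crit <- qnorm(1 - (1 - ci.level) / 2)
  # symmetric interval in z units, back-transformed to Cramer's V units
  c(cramer = v, lwr = tanh(z - crit * se), upr = tanh(z + crit * se))
}

tab <- table(mtcars$cyl, mtcars$am)
ci <- cramer_ci(tab)
ci
```

Because the interval is symmetric only in z units, the back-transformed bounds are not symmetric around the estimate, and in this sketch the lower bound is not truncated at 0.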
Value
list of data.frames containing statistical information about the proportion comparisons: 1) nhst = chi-square test of independence stat info in a data.frame, 2) desc = descriptive statistics stat info in a data.frame (note there could be more than 3 groups; groups i, j, and k are just provided as an example), 3) std = standardized effect size and its confidence interval in a data.frame, 4) count = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of counts for each dummy variable with an additional row and column for totals (if rtn.table = TRUE), 5) percent = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of overall percentages for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).
1) nhst = chi-square test of independence stat info in a data.frame
- est
average proportion difference absolute value (i.e., |group j - group i|)
- se
NA (to remind the user there is no standard error for the test)
- X2
chi-square value
- df
degrees of freedom (of the nominal variable)
- p
two-sided p-value
2) desc = descriptive statistics stat info in a data.frame (note there could be more than 3 groups - groups i, j, and k are just provided as an example):
- prop_'lvl[k]'
proportion of group k
- prop_'lvl[j]'
proportion of group j
- prop_'lvl[i]'
proportion of group i
- sd_'lvl[k]'
standard deviation of group k
- sd_'lvl[j]'
standard deviation of group j
- sd_'lvl[i]'
standard deviation of group i
- n_'lvl[k]'
sample size of group k
- n_'lvl[j]'
sample size of group j
- n_'lvl[i]'
sample size of group i
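The per-group quantities listed under desc can be reproduced by hand along these lines. This is only a sketch: it uses sd(), the sample standard deviation with an n - 1 denominator, which is an assumption about how the package computes the standard deviation of a dummy variable, and it returns one row per group rather than the package's column layout.

```r
# Proportion, standard deviation, and sample size of the dummy variable am
# within each level of the nominal variable cyl
prop <- tapply(mtcars$am, mtcars$cyl, mean)
sd_grp <- tapply(mtcars$am, mtcars$cyl, sd)
n <- tapply(mtcars$am, mtcars$cyl, length)

desc <- data.frame(lvl = names(n), prop = as.vector(prop),
                   sd = as.vector(sd_grp), n = as.vector(n))
desc
```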
3) std = standardized effect size and its confidence interval in a data.frame
- cramer
Cramer's V estimate
- lwr
lower bound of Cramer's V confidence interval
- upr
upper bound of Cramer's V confidence interval
4) count = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of counts for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).
The 3+ unique observed values of data[[nom.nm]], plus the total, are the rows, and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1), plus the total, are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.
5) percent = numeric array with dim = [X+1, 3, length(vrb.nm)] of the X by 2 contingency table of overall percentages for each dummy variable with an additional row and column for totals (if rtn.table = TRUE).
The 3+ unique observed values of data[[nom.nm]], plus the total, are the rows, and the two unique observed values of data[[vrb.nm]] (i.e., 0 and 1), plus the total, are the columns. The variables in data[vrb.nm] are the layers. The dimlabels are "nom" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. 'lvl[i]', 2. 'lvl[j]', 3. 'lvl[k]', 4. "total". The colnames are 1. "0", 2. "1", 3. "total". The laynames are vrb.nm.
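To make the layout of a single layer concrete, the sketch below builds one count table and one percent table with base R for the dummy variable am grouped by cyl. It illustrates the described structure only and is not the package's code; in particular, expressing the overall percentages on a 0 to 100 scale is an assumption.

```r
# X by 2 table of counts with an extra row and column of totals
count <- addmargins(table(nom = mtcars$cyl, x = mtcars$am))

# addmargins labels the margins "Sum"; rename them to "total"
dimnames(count) <- lapply(dimnames(count), function(nm) sub("Sum", "total", nm))

# overall percentages: every cell divided by the grand total
percent <- count / count["total", "total"] * 100

count
percent
```

Each such table has X + 1 rows and 3 columns; stacking one per variable in vrb.nm would give the 3-dimensional arrays described above.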
See Also
prop.test, the workhorse for prop_compare;
prop_compare for a single dummy variable;
props_diff for only 2 independent groups (aka binary variable).
Examples
# rtn.table = TRUE (default)
# multiple variables
tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("cyl", nm)
   table(mtcars2[tmp])
})
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl")
# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl")
# rtn.table = FALSE (no "count" or "percent" list elements)
# multiple variables
props_compare(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), nom.nm = "cyl",
   rtn.table = FALSE)
# single variable
props_compare(mtcars2, vrb.nm = "am", nom.nm = "cyl",
   rtn.table = FALSE)
# more than 3 groups
airquality2 <- airquality
airquality2$"Wind_dum" <- ifelse(airquality$"Wind" >= 10, yes = 1, no = 0)
airquality2$"Solar.R_dum" <- ifelse(airquality$"Solar.R" >= 100, yes = 1, no = 0)
props_compare(airquality2, vrb.nm = c("Wind_dum","Solar.R_dum"), nom.nm = "Month")
props_compare(airquality2, vrb.nm = "Wind_dum", nom.nm = "Month")