prop_diff {quest} | R Documentation |
Proportion Difference for a Single Variable across Two Independent Groups (Chi-square Test of Independence)
Description
prop_diff tests for proportion differences across two independent
groups with a chi-square test of independence. The function also calculates
the descriptive statistics for each group, various standardized effect sizes
(e.g., Cramer's V), and can provide the 2x2 contingency tables.
prop_diff is simply a wrapper for prop.test plus some extra calculations.
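As a rough illustration of the wrapper idea described above, the following base-R sketch calls prop.test directly on the per-group successes and sample sizes. It is not the package's source code; the variable names (successes, n, ord, fit, est) are hypothetical and chosen only for this example.

```r
# Base-R sketch of the core computation (an assumption, not quest's actual code)
x   <- mtcars$am                              # dummy variable (0/1)
bin <- ifelse(mtcars$vs == 1, "yes", "no")    # binary grouping variable

# Number of 1s and sample size within each group
successes <- sapply(split(x, bin), sum)
n         <- sapply(split(x, bin), length)

# Order the groups as (group 2, group 1) so the estimate is group 2 - group 1
ord <- c("yes", "no")
fit <- prop.test(x = successes[ord], n = n[ord], correct = TRUE)

est <- unname(fit$estimate[1] - fit$estimate[2])  # proportion difference
fit$statistic   # chi-square value
fit$parameter   # degrees of freedom
fit$p.value     # two-sided p-value
fit$conf.int    # confidence interval for the proportion difference
```

The pieces extracted at the end correspond loosely to the est, X2, df, p, lwr, and upr elements documented under Value.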
Usage
prop_diff(
  x,
  bin,
  lvl = levels(as.factor(bin)),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)
Arguments
x |
numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable. |
bin |
atomic vector that only takes on two values (or missing values), otherwise known as a binary variable. |
lvl |
character vector with length 2 specifying the unique values for
the two groups. If |
yates |
logical vector of length 1 specifying whether Yates'
continuity correction should be applied for small samples. See
|
zero.cell |
numeric vector of length 1 specifying what value to impute
for zero cell counts in the 2x2 contingency table when computing the
tetrachoric correlation. See |
smooth |
logical vector of length 1 specifying whether a smoothing
algorithm should be applied when estimating the tetrachoric correlation.
See |
ci.level |
numeric vector of length 1 specifying the confidence level.
|
rtn.table |
logical vector of length 1 specifying whether the return object should include the 2x2 contingency table of counts with totals and the 2x2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages. |
check |
logical vector of length 1 specifying whether the input
arguments should be checked for errors. For example, if |
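To show what a continuity correction does to the test statistic, here is a small sketch using stats::prop.test and its correct argument. That the yates argument is passed through to correct is an assumption made for this illustration; the object names (tab, with_cc, without_cc) are hypothetical.

```r
# Sketch: effect of the continuity correction in stats::prop.test.
# Assumes (not confirmed here) that `yates` maps onto prop.test's `correct`.
# Rows are the groups; the first column must hold the successes (x == 1).
tab <- table(bin = mtcars$vs, x = mtcars$am)[, c("1", "0")]

with_cc    <- prop.test(tab, correct = TRUE)
without_cc <- prop.test(tab, correct = FALSE)

# The corrected chi-square is smaller, so its p-value is larger
c(corrected   = unname(with_cc$statistic),
  uncorrected = unname(without_cc$statistic))
c(corrected   = with_cc$p.value,
  uncorrected = without_cc$p.value)
```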
Value
list of numeric vectors containing statistical information about the
proportion difference:
1) nhst = chi-square test of independence stat info in a numeric vector,
2) desc = descriptive statistics stat info in a numeric vector,
3) std = various standardized effect sizes in a numeric vector,
4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE),
5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE)
1) nhst = chi-square test of independence stat info in a numeric vector
- est
proportion difference estimate (i.e., group 2 - group 1)
- se
NA (to remind the user there is no standard error for the test)
- X2
chi-square value
- df
degrees of freedom (will always be 1)
- p
two-sided p-value
- lwr
lower bound of the confidence interval
- upr
upper bound of the confidence interval
2) desc = descriptive statistics stat info in a numeric vector
- prop_'lvl[2]'
proportion of group 2
- prop_'lvl[1]'
proportion of group 1
- sd_'lvl[2]'
standard deviation of group 2
- sd_'lvl[1]'
standard deviation of group 1
- n_'lvl[2]'
sample size of group 2
- n_'lvl[1]'
sample size of group 1
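The descriptive statistics listed above can be approximated in base R as follows. This is a sketch under assumptions: in particular, whether the package reports the sample standard deviation (as sd() does) or another formula is not stated here, and the object names are hypothetical.

```r
# Sketch of per-group descriptives for a dummy variable (base R only)
x   <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")

prop_grp <- tapply(x, bin, mean)    # proportion of 1s in each group
sd_grp   <- tapply(x, bin, sd)      # sample SD (n - 1 denominator; an assumption)
n_grp    <- tapply(x, bin, length)  # sample size of each group

rbind(prop = prop_grp, sd = sd_grp, n = n_grp)
```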
3) std = various standardized effect sizes in a numeric vector
- cramer
Cramer's V estimate
- h
Cohen's h estimate
- phi
Phi coefficient estimate
- yule
Yule coefficient estimate
- tetra
Tetrachoric correlation estimate
- OR
odds ratio estimate
- RR
risk ratio estimate (i.e., group 2 / group 1). Note that this value will often differ when the variables are recoded (as it should).
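For readers who want to see where several of these effect sizes come from, the sketch below computes them from the 2x2 cell counts using common textbook formulas. Treating the Yule coefficient as Yule's Q is an assumption, the tetrachoric correlation is omitted because it requires an iterative estimate, and all object names are hypothetical.

```r
# Sketch: textbook effect sizes from a 2x2 table (not the package's source)
x   <- mtcars$am
bin <- mtcars$vs                    # group 1 = 0, group 2 = 1

tab <- table(bin = bin, x = x)
n11 <- tab["1", "1"]                # group 2, x = 1
n10 <- tab["1", "0"]                # group 2, x = 0
n01 <- tab["0", "1"]                # group 1, x = 1
n00 <- tab["0", "0"]                # group 1, x = 0

p2 <- n11 / (n11 + n10)             # proportion in group 2
p1 <- n01 / (n01 + n00)             # proportion in group 1

phi    <- cor(x, bin)               # phi = Pearson r between two binary variables
X2     <- unname(chisq.test(tab, correct = FALSE)$statistic)
cramer <- sqrt(X2 / sum(tab))       # equals |phi| for a 2x2 table
h      <- 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))  # Cohen's h
OR     <- (n11 * n00) / (n10 * n01) # odds ratio
yule_q <- (OR - 1) / (OR + 1)       # Yule's Q (assumed variant)
RR     <- p2 / p1                   # risk ratio, group 2 / group 1

c(cramer = cramer, h = h, phi = phi, yule = yule_q, OR = OR, RR = RR)
```

Swapping the group order inverts OR and RR and flips the sign of h, phi, and Yule's Q, while Cramer's V is unchanged, which is why the direction of the lvl argument matters for most of these values.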
4) count = numeric matrix with dim = [3, 3] of the 2x2 contingency table of
counts with an additional row and column for totals (if rtn.table = TRUE).
The two unique observed values of x (i.e., 0 and 1) - plus the total - are
the rows and the two unique observed values of bin - plus the total - are
the columns. The dimlabels are "bin" for the rows and "x" for the columns.
The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. 'lvl[1]',
2. 'lvl[2]', 3. "total".
5) percent = numeric matrix with dim = [3, 3] of the 2x2 contingency table of
overall percentages with an additional row and column for totals
(if rtn.table = TRUE). The two unique observed values of x (i.e., 0 and 1) -
plus the total - are the rows and the two unique observed values of bin -
plus the total - are the columns. The dimlabels are "bin" for the rows and
"x" for the columns. The rownames are 1. "0", 2. "1", 3. "total". The
colnames are 1. 'lvl[1]', 2. 'lvl[2]', 3. "total".
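Tables of this general shape can be built in base R with table, prop.table, and addmargins, as in the sketch below. It is only an approximation of the returned objects: the scaling of the percentages to 0-100 is an assumption, and the helper rename_sum is hypothetical.

```r
# Sketch: 3x3 count and overall-percentage tables with a "total" margin
x   <- mtcars$am
bin <- ifelse(mtcars$vs == 1, "yes", "no")

tab <- table(x = x, bin = bin)      # x values as rows, bin values as columns

# Hypothetical helper: addmargins() labels the margin "Sum"; rename it "total"
rename_sum <- function(m) {
  rownames(m)[nrow(m)] <- "total"
  colnames(m)[ncol(m)] <- "total"
  m
}

count   <- rename_sum(addmargins(tab))
percent <- rename_sum(addmargins(prop.table(tab) * 100))  # overall percentages

count
round(percent, 1)
```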
See Also
prop.test, the workhorse for prop_diff;
props_diff for multiple dummy variables;
phi for another phi coefficient function;
Yule for another yule coefficient function;
tetrachoric for another tetrachoric coefficient function
Examples
# chi-square test of independence
# x = "am", bin = "vs"
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
agg(mtcars2$"am", grp = mtcars2$"vs_bin", rep = FALSE, fun = mean)
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs")
# using the lvl argument
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin")
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs_bin",
   lvl = c("yes","no")) # reverses the direction of the effect
prop_diff(x = mtcars2$"am", bin = mtcars2$"vs",
   lvl = c(1, 0)) # levels don't have to be character
# recoding the variables
prop_diff(x = mtcars2$"am", bin = ifelse(mtcars2$"vs_bin" == "yes",
   yes = "no", no = "yes")) # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = mtcars2$"vs") # reverses the direction of the effect
prop_diff(x = ifelse(mtcars2$"am" == 1, yes = 0, no = 1),
   bin = ifelse(mtcars2$"vs_bin" == "yes",
      yes = "no", no = "yes")) # double reverse means same direction of the effect
# compare to stats::prop.test
# x = "am", bin = "vs_bin" (binary as the rows; dummy as the columns)
tmp <- c("vs_bin","am") # b/c Roxygen2 will cause problems
table_obj <- table(mtcars2[tmp])
row_order <- nrow(table_obj):1
col_order <- ncol(table_obj):1
table_obj4prop <- table_obj[row_order, col_order]
prop.test(table_obj4prop)
# compare to stats::chisq.test
chisq.test(x = mtcars2$"am", y = mtcars2$"vs_bin")
# compare to psych::phi
cor(mtcars2$"am", mtcars$"vs")
psych::phi(table_obj, digits = 7)
# compare to psych::Yule()
psych::Yule(table_obj)
# compare to psych::tetrachoric
psych::tetrachoric(table_obj)
# Note, I couldn't find a case where psych::tetrachoric() failed to compute
psych::tetrachoric(table_obj4prop)
# different than single logistic regression
summary(glm(am ~ vs, data = mtcars, family = binomial(link = "logit")))