chi_squared_test {sjstats} | R Documentation |
Chi-Squared test
Description
This function performs a χ² test for contingency tables or tests for given probabilities. The returned effect sizes are Cramer's V for tables with more than two rows or columns, Phi (φ) for 2x2 tables, and Fei (פ) for tests against given probabilities (see Ben-Shachar et al. 2023).
Usage
chi_squared_test(
data,
select = NULL,
by = NULL,
probabilities = NULL,
weights = NULL,
paired = FALSE,
...
)
Arguments
data |
A data frame. |
select |
Name(s) of the categorical variable(s) (as character vector)
to be used as samples for the test.
|
by |
Name of the variable indicating the groups. Required if |
probabilities |
A numeric vector of probabilities for each cell in the
contingency table. The length of the vector must match the number of cells
in the table, i.e. the number of unique levels of the variable specified
in |
weights |
Name of an (optional) weighting variable to be used for the test. |
paired |
Logical, if |
... |
Additional arguments passed down to |
Details
The function is a wrapper around chisq.test() and fisher.test() (for small expected values) for contingency tables, and chisq.test() for given probabilities. When probabilities are provided, these are rescaled to sum to 1 (i.e. rescale.p = TRUE). When fisher.test() is called, simulated p-values are returned (i.e. simulate.p.value = TRUE, see ?fisher.test). If paired = TRUE and a 2x2 table is provided, a McNemar test (see mcnemar.test()) is conducted.
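The base R tests named above can also be called directly. The following is a minimal sketch with made-up counts, not the sjstats source code; in particular, the cut-off of 5 for "small expected values" is a common rule of thumb and an assumption here.

```r
# Made-up counts for illustration only.
tab <- matrix(c(12, 5, 7, 9), nrow = 2,
              dimnames = list(group = c("a", "b"), outcome = c("no", "yes")))

# Contingency table: chi-squared test, falling back to Fisher's exact test
# when expected counts are small (threshold of 5 is an assumption).
res <- chisq.test(tab)
if (any(res$expected < 5)) {
  res <- fisher.test(tab, simulate.p.value = TRUE)
}

# Test against given probabilities. The values in `p` do not sum to 1,
# so rescale.p = TRUE rescales them to 0.3 and 0.7.
observed <- c(40, 60)
res_prob <- chisq.test(observed, p = c(3, 7), rescale.p = TRUE)

# Paired 2x2 table: McNemar test.
paired_tab <- matrix(c(20, 5, 10, 15), nrow = 2)
res_paired <- mcnemar.test(paired_tab)
```

Each call returns an object of class "htest", from which the statistic and p-value can be extracted, e.g. res_prob$statistic and res_prob$p.value.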
The weighted version of the chi-squared test is based on a weighted table, using the result of xtabs() as input for chisq.test().
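A weighted table of this kind can be sketched in base R as follows. The data frame d and its columns sex, smoker and w are hypothetical, and whether sjstats rounds or otherwise post-processes the weighted cell sums is not stated here, so this only illustrates the general idea.

```r
set.seed(1)

# Hypothetical data: two categorical variables and a weighting variable.
d <- data.frame(
  sex = sample(c("female", "male"), 100, replace = TRUE),
  smoker = sample(c("no", "yes"), 100, replace = TRUE),
  w = abs(rnorm(100, mean = 1, sd = 0.3))
)

# With the weights on the left-hand side of the formula, each cell holds
# the sum of weights instead of a raw count.
weighted_tab <- xtabs(w ~ sex + smoker, data = d)

# The weighted table is then passed to chisq.test() like any other table.
res_weighted <- chisq.test(weighted_tab)
```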
Interpretation of effect sizes is based on rules described in effectsize::interpret_phi(), effectsize::interpret_cramers_v(), and effectsize::interpret_fei(). Use these functions directly to get other interpretations, by providing the returned effect size as argument, e.g. interpret_phi(0.35, rules = "gignac2016").
Value
A data frame with test results. The returned effect sizes are Cramer's V for tables with more than two rows or columns, Phi (φ) for 2x2 tables, and Fei (פ) for tests against given probabilities.
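To show how the three effect sizes relate to the χ² statistic, the following sketch computes them by hand in base R, using the unadjusted formulas as described by Ben-Shachar et al. (2023). The counts are made up, and the values returned by chi_squared_test() may differ, for example if a bias adjustment or continuity correction is applied.

```r
# Phi for a 2x2 table: sqrt(chi2 / n), without continuity correction.
tab <- matrix(c(12, 5, 7, 9), nrow = 2)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi <- sqrt(chi2 / sum(tab))

# Cramer's V for a larger table: sqrt(chi2 / (n * (min(rows, cols) - 1))).
tab3 <- matrix(c(10, 20, 30, 25, 15, 20), nrow = 2)
chi2_3 <- unname(chisq.test(tab3)$statistic)
cramers_v <- sqrt(chi2_3 / (sum(tab3) * (min(dim(tab3)) - 1)))

# Fei for a test against given probabilities:
# sqrt(chi2 / (n * (1 / min(p) - 1))).
observed <- c(40, 60)
p <- c(0.3, 0.7)
chi2_p <- unname(chisq.test(observed, p = p)$statistic)
fei <- sqrt(chi2_p / (sum(observed) * (1 / min(p) - 1)))
```

All three measures are scaled to lie between 0 and 1, which is what makes them comparable across tables of different size.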
Which test to use
The following table provides an overview of which test to use for different types of data. The choice of test depends on the scale of the outcome variable and the number of samples to compare.
Samples | Scale of Outcome | Significance Test |
1 | binary / nominal | chi_squared_test() |
1 | continuous, not normal | wilcoxon_test() |
1 | continuous, normal | t_test() |
2, independent | binary / nominal | chi_squared_test() |
2, independent | continuous, not normal | mann_whitney_test() |
2, independent | continuous, normal | t_test() |
2, dependent | binary (only 2x2) | chi_squared_test(paired=TRUE) |
2, dependent | continuous, not normal | wilcoxon_test() |
2, dependent | continuous, normal | t_test(paired=TRUE) |
>2, independent | continuous, not normal | kruskal_wallis_test() |
>2, independent | continuous, normal | datawizard::means_by_group() |
>2, dependent | continuous, not normal | not yet implemented (1) |
>2, dependent | continuous, normal | not yet implemented (2) |
(1) More than two dependent samples are considered as repeated measurements. For ordinal or not-normally distributed outcomes, these samples are usually tested using a friedman.test(), which requires the samples in one variable, the groups to compare in another variable, and a third variable indicating the repeated measurements (subject IDs).
(2) More than two dependent samples are considered as repeated measurements. For normally distributed outcomes, these samples are usually tested using an ANOVA for repeated measurements. A more sophisticated approach would be using a linear mixed model.
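Since both cases are not yet implemented, the base R alternatives mentioned in the notes above could look roughly like this. The data frame long and its columns id, time and score are hypothetical and simulated for illustration only.

```r
set.seed(42)

# Hypothetical long-format data: 10 subjects, each measured at 3 time points.
long <- data.frame(
  id = factor(rep(1:10, times = 3)),
  time = factor(rep(c("t1", "t2", "t3"), each = 10)),
  score = rnorm(30, mean = rep(c(10, 11, 12), each = 10))
)

# (1) Friedman test: outcome ~ groups | subject IDs.
res_friedman <- friedman.test(score ~ time | id, data = long)

# (2) ANOVA for repeated measurements, with subjects as error stratum.
res_rm <- aov(score ~ time + Error(id / time), data = long)
summary(res_rm)
```

Note that friedman.test() expects exactly one observation per combination of subject and group, so incomplete cases have to be removed beforehand.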
References
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Bender, R., Lange, S., Ziegler, A. Wichtige Signifikanztests. Dtsch Med Wochenschr 2007; 132: e24–e25
du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. Auswahl statistischer Testverfahren. Dtsch Arztebl Int 2010; 107(19): 343–8
See Also
- t_test() for parametric t-tests of dependent and independent samples.
- mann_whitney_test() for non-parametric tests of unpaired (independent) samples.
- wilcoxon_test() for Wilcoxon signed rank tests, i.e. non-parametric tests of paired (dependent) samples.
- kruskal_wallis_test() for non-parametric tests with more than two independent samples.
- chi_squared_test() for chi-squared tests (two categorical variables, dependent and independent).
Examples
library(sjstats)
data(efc)
# random weights, for illustration only
efc$weight <- abs(rnorm(nrow(efc), 1, 0.3))
# Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex")
# weighted Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex", weights = "weight")
# Chi-squared test for given probabilities
chi_squared_test(efc, "c161sex", probabilities = c(0.3, 0.7))