kruskal_wallis_test {sjstats} | R Documentation |
Kruskal-Wallis test
Description
This function performs a Kruskal-Wallis rank sum test, a non-parametric
method to test the null hypothesis that the population medians of all
groups are equal. The alternative is that the median of at least one group
differs. Unlike the underlying base R function kruskal.test(), this
function allows for weighted tests.
Usage
kruskal_wallis_test(data, select = NULL, by = NULL, weights = NULL)
Arguments
data |
A data frame. |
select |
Name(s) of the continuous variable(s) (as character vector) to be used as samples for the test. |
by |
Name of the variable indicating the groups. Required if |
weights |
Name of an (optional) weighting variable to be used for the test. |
Details
The function is simply a wrapper around kruskal.test(). The
weighted version of the Kruskal-Wallis test is based on the survey package,
using survey::svyranktest().
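The weighted path can be sketched directly with the survey package. This is a minimal illustration, not the package's internal code: the data frame `d`, its column names, and the weights are made up, and passing `test = "KruskalWallis"` is an assumption about how the wrapper might call survey::svyranktest().

```r
library(survey)

# Hypothetical data: three groups of 20 observations with random weights
set.seed(123)
d <- data.frame(
  score = runif(60),
  group = factor(rep(c("A", "B", "C"), each = 20)),
  w = runif(60, min = 0.5, max = 2)
)

# Declare a design without clusters, using `w` as sampling weights
design <- svydesign(ids = ~1, weights = ~w, data = d)

# Design-based rank test across the three groups
weighted_res <- svyranktest(score ~ group, design = design, test = "KruskalWallis")
weighted_res
```

With equal weights the result should be close to, but not necessarily identical to, the output of kruskal.test(), because the design-based test estimates its variance differently.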
Value
A data frame with test results.
Which test to use
The following table provides an overview of which test to use for different types of data. The choice of test depends on the scale of the outcome variable and the number of samples to compare.
Samples | Scale of Outcome | Significance Test |
1 | binary / nominal | chi_squared_test() |
1 | continuous, not normal | wilcoxon_test() |
1 | continuous, normal | t_test() |
2, independent | binary / nominal | chi_squared_test() |
2, independent | continuous, not normal | mann_whitney_test() |
2, independent | continuous, normal | t_test() |
2, dependent | binary (only 2x2) | chi_squared_test(paired=TRUE) |
2, dependent | continuous, not normal | wilcoxon_test() |
2, dependent | continuous, normal | t_test(paired=TRUE) |
>2, independent | continuous, not normal | kruskal_wallis_test() |
>2, independent | continuous, normal | datawizard::means_by_group() |
>2, dependent | continuous, not normal | not yet implemented (1) |
>2, dependent | continuous, normal | not yet implemented (2) |
(1) More than two dependent samples are considered as repeated measurements.
For ordinal or not-normally distributed outcomes, these samples are
usually tested using friedman.test(), which requires the samples
in one variable, the groups to compare in another variable, and a third
variable indicating the repeated measurements (subject IDs).
(2) More than two dependent samples are considered as repeated measurements. For normally distributed outcomes, these samples are usually tested using an ANOVA for repeated measurements. A more sophisticated approach would be using a linear mixed model.
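Until these cases are implemented, both can be handled with base R. The following is a minimal sketch under stated assumptions: the data frame `long_rm` and its columns `score`, `condition`, and `subject` are hypothetical, with every subject measured exactly once per condition.

```r
# Hypothetical repeated-measures data: 10 subjects, 3 conditions each
set.seed(123)
long_rm <- data.frame(
  score = runif(30),
  condition = factor(rep(c("t1", "t2", "t3"), times = 10)),
  subject = factor(rep(1:10, each = 3))
)

# (1) Not normally distributed outcome: Friedman rank sum test,
# with samples in `score`, groups in `condition`, and subject IDs as blocks
friedman_res <- friedman.test(score ~ condition | subject, data = long_rm)
friedman_res

# (2) Normally distributed outcome: repeated-measures ANOVA,
# with an error stratum for the within-subject factor
rm_anova <- aov(score ~ condition + Error(subject / condition), data = long_rm)
summary(rm_anova)
```

Both calls expect long-format data with one row per subject and condition. friedman.test() additionally requires a complete, unreplicated block design.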
References
Bender, R., Lange, S., Ziegler, A. Wichtige Signifikanztests. Dtsch Med Wochenschr 2007; 132: e24–e25
du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. Auswahl statistischer Testverfahren. Dtsch Arztebl Int 2010; 107(19): 343–8
See Also
- t_test() for parametric t-tests of dependent and independent samples.
- mann_whitney_test() for non-parametric tests of unpaired (independent) samples.
- wilcoxon_test() for Wilcoxon signed-rank tests, i.e. non-parametric tests of paired (dependent) samples.
- kruskal_wallis_test() for non-parametric tests with more than two independent samples.
- chi_squared_test() for chi-squared tests (two categorical variables, dependent and independent).
Examples
data(efc)
# Kruskal-Wallis test for elder's age by education
kruskal_wallis_test(efc, "e17age", by = "c172code")
# when data is in wide-format, specify all relevant continuous
# variables in `select` and omit `by`
set.seed(123)
wide_data <- data.frame(
  scale1 = runif(20),
  scale2 = runif(20),
  scale3 = runif(20)
)
kruskal_wallis_test(wide_data, select = c("scale1", "scale2", "scale3"))
# same as if we had data in long format, with grouping variable
long_data <- data.frame(
  scales = c(wide_data$scale1, wide_data$scale2, wide_data$scale3),
  groups = rep(c("A", "B", "C"), each = 20)
)
kruskal_wallis_test(long_data, select = "scales", by = "groups")
# base R equivalent
kruskal.test(scales ~ groups, data = long_data)
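To show what the unweighted test computes, the H statistic can be reproduced by hand from the pooled ranks. This is an illustrative sketch in base R only. The variable names are made up, and the formula omits the tie correction, so it matches kruskal.test() only when the data contain no ties (as with the continuous random values used here).

```r
# Hypothetical data: three groups of 20 continuous values (no ties expected)
set.seed(123)
scores <- c(runif(20), runif(20), runif(20))
groups <- factor(rep(c("A", "B", "C"), each = 20))

# Rank all observations together, then sum the ranks within each group
pooled_ranks <- rank(scores)
n_total <- length(scores)
rank_sums <- tapply(pooled_ranks, groups, sum)
n_per_group <- tapply(pooled_ranks, groups, length)

# H statistic without tie correction
h_stat <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / n_per_group) -
  3 * (n_total + 1)

# Under the null hypothesis, H is approximately chi-squared with k - 1 df
p_value <- pchisq(h_stat, df = nlevels(groups) - 1, lower.tail = FALSE)

# Compare with the base R implementation
base_res <- kruskal.test(scores ~ groups)
c(manual = h_stat, base = unname(base_res$statistic))
```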