kruskal_wallis_test {sjstats}    R Documentation

Kruskal-Wallis test

Description

This function performs a Kruskal-Wallis rank sum test, a non-parametric method to test the null hypothesis that the population medians of all groups are equal. The alternative hypothesis is that the median of at least one group differs from the others. Unlike the underlying base R function kruskal.test(), this function allows for weighted tests.
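To make the rank sum idea concrete, the following sketch computes the unweighted test statistic by hand and compares it with base R's kruskal.test(). The data are illustrative values without ties, and the variable names are chosen for this example only. With ties, kruskal.test() additionally applies a correction that is not shown here.

```r
# Illustrative data: three independent groups, no tied values
x <- c(2.9, 3.0, 2.5, 2.6, 3.2,   # group A
       3.8, 2.7, 4.0, 2.4,        # group B
       2.8, 3.4, 3.7, 2.2, 2.0)   # group C
g <- factor(rep(c("A", "B", "C"), times = c(5, 4, 5)))

# Rank all observations jointly, ignoring group membership
r <- rank(x)
n <- length(x)

# Sum of ranks and number of observations per group
rank_sums <- tapply(r, g, sum)
group_sizes <- tapply(r, g, length)

# H statistic (valid as written only when there are no ties)
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Under the null hypothesis, H is approximately chi-squared
# distributed with (number of groups - 1) degrees of freedom
p <- pchisq(h, df = nlevels(g) - 1, lower.tail = FALSE)

# Compare with the base R implementation
base <- kruskal.test(x, g)
all.equal(h, unname(base$statistic))
all.equal(p, base$p.value)
```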

Usage

kruskal_wallis_test(data, select = NULL, by = NULL, weights = NULL)

Arguments

data

A data frame.

select

Name(s) of the continuous variable(s) (as character vector) to be used as samples for the test. select can be one of the following:

  • select can be used in combination with by, in which case select is the name of the continuous variable (and by indicates a grouping factor).

  • select can also be a character vector of length two or more (more than two names only applies to kruskal_wallis_test()), in which case these continuous variables are treated as samples to be compared. by must be NULL in this case.

  • If select is of length two and paired = TRUE, the two samples are considered as dependent and a paired test is carried out.

  • If select specifies one variable and by = NULL, a one-sample test is carried out (only applicable for t_test() and wilcoxon_test()).

  • For chi_squared_test(), if select specifies one variable and both by and probabilities are NULL, a one-sample test against given probabilities is automatically conducted, with equal probabilities for each level of select.

by

Name of the variable indicating the groups. Required if select specifies only one variable that contains all samples to be compared in the test. If by is not a factor, it will be coerced to a factor. For chi_squared_test(), if probabilities is provided, by must be NULL.

weights

Name of an (optional) weighting variable to be used for the test.

Details

The function is a wrapper around kruskal.test(). The weighted version of the Kruskal-Wallis test relies on the survey package and uses survey::svyranktest().
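As a rough illustration of the weighted case, the sketch below calls survey::svyranktest() directly on simulated data. The data, the weight column w, and the simple design without clusters (ids = ~1) are assumptions made for this example; the design that kruskal_wallis_test() builds internally may differ.

```r
# Runs only if the survey package is installed
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(123)
  d <- data.frame(
    outcome = rnorm(90),
    group = factor(rep(c("A", "B", "C"), each = 30)),
    w = runif(90, min = 0.5, max = 2)  # hypothetical weights
  )

  # Survey design with weights and no clustering (an assumption)
  design <- survey::svydesign(ids = ~1, weights = ~w, data = d)

  # Design-based rank test using Kruskal-Wallis scores
  weighted_kw <- survey::svyranktest(
    outcome ~ group,
    design = design,
    test = "KruskalWallis"
  )
  print(weighted_kw)
}
```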

Value

A data frame with test results.

Which test to use

The following table provides an overview of which test to use for different types of data. The choice of test depends on the scale of the outcome variable and the number of samples to compare.

Samples          Scale of Outcome         Significance Test
1                binary / nominal         chi_squared_test()
1                continuous, not normal   wilcoxon_test()
1                continuous, normal       t_test()
2, independent   binary / nominal         chi_squared_test()
2, independent   continuous, not normal   mann_whitney_test()
2, independent   continuous, normal       t_test()
2, dependent     binary (only 2x2)        chi_squared_test(paired=TRUE)
2, dependent     continuous, not normal   wilcoxon_test()
2, dependent     continuous, normal       t_test(paired=TRUE)
>2, independent  continuous, not normal   kruskal_wallis_test()
>2, independent  continuous, normal       datawizard::means_by_group()
>2, dependent    continuous, not normal   not yet implemented (1)
>2, dependent    continuous, normal       not yet implemented (2)

(1) More than two dependent samples are considered as repeated measurements. For ordinal or not-normally distributed outcomes, these samples are usually tested with friedman.test(), which requires the samples in one variable, the groups to compare in another variable, and a third variable indicating the repeated measurements (subject IDs).

(2) More than two dependent samples are considered as repeated measurements. For normally distributed outcomes, these samples are usually tested using a repeated-measures ANOVA. A more sophisticated approach would be using a linear mixed model.
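The two cases in notes (1) and (2) can be sketched with base R as follows. The data are simulated and the column names (subject, time, score) are hypothetical. The data are in long format, with one row per subject and measurement occasion, and this is one possible setup rather than a recommended workflow.

```r
set.seed(1)
rm_data <- data.frame(
  subject = factor(rep(1:10, times = 3)),            # subject IDs
  time = factor(rep(c("t1", "t2", "t3"), each = 10)), # groups to compare
  score = rnorm(30)                                   # outcome
)

# (1) Friedman test: outcome ~ groups | blocks (subjects)
friedman_result <- friedman.test(score ~ time | subject, data = rm_data)
print(friedman_result)

# (2) Repeated-measures ANOVA: the Error() term declares that
# measurements are nested within subjects
rm_anova <- aov(score ~ time + Error(subject / time), data = rm_data)
summary(rm_anova)
```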

Examples

data(efc)
# Kruskal-Wallis test for elder's age by education
kruskal_wallis_test(efc, "e17age", by = "c172code")

# when data is in wide format, specify all relevant continuous
# variables in `select` and omit `by`
set.seed(123)
wide_data <- data.frame(
  scale1 = runif(20),
  scale2 = runif(20),
  scale3 = runif(20)
)
kruskal_wallis_test(wide_data, select = c("scale1", "scale2", "scale3"))

# same as if we had data in long format, with grouping variable
long_data <- data.frame(
  scales = c(wide_data$scale1, wide_data$scale2, wide_data$scale3),
  groups = rep(c("A", "B", "C"), each = 20)
)
kruskal_wallis_test(long_data, select = "scales", by = "groups")
# base R equivalent
kruskal.test(scales ~ groups, data = long_data)

[Package sjstats version 0.19.0 Index]