chisqtestClust {htestClust}

R Documentation

Chi-squared Test for Clustered Count Data

Description

chisqtestClust performs chi-squared contingency table tests and goodness-of-fit tests for clustered data with potentially informative cluster size.

Usage

chisqtestClust(x, y = NULL, id, p = NULL,
             variance = c("MoM", "sand.null", "sand.est", "emp"))

Arguments

`x`	a numeric vector or factor. Can also be a table or data frame.
`y`	a numeric vector or factor of the same length as `x`. Ignored if `x` is a table or data frame.
`id`	a numeric vector or factor that identifies the clusters; ignored if `x` is a table or data frame. `id` must have the same length as `x`.
`p`	a vector of probabilities with length equal to the number of unique categories of `x` if `x` is a vector, or equal to the number of columns of `x` if `x` is a table or data frame.
`variance`	character string specifying the method of variance estimation. Must be one of "`sand.null`", "`sand.est`", "`emp`", or "`MoM`".

Details

If x is a two-dimensional table or data frame, or if x is a vector or factor and y is not given, the cluster-weighted goodness-of-fit test is performed. When x is a table or data frame, each row of x must give the category counts for one cluster, with one column per category. In this case, the hypothesis tested is whether the marginal population probabilities equal those in p, or are all equal if p is not given.
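The base-R sketch below illustrates the table layout and the idea of cluster weighting for the goodness-of-fit case. It assumes that the cluster-weighted marginal proportions are the average of the within-cluster proportions, so that every cluster counts equally regardless of its size. The toy `counts` matrix is hypothetical, the code does not call htestClust, and it does not compute the test statistic (whose variance depends on the `variance` option).

```r
# Hypothetical toy data: 3 clusters (rows) x 3 categories (columns).
# Each row holds the category counts for one cluster.
counts <- matrix(c(8, 1, 1,
                   1, 1, 0,
                   0, 2, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(cluster = c("A", "B", "C"),
                                 category = c("c1", "c2", "c3")))

# Assumed cluster-weighted estimator: turn each cluster's counts into
# within-cluster proportions, then average across clusters.
within_props <- prop.table(counts, margin = 1)
cw_props <- colMeans(within_props)

# Naive pooled proportions, for comparison: large clusters dominate here.
pooled_props <- colSums(counts) / sum(counts)

# Null proportions when p is not supplied: all categories equally likely.
p_null <- rep(1 / ncol(counts), ncol(counts))

print(round(cbind(cluster_weighted = cw_props,
                  pooled = pooled_props,
                  null = p_null), 3))
```

Here cluster A, with 10 of the 16 observations, pulls the pooled proportion of `c1` up to 0.5625, while the cluster-weighted proportion is about 0.433 because A contributes only one third of the weight.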

When x, y, and id are all given as vectors or factors, the cluster-weighted chi-squared test of independence is performed. The lengths of x, y, and id must be equal. In this case, the hypothesis tested is that the joint probabilities of x and y are equal to the product of the marginal probabilities.
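The following base-R sketch shows, under stated assumptions, the quantities compared by the independence hypothesis. It assumes that the cluster-weighted joint proportions average each cluster's within-cluster joint proportions, which is the same as weighting every observation by 1 / (cluster size × number of clusters). The vectors `x`, `y`, and `id` are hypothetical toy data, and no test statistic or variance estimate is computed.

```r
# Hypothetical toy data: one entry per observation, clustered by id.
x  <- factor(c("m", "m", "f", "f", "m", "f", "f"))
y  <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes"))
id <- c(1, 1, 1, 2, 2, 3, 3)

# Weight each observation by 1 / (its cluster size * M), so every
# cluster contributes a total weight of 1 / M.
cluster_size <- ave(rep(1, length(id)), id, FUN = sum)
M <- length(unique(id))
w <- 1 / (cluster_size * M)

# Assumed cluster-weighted joint proportions (rows: x, columns: y).
joint <- tapply(w, list(x, y), sum)
joint[is.na(joint)] <- 0  # empty cells get proportion 0

# Product of the cluster-weighted marginals: the joint table expected
# under the null hypothesis of independence.
expected <- outer(rowSums(joint), colSums(joint))

print(round(joint, 3))
print(round(expected, 3))
```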

Value

A list with class "htest" containing the following components:

`statistic`	the value of the test statistic.
`parameter`	the degrees of freedom of the approximate chi-squared distribution of the test statistic.
`p.value`	the p-value of the test.
`method`	a character string indicating the test performed, and which variance estimation method was used.
`data.name`	a character string giving the name(s) of the data and the total number of clusters.
`M`	the number of clusters.
`observed`	the observed reweighted proportions.
`expected`	the expected proportions under the null hypothesis.

References

Gregg, M., Datta, S., Lorenz, D. (2020) Variance estimation in tests of clustered categorical data with informative cluster size. Statistical Methods in Medical Research, doi:10.1177/0962280220928572.

Examples

library(htestClust)
data(screen8)
## Is marginal extracurricular activity participation evenly distributed across categories?
## Goodness-of-fit test using vectors.
chisqtestClust(x=screen8$activity, id=screen8$sch.id)

## Goodness-of-fit test using a cluster-by-category table.
act.table <- table(screen8$sch.id, screen8$activity)
chisqtestClust(act.table)

## Test whether extracurricular activity participation and gender are independent.
chisqtestClust(x=screen8$gender, y=screen8$activity, id=screen8$sch.id)

[Package htestClust version 0.2.2 Index]