R: Sparse Cross Tabulation

Xtab {mefa4}

R Documentation

Sparse Cross Tabulation

Description

Create a contingency table from cross-classifying factors, usually contained in a data frame, using a formula interface.

Usage

Xtab(formula = ~., data = parent.frame(), rdrop, cdrop,
subset, na.action, exclude = c(NA, NaN), drop.unused.levels = FALSE)

Arguments

`formula`	a `formula` object with the cross-classifying variables (separated by +) on the right hand side (or an object which can be coerced to a formula). Interactions are not allowed. On the left hand side, one may optionally give a vector or a matrix of counts; in the latter case, the columns are interpreted as corresponding to the levels of a variable. This is useful if the data have already been tabulated, see the examples below.
`data`	an optional matrix or data frame (or similar: see `model.frame`) containing the variables in the formula formula. By default the variables are taken from environment(formula).
`rdrop`, `cdrop`	logical (should zero marginal rows/columns be removed after cross tabulation), character or numeric (what rows/columns should be removed).
`subset`	an optional vector specifying a subset of observations to be used.
`na.action`	a function which indicates what should happen when the data contain NAs.
`exclude`	a vector of values to be excluded when forming the set of levels of the classifying factors.
`drop.unused.levels`	a logical indicating whether to drop unused levels in the classifying factors. If this is FALSE and there are unused levels, the table will contain zero marginals, and a subsequent chi-squared test for independence of the factors will not work.

Details

The function creates two- or three-way cross tabulation. Only works for two or three factors.

If a left hand side is given in formula, its entries are simply summed over the cells corresponding to the right hand side; this also works if the left hand side does not give counts.

Value

A sparse numeric matrix inheriting from sparseMatrix, specifically an object of S4 class dgCMatrix.

For three factors, a list of sparse matrices.

Author(s)

This function is a slight modification of the xtabs function in the stats package.

Modified by Peter Solymos <solymos@ualberta.ca>

Examples

x <- data.frame(
    sample = paste("Sample", c(1,1,2,2,3,4), sep="."),
    species = c(paste("Species", c(1,1,1,2,3), sep="."),  "zero.pseudo"),
    count = c(1,2,10,3,4,0),
    stringsAsFactors = TRUE)
x
## Xtab class, counts by repetitions in RHS
(x0 <- Xtab(~ sample + species, x))
## counts by LHS and repetitions in RHS
(x1 <- Xtab(count ~ sample + species, x))
## drop all empty rows
(x2 <- Xtab(count ~ sample + species, x, cdrop=FALSE,rdrop=TRUE))
## drop all empty columns
Xtab(count ~ sample + species, x, cdrop=TRUE,rdrop=FALSE)
## drop specific columns by placeholder
Xtab(count ~ sample + species, x, cdrop="zero.pseudo")

## 2 and 3 way crosstabs
xx <- data.frame(
    sample = paste("Sample", c(1,1,2,2,3,4), sep="."),
    species = c(paste("Species", c(1,1,1,2,3), sep="."),  "zero.pseudo"),
    count = c(1,2,10,3,4,0),
    segment = letters[c(6,13,6,13,6,6)],
    stringsAsFactors = TRUE)
xx
Xtab(count ~ sample + species, xx)
Xtab(count ~ sample + species + segment, xx)

[Package mefa4 version 0.3-11 Index]