xtabs {stats} | R Documentation
Cross Tabulation
Description
Create a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.
Usage
xtabs(formula = ~., data = parent.frame(), subset, sparse = FALSE,
      na.action, na.rm = FALSE, addNA = FALSE,
      exclude = if(!addNA) c(NA, NaN), drop.unused.levels = FALSE)
## S3 method for class 'xtabs'
print(x, na.print = "", ...)
Arguments
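formula
a formula object with the cross-classifying variables (separated by +) on the right-hand side. An optional left-hand side supplies the values to be summed in each cell (see Details).
data
an optional matrix or data frame (or similar: see model.frame) containing the variables in formula.
subset
an optional vector specifying a subset of observations to be used.
sparse
logical specifying if the result should be a sparse matrix, i.e., inheriting from sparseMatrix.
na.action
a function which indicates what should happen when the data contain NAs.
na.rm
logical: should missing values on the left-hand side of the formula be ignored when the cell sums are computed?
addNA
logical indicating if NAs should get a separate level and be counted.
exclude
a vector of values to be excluded when forming the set of levels of the classifying factors.
drop.unused.levels
a logical indicating whether to drop unused levels in the classifying factors. If this is FALSE and there are unused levels, the table will contain zero marginals, and a subsequent chi-squared test for independence of the factors will not work.
x
an object of class "xtabs".
na.print
character string (or NULL) indicating how NA values are printed.
...
further arguments passed to or from other methods.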
Details
There is a summary method for contingency table objects created by table or xtabs(*, sparse = FALSE), which gives basic information and performs a chi-squared test for independence of factors (note that the function chisq.test currently only handles 2-d tables).
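As a minimal sketch of what the summary method returns, the following builds a two-way table from the UCBAdmissions data and inspects the summary components. The comparison against chisq.test(*, correct = FALSE) assumes that the summary method reports the Pearson statistic without continuity correction; the names tab, s and pearson are only illustrative.

```r
## Two-way table of Gender by Admit, summed over departments
tab <- xtabs(Freq ~ Gender + Admit, data = as.data.frame(UCBAdmissions))

s <- summary(tab)
s$n.vars     # number of classifying factors
s$n.cases    # total count in the table
s$statistic  # chi-squared statistic
s$parameter  # degrees of freedom
s$p.value

## For a 2-d table the same Pearson statistic can be obtained from chisq.test(),
## provided Yates' continuity correction is switched off.
pearson <- unname(chisq.test(tab, correct = FALSE)$statistic)
```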
If a left-hand side is given in formula, its entries are simply summed over the cells corresponding to the right-hand side; this also works if the LHS does not give counts.
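A small illustration of the summing behaviour, using a made-up data frame d with a grouping variable g and non-integer weights w (all three names are hypothetical). The table entries should match group-wise sums computed by hand with tapply.

```r
## Hypothetical data: 'w' holds weights, not counts
d <- data.frame(g = c("a", "b", "a", "b", "a"),
                w = c(1.5, 2, 0.5, 3, 1))

xt <- xtabs(w ~ g, data = d)      # sums 'w' within each level of 'g'
by_hand <- tapply(d$w, d$g, sum)  # the same sums computed directly

xt
by_hand
```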
For variables in formula which are factors, exclude must be specified explicitly; the default exclusions will not be used.
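The following sketch illustrates this with a factor that carries NA as an explicit level. The variable names f, kept and dropped are hypothetical, and the behaviour shown is what a recent R version is expected to do; it is worth re-running on the installed version.

```r
## A factor whose levels explicitly include NA
f <- factor(c("a", NA, "b", "a"), exclude = NULL)
levels(f)

## Default call: 'f' is already a factor, so the default exclusions
## c(NA, NaN) are not applied and the NA level is tabulated.
kept <- xtabs(~ f)

## Passing 'exclude' explicitly removes the NA level.
dropped <- xtabs(~ f, exclude = NA)

kept
dropped
```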
In R versions before 3.4.0, e.g., when na.action = na.pass, sometimes zeroes (0) were returned instead of NAs.
In R versions before 4.4.0, when addNA was false (the default), the default na.action was na.omit, which effectively treated missing counts as zero.
Value
By default, when sparse = FALSE, a contingency table in array representation of S3 class c("xtabs", "table"), with a "call" attribute storing the matched call.
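A brief sketch for inspecting the returned object, using the built-in mtcars data (the choice of data set and the name xt are only for illustration):

```r
xt <- xtabs(~ cyl + gear, data = mtcars)

class(xt)         # S3 class vector of the result
attr(xt, "call")  # the matched call stored with the table
dim(xt)           # one dimension per classifying variable
```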
When sparse = TRUE, a sparse numeric matrix, specifically an object of S4 class dgTMatrix from package Matrix.
See Also
table for traditional cross-tabulation, and as.data.frame.table which is the inverse operation of xtabs (see the DF example below).
sparseMatrix on sparse matrices in package Matrix.
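The inverse relationship can be checked with a short round trip, sketched here on UCBAdmissions. The names DF2 and back are illustrative, and the variables are listed explicitly in the formula so that the dimension order matches the original table.

```r
## table -> data frame -> table
DF2 <- as.data.frame(UCBAdmissions)   # one row per cell, counts in 'Freq'
back <- xtabs(Freq ~ Admit + Gender + Dept, data = DF2)

## Same shape and same cell counts as the original table
identical(dim(back), dim(UCBAdmissions))
all(as.vector(back) == as.vector(UCBAdmissions))
```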
Examples
## 'esoph' has the frequencies of cases and controls for all levels of
## the variables 'agegp', 'alcgp', and 'tobgp'.
xtabs(cbind(ncases, ncontrols) ~ ., data = esoph)
## Output is not really helpful ... flat tables are better:
ftable(xtabs(cbind(ncases, ncontrols) ~ ., data = esoph))
## In particular if we have fewer factors ...
ftable(xtabs(cbind(ncases, ncontrols) ~ agegp, data = esoph))
## This is already a contingency table in array form.
DF <- as.data.frame(UCBAdmissions)
## Now 'DF' is a data frame with a grid of the factors and the counts
## in variable 'Freq'.
DF
## Nice for taking margins ...
xtabs(Freq ~ Gender + Admit, DF)
## And for testing independence ...
summary(xtabs(Freq ~ ., DF))
## with NA's
DN <- DF; DN[cbind(6:9, c(1:2,4,1))] <- NA
DN # 'Freq' is missing only for (Rejected, Female, B)
(xtNA <- xtabs(Freq ~ Gender + Admit, DN)) # NA prints 'invisibly'
print(xtNA, na.print = "NA") # show NA's better
xtabs(Freq ~ Gender + Admit, DN, na.rm = TRUE) # ignore missing Freq
## Use addNA = TRUE to tabulate missing factor levels:
xtabs(Freq ~ Gender + Admit, DN, addNA = TRUE)
xtabs(Freq ~ Gender + Admit, DN, addNA = TRUE, na.rm = TRUE)
## na.action = na.omit removes all rows with NAs right from the start:
xtabs(Freq ~ Gender + Admit, DN, na.action = na.omit)
## Create a nice display for the warp break data.
warpbreaks$replicate <- rep_len(1:9, 54)
ftable(xtabs(breaks ~ wool + tension + replicate, data = warpbreaks))
### ---- Sparse Examples ----
if(require("Matrix")) withAutoprint({
  ## similar to "nlme"s 'ergoStool' :
  d.ergo <- data.frame(Type = paste0("T", rep(1:4, 9*4)),
                       Subj = gl(9, 4, 36*4))
  xtabs(~ Type + Subj, data = d.ergo) # 4 replicates each
  set.seed(15) # a subset of cases:
  xtabs(~ Type + Subj, data = d.ergo[sample(36, 10), ], sparse = TRUE)
  ## Hypothetical two-level setup:
  inner <- factor(sample(letters[1:25], 100, replace = TRUE))
  inout <- factor(sample(LETTERS[1:5], 25, replace = TRUE))
  fr <- data.frame(inner = inner, outer = inout[as.integer(inner)])
  xtabs(~ inner + outer, fr, sparse = TRUE)
})