prelim.cat {cat}

R Documentation

Preliminary manipulations on incomplete categorical data

Description

Performs grouping and sorting operations on a categorical data set with missing values. The function returns a list that is required as input to em.cat, da.cat, imp.cat, and related functions.

Usage

prelim.cat(x, counts, levs)

Arguments

`x`	categorical data matrix containing missing values. The data may be provided in either ungrouped or grouped format. In ungrouped format, the rows of `x` correspond to individual observational units, so that `nrow(x)` is the total sample size. In grouped format, the rows of `x` correspond to distinct covariate patterns, and the frequencies are provided through the `counts` argument. In either format, the columns correspond to variables. The categories must be coded as consecutive positive integers beginning with 1 (1,2,...), and missing values are denoted by `NA`.
`counts`	optional vector of length `nrow(x)` giving the frequencies corresponding to the covariate patterns in `x`. The total sample size is `sum(counts)`. If `counts` is missing, the data are assumed to be ungrouped; this is equivalent to taking `counts` equal to `rep(1,nrow(x))`.
`levs`	optional vector of length `ncol(x)` indicating the number of levels for each categorical variable. If missing, `levs[j]` is taken to be `max(x[,j],na.rm=TRUE)`, so a variable whose highest category is never observed needs `levs` supplied explicitly.
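
The relationship between the two input formats, and the default rule for `levs`, can be illustrated with a small base R sketch. It does not call prelim.cat; the toy matrix `x_ungrouped` and the helper names `key`, `first`, `x_grouped` and `levs_default` are made up for illustration, and the row-collapsing shown here is only one possible way to build grouped input, not necessarily how the package does it internally.

```r
# Hypothetical ungrouped data: 6 units, 2 categorical variables, NA = missing
x_ungrouped <- matrix(c( 1,  1,
                         1,  1,
                         2, NA,
                         2, NA,
                        NA,  3,
                         1,  2),
                      ncol = 2, byrow = TRUE)

# Collapse identical rows into distinct patterns plus their frequencies
key   <- apply(x_ungrouped, 1, paste, collapse = "|")   # one string per row
first <- !duplicated(key)                               # first occurrence of each pattern
x_grouped <- x_ungrouped[first, , drop = FALSE]         # grouped-format matrix
counts    <- as.vector(table(key)[key[first]])          # frequency of each pattern

# The grouped form carries the same total sample size
sum(counts) == nrow(x_ungrouped)

# Default rule described for levs: largest observed code in each column
levs_default <- apply(x_grouped, 2, max, na.rm = TRUE)
```

Under the description above, passing `x_grouped` together with `counts` should describe the same data set as passing `x_ungrouped` alone.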

Value

A list of seventeen components that summarize various features of `x` after the data have been sorted by missingness pattern and grouped according to the observed values. Components that might be of interest to the user include:

`nmis`	a vector of length `ncol(x)` containing the number of missing values for each variable in `x`.
`r`	matrix of response indicators showing the missing data patterns in `x`. Its dimension is (m,p), where m is the number of distinct missingness patterns in the rows of `x` and p is the number of columns in `x`. Observed values are indicated by 1 and missing values by 0. The row names give the number of observations in each pattern, and the columns correspond to the columns of `x`.
`d`	vector of length `ncol(x)` indicating the number of levels for each variable. The complete-data contingency table would be an array with these dimensions. Identical to `levs` if `levs` was supplied.
`ncells`	number of cells in the cross-classified contingency table, equal to `prod(d)`.
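
To make these components concrete, the following base R sketch rebuilds quantities analogous to `nmis`, `r`, `d` and `ncells` by hand for a small grouped data set. It is an illustration of the definitions above rather than the package's implementation: the toy data are invented, weighting the missing-value tally by `counts` is an assumption, and the row order of the pattern matrix may differ from what prelim.cat returns.

```r
# Hypothetical grouped data: 4 distinct rows with their frequencies
x <- matrix(c( 1,  1,
               2, NA,
              NA,  3,
               1,  2),
            ncol = 2, byrow = TRUE)
counts <- c(2, 2, 1, 1)

# Missing values per variable, weighting each row by its frequency (assumed)
nmis <- as.vector(colSums(is.na(x) * counts))

# Response indicators: 1 = observed, 0 = missing
obs   <- (!is.na(x)) * 1
pat   <- apply(obs, 1, paste, collapse = "")    # one string per missingness pattern
first <- !duplicated(pat)
r     <- obs[first, , drop = FALSE]             # one row per distinct pattern

# Row names: number of observations falling in each pattern
n_per_pattern <- tapply(counts, pat, sum)
rownames(r)   <- as.character(as.vector(n_per_pattern[pat[first]]))

# Levels per variable (default rule) and size of the full contingency table
d      <- apply(x, 2, max, na.rm = TRUE)
ncells <- prod(d)
```

Here rows 1 and 4 of `x` share the fully observed pattern, so that pattern accounts for 2 + 1 = 3 observations.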

References

Chapters 7–8 of Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall.

Examples

library(cat)                                   # load the package providing prelim.cat
data(crimes)
crimes                                         # grouped data: two variables plus a count column
s <- prelim.cat(crimes[, 1:2], crimes[, 3])    # preliminary manipulations
s$nmis                                         # number of missing observations per variable
s$r                                            # missing data patterns

[Package cat version 0.0-9 Index]