ltable {popEpi}

R Documentation

Tabulate Counts and Other Functions by Multiple Variables into a Long-Format Table

Description

ltable uses data.table to tabulate frequencies, or arbitrary functions of the given variables, into a long-format data.table/data.frame. expr.by.cj is the equivalent for more advanced users.

Usage

ltable(
  data,
  by.vars = NULL,
  expr = list(obs = .N),
  subset = NULL,
  use.levels = TRUE,
  na.rm = FALSE,
  robust = TRUE
)

expr.by.cj(
  data,
  by.vars = NULL,
  expr = list(obs = .N),
  subset = NULL,
  use.levels = FALSE,
  na.rm = FALSE,
  robust = FALSE,
  .SDcols = NULL,
  enclos = parent.frame(1L),
  ...
)

Arguments

`data`	a `data.table`/`data.frame`
`by.vars`	names of variables that are used for categorization, as a character vector, e.g. `c('sex','agegroup')`
`expr`	an expression or a list of expressions, each of which is a function of variables in `data`, e.g. `list(obs = .N)` (see Details)
`subset`	a logical condition; `data` is limited to the rows satisfying it before `expr` is evaluated, but levels that do not exist in the subset are still returned, with the result of `expr` as `NA`. See Examples.
`use.levels`	logical; if `TRUE`, uses the factor levels of the given variables where present. Use this if you want e.g. counts for levels that are defined in a factor variable but have zero observations.
`na.rm`	logical; if `TRUE`, drops rows in table that have `NA` as values in any of `by.vars` columns
`robust`	logical; if `TRUE`, runs the output data's `by.vars` columns through `robust_values` before outputting
`.SDcols`	advanced; a character vector of column names passed inside the data.table's brackets, `DT[, , ...]`; see `data.table`. If `NULL`, all appropriate columns are used. See Examples for usage.
`enclos`	advanced; an environment; the enclosing environment of the data.
`...`	advanced; other arguments passed inside the data.table's brackets, `DT[, , ...]`; see `data.table`

Details

Returns the value of expr for each unique combination of the given by.vars.

By default, ltable uses all levels defined for each variable in by.vars (use.levels = TRUE). This is useful because, even if a subset of the data contains no observations for e.g. a specific age group, that age group is still presented in the resulting table; e.g. with the default expr = list(obs = .N), every age group level is represented by a row and can have obs = 0.

The function differs from base R's table by giving a long-format table of values regardless of the number of by.vars given. Use e.g. cast_simple if the data needs to be presented in a wide format (e.g. a two-way table).
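As a rough illustration of the long-to-wide step, the sketch below reshapes a small, made-up long-format table with data.table's dcast instead of cast_simple. The table `long` and its values are invented for the example and are not output of ltable.

```r
library(data.table)

# Hypothetical long-format table: one row per sex/area combination
long <- data.table(
  sex  = c("f", "f", "m", "m"),
  area = c("north", "south", "north", "south"),
  obs  = c(12L, 7L, 9L, 0L)
)

# Two-way layout: one row per sex, one column per area, cells filled with obs
wide <- dcast(long, sex ~ area, value.var = "obs")
wide
```

The long format keeps one row per combination, which is convenient for further computation; the wide form is mainly for presentation.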

The rows of the long-format table effectively form the Cartesian product of the levels of the variables in by.vars; e.g. with by.vars = c("sex", "area"), all levels of area are repeated for each level of sex in the table.
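The row structure described above can be sketched with data.table's CJ (cross join). This is only an illustration of the idea on invented data, not necessarily how ltable is implemented; the names `dt`, `sex_levels`, `area_levels` and `grid` are hypothetical.

```r
library(data.table)

# Hypothetical data: no observations at all in area "west"
dt <- data.table(
  sex  = c("f", "m", "m", "f"),
  area = c("north", "north", "south", "north")
)
sex_levels  <- c("f", "m")
area_levels <- c("north", "south", "west")

# Every combination of levels: 2 * 3 = 6 rows
grid <- CJ(sex = sex_levels, area = area_levels)

# Counts for the combinations that actually occur in the data
counts <- dt[, list(obs = .N), by = list(sex, area)]

# Join the counts onto the full grid; unmatched combinations get NA
out <- counts[grid, on = c("sex", "area")]

# For counts, a combination with no rows means zero observations
out[is.na(obs), obs := 0L]
out
```

Every level of area appears once for each level of sex, including combinations that never occur in the data.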

The expr allows the user to apply any function(s) on all levels defined by by.vars. Here are some examples:

.N or list(.N); .N is a special symbol used inside a data.table that gives the number of rows (the count) in each group
list(obs = .N), same as above but user assigned variable name
list(sum(obs), sum(pyrs), mean(dg_age)), multiple objects in a list
list(obs = sum(obs), pyrs = sum(pyrs)), same as above with user defined variable names
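To show what such expressions compute, the sketch below evaluates two of them directly inside a data.table's brackets on a small, invented table of pre-aggregated counts. It assumes that expr behaves like an ordinary data.table `j` expression evaluated by group, as the use of .N suggests; the table `agg` and its columns are made up for the example.

```r
library(data.table)

# Hypothetical pre-aggregated data: observed cases and person-years
agg <- data.table(
  sex  = c("f", "f", "m", "m"),
  obs  = c(3L, 5L, 2L, 4L),
  pyrs = c(10.5, 20.0, 8.0, 12.5)
)

# list(n = .N): number of rows in each group
n_rows <- agg[, list(n = .N), by = sex]

# list(obs = sum(obs), pyrs = sum(pyrs)): several named results per group
totals <- agg[, list(obs = sum(obs), pyrs = sum(pyrs)), by = sex]
totals
```

Naming the list elements, as in the last expression, determines the column names of the result.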

If use.levels = FALSE, no levels information will be used. This means that if e.g. the agegroup variable is a factor and has 18 levels defined, but only 15 levels are present in the data, no rows for the missing levels will be shown in the table.
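The effect of use.levels can be checked by tabulating the same subset both ways. The sketch below reuses the sire data and the agegroup variable from the Examples section and assumes popEpi is installed; no expected output is shown, since the row counts depend on the data.

```r
library(popEpi)

data("sire", package = "popEpi")
sr <- sire
sr$agegroup <- cut(sr$dg_age, breaks = c(0, 45, 60, 75, 85, Inf))

# All factor levels of agegroup are kept, even those absent from the subset
with_levels <- ltable(sr, "agegroup", subset = dg_age < 85, use.levels = TRUE)

# Only the levels that actually occur in the subset get a row
without_levels <- ltable(sr, "agegroup", subset = dg_age < 85, use.levels = FALSE)

nrow(with_levels)
nrow(without_levels)
```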

na.rm simply drops any rows from the resulting table where any of the by.vars values was NA.
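The sketch below mimics this on a small, invented table using plain data.table operations: grouping by a variable that contains NA yields an NA row, and dropping that row is what na.rm = TRUE is described as doing. The names `dt`, `tab` and `tab_no_na` are hypothetical.

```r
library(data.table)

# Hypothetical data with one missing value in the grouping variable
dt <- data.table(
  sex    = c("f", "m", NA, "f"),
  dg_age = c(61, 70, 55, 48)
)

# NA forms its own group when tabulating
tab <- dt[, list(obs = .N), by = sex]

# Dropping rows where the grouping variable is NA
tab_no_na <- tab[!is.na(sex)]
tab_no_na
```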

Value

A 'data.table' of statistics (e.g. counts) stratified by the columns defined in 'by.vars'.

Functions

expr.by.cj(): A somewhat more streamlined version of ltable, with defaults chosen for speed. The enclosing environment of the data is determined explicitly (see the enclos argument).

Author(s)

Joonas Miettinen, Matti Rantanen

Examples

library(popEpi)
data("sire", package = "popEpi")
sr <- sire
## categorize age at diagnosis into age groups
sr$agegroup <- cut(sr$dg_age, breaks = c(0, 45, 60, 75, 85, Inf))
## counts by default
ltable(sr, "agegroup")

## any expression can be given
ltable(sr, "agegroup", list(mage = mean(dg_age)))
ltable(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## also returns levels where there are zero rows (expressions as NA)
ltable(sr, "agegroup", list(obs = .N, 
                            minage = min(dg_age), 
                            maxage = max(dg_age)), 
       subset = dg_age < 85)
       
#### expr.by.cj
expr.by.cj(sr, "agegroup")

## any arbitrary expression can be given
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age)))
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## use.levels = FALSE by default: only levels of by.vars present in the subset get a row
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)), 
           subset = dg_age < 70)
           
## .SDcols trick
expr.by.cj(sr, "agegroup", lapply(.SD, mean), 
           subset = dg_age < 70, .SDcols = c("dg_age", "status"))

[Package popEpi version 0.4.12 Index]