ltable {popEpi} | R Documentation |
Tabulate Counts and Other Functions by Multiple Variables into a Long-Format Table
Description
ltable makes use of data.table capabilities to tabulate frequencies or arbitrary functions of given variables into a long-format data.table/data.frame. expr.by.cj is the equivalent for more advanced users.
Usage
ltable(
  data,
  by.vars = NULL,
  expr = list(obs = .N),
  subset = NULL,
  use.levels = TRUE,
  na.rm = FALSE,
  robust = TRUE
)

expr.by.cj(
  data,
  by.vars = NULL,
  expr = list(obs = .N),
  subset = NULL,
  use.levels = FALSE,
  na.rm = FALSE,
  robust = FALSE,
  .SDcols = NULL,
  enclos = parent.frame(1L),
  ...
)
Arguments
data |
a |
by.vars |
names of variables that are used for categorization,
as a character vector, e.g. |
expr |
object or a list of objects where each object is a function of a variable (see: details) |
subset |
a logical condition; data is limited accordingly before
evaluating |
use.levels |
logical; if |
na.rm |
logical; if |
robust |
logical; if |
.SDcols |
advanced; a character vector of column names passed inside the data.table's brackets |
enclos |
advanced; an environment; the enclosing environment of the data. |
... |
advanced; other arguments passed inside the data.table's brackets |
Details
Returns expr for each unique combination of given by.vars.
By default, ltable makes use of any and all levels present for each variable in by.vars. This is useful because even if a subset of the data does not contain observations for e.g. a specific age group, those age groups are nevertheless presented in the resulting table; e.g. with the default expr = list(obs = .N) all age group levels are represented by a row and can have obs = 0.
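The idea of keeping empty levels can be illustrated with base R alone. The following is a minimal sketch, not popEpi's implementation: the toy ages and the object name `long` are assumptions made for illustration, and `table()` with `as.data.frame()` stands in for the data.table machinery that ltable uses.

```r
# Toy data (hypothetical): four ages, none of which falls in the oldest group
age <- c(30, 41, 52, 58)
agegroup <- cut(age, breaks = c(0, 45, 60, Inf))

# table() counts every factor level, including levels with no observations;
# as.data.frame() turns the counts into a long-format table
long <- as.data.frame(table(agegroup = agegroup), responseName = "obs")
print(long)
```

The resulting table has one row per level of agegroup, so the empty oldest group still appears, with obs equal to zero.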
The function differs from the vanilla table by giving a long-format table of values regardless of the number of by.vars given. Make use of e.g. cast_simple if data needs to be presented in a wide format (e.g. a two-way table).
The rows of the long-format table are effectively Cartesian products of the levels of each variable in by.vars, e.g. with by.vars = c("sex", "area") all levels of area are repeated for both levels of sex in the table.
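The row layout described above can be sketched in base R with `expand.grid()`. This only illustrates the Cartesian product of levels; the level names ("north", "south", "west") and the object name `grid` are hypothetical, and ltable itself does not necessarily build its rows this way.

```r
# Hypothetical factors with two and three levels
sex <- factor(c("male", "female"))
area <- factor(c("north", "south", "west"))

# expand.grid() varies its first argument fastest, so every level of
# area is listed once for each level of sex
grid <- expand.grid(area = levels(area), sex = levels(sex))
print(grid)
```

With two levels of sex and three levels of area the grid has six rows, one per combination.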
The expr allows the user to apply any function(s) on all levels defined by by.vars. Here are some examples:
- .N or list(.N) is a function used inside a data.table to calculate counts in each group
- list(obs = .N), same as above but user assigned variable name
- list(sum(obs), sum(pyrs), mean(dg_age)), multiple objects in a list
- list(obs = sum(obs), pyrs = sum(pyrs)), same as above with user defined variable names
If use.levels = FALSE, no levels information will be used. This means that if e.g. the agegroup variable is a factor and has 18 levels defined, but only 15 levels are present in the data, no rows for the missing levels will be shown in the table.
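The difference between using and ignoring level information can be mimicked in base R by dropping unused factor levels before counting. This is a rough analogue under stated assumptions, not how ltable is implemented: the toy ages and the names `with_levels` and `without_levels` are made up for illustration.

```r
# Toy factor (hypothetical) with three defined levels, only two of them observed
agegroup <- cut(c(30, 41, 52, 58), breaks = c(0, 45, 60, Inf))

# Counting with all defined levels keeps a zero-count row for the empty level
with_levels <- as.data.frame(table(agegroup = agegroup),
                             responseName = "obs")

# droplevels() discards unused levels first, so the empty level has no row
without_levels <- as.data.frame(table(agegroup = droplevels(agegroup)),
                                responseName = "obs")

print(with_levels)
print(without_levels)
```

The first table corresponds in spirit to use.levels = TRUE (three rows), the second to use.levels = FALSE (two rows).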
na.rm simply drops any rows from the resulting table where any of the by.vars values was NA.
Value
A 'data.table' of statistics (e.g. counts) stratified by the columns defined in 'by.vars'.
Functions
- expr.by.cj(): Somewhat more streamlined ltable with defaults for speed. Explicit determination of enclosing environment of data.
Author(s)
Joonas Miettinen, Matti Rantanen
See Also
Examples
library(popEpi)
data("sire", package = "popEpi")
sr <- sire
sr$agegroup <- cut(sr$dg_age, breaks=c(0,45,60,75,85,Inf))
## counts by default
ltable(sr, "agegroup")
## any expression can be given
ltable(sr, "agegroup", list(mage = mean(dg_age)))
ltable(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))
## also returns levels where there are zero rows (expressions as NA)
ltable(sr, "agegroup", list(obs = .N,
                            minage = min(dg_age),
                            maxage = max(dg_age)),
       subset = dg_age < 85)
#### expr.by.cj
expr.by.cj(sr, "agegroup")
## any arbitrary expression can be given
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age)))
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))
## only uses levels of by.vars present in data
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)),
           subset = dg_age < 70)
## .SDcols trick
expr.by.cj(sr, "agegroup", lapply(.SD, mean),
           subset = dg_age < 70, .SDcols = c("dg_age", "status"))