crosstable {crosstable} | R Documentation |
Easily describe datasets
Description
Generate a descriptive table of all chosen columns, as contingency tables for categorical variables and as summary statistics for numeric variables. If the by argument points to one or several categorical variables, crosstable will output a description of all columns for each level. Otherwise, if it points to a numeric variable, crosstable will calculate correlation coefficients with all other selected numeric columns. Finally, if it points to a Surv object, crosstable will describe the survival at different times.
Can be formatted as an HTML table using as_flextable().
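When by is numeric, crosstable reports correlation coefficients between by and the other selected numeric columns. The following is a minimal base R sketch of that computation, assuming each column is simply correlated pairwise with by using stats::cor(). The helper cor_with_by() is hypothetical and is not crosstable's internal code; its method argument mirrors the choices offered by cor_method.

```r
# Hypothetical helper: correlate each selected numeric column with a numeric
# `by` vector, using one of the methods also accepted by `cor_method`.
cor_with_by <- function(data, cols, by,
                        method = c("pearson", "kendall", "spearman")) {
  method <- match.arg(method)
  # sapply() over a character vector returns a vector named after the columns;
  # use = "complete.obs" drops pairs where either value is missing
  sapply(cols, function(col) {
    cor(data[[col]], by, method = method, use = "complete.obs")
  })
}

cols <- c("disp", "wt", "qsec")
cor_with_by(mtcars, cols, mtcars$hp)
cor_with_by(mtcars, cols, mtcars$hp, method = "spearman")
```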
Usage
crosstable(
  data,
  cols = everything(),
  ...,
  by = NULL,
  total = c("none", "row", "column", "both"),
  percent_pattern = "{n} ({p_row})",
  percent_digits = 2,
  num_digits = 1,
  showNA = c("ifany", "always", "no"),
  label = TRUE,
  funs = c(` ` = cross_summary),
  funs_arg = list(),
  cor_method = c("pearson", "kendall", "spearman"),
  drop_levels = FALSE,
  unique_numeric = 3,
  date_format = NULL,
  times = NULL,
  followup = FALSE,
  test = FALSE,
  test_args = crosstable_test_args(),
  effect = FALSE,
  effect_args = crosstable_effect_args(),
  margin = deprecated(),
  .vars = deprecated()
)
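In the signature above, funs takes several summary functions and funs_arg takes extra arguments for them; the tidyselection example in the Examples section passes funs=c(mean, quantile) together with funs_arg=list(probs=c(.25,.75)). The sketch below assumes that the same funs_arg list is forwarded to every function in funs. The helper apply_funs() is hypothetical, not crosstable's code. Under that assumption the example works because mean() accepts ... and silently ignores probs, while quantile() uses it.

```r
# Hypothetical helper: apply every function in `funs` to `x`, forwarding the
# same named list of extra arguments (`funs_arg`) to each of them.
apply_funs <- function(x, funs, funs_arg = list()) {
  lapply(funs, function(f) do.call(f, c(list(x), funs_arg)))
}

res <- apply_funs(mtcars$disp,
                  funs = c(mean = mean, quantile = quantile),
                  funs_arg = list(probs = c(.25, .75)))
res$mean      # probs is absorbed by mean()'s `...` and ignored
res$quantile  # 25% and 75% quantiles
```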
Arguments
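data | A data.frame. |
cols | Columns to describe, using tidyselect syntax: bare names, helpers such as ends_with() or starts_with(), predicates such as where(is.numeric), or lambdas such as ~is.numeric(.x). Default to everything(). |
... | Unused. All parameters after this one must be named. |
by | The variable to group on. Character or name. Several variables can be given, as in by=c(am, vs). |
total | One of "none", "row", "column", or "both", indicating whether to add total rows and/or columns. Default to "none". |
percent_pattern | Pattern used to describe proportions in categorical data. Syntax uses a glue specification; see the section About percent_pattern below. |
percent_digits | Number of digits for percentages. |
num_digits | Number of digits for numeric summaries. |
showNA | Whether to show NA in categorical variables: one of "ifany", "always", or "no". Default to "ifany". |
label | Whether to show labels. See import_labels and set_label for how to add them. |
funs | Functions to apply to numeric variables. Default to cross_summary. |
funs_arg | Additional parameters for funs, as a named list. |
cor_method | One of "pearson", "kendall", or "spearman": the correlation coefficient computed when by is numeric. |
drop_levels | Whether to drop unused levels of factor variables. Default to FALSE. |
unique_numeric | The number of non-missing different levels a variable should have to be considered as numeric. |
date_format | If a column is a Date, the format used to display it, in strptime syntax (e.g. "%d/%m/%Y"). |
times | When using the formula syntax with a Surv object, the times at which to describe survival. |
followup | When using the formula syntax with a Surv object, whether to display follow-up time. |
test | Whether to perform statistical tests. |
test_args | See crosstable_test_args(). |
effect | Whether to compute an effect measure. |
effect_args | See crosstable_effect_args(). |
margin | Deprecated in favor of percent_pattern. |
.vars | Deprecated in favor of cols. |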
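The times argument asks for survival at specific time points. Assuming crosstable reports Kaplan-Meier estimates (an assumption, not a description of its internals), the underlying quantity can be read off in plain R with survival::survfit() and summary(times = ), on the same aml data and formula as the survival example.

```r
library(survival)

# Kaplan-Meier fit by group, using the same formula as the crosstable example
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Survival probability at chosen time points, one row per group and time
s <- summary(fit, times = c(0, 15, 30))
data.frame(group = s$strata, time = s$time, surv = round(s$surv, 2))
```

Requested times beyond a group's last follow-up are dropped by summary() by default, which is worth keeping in mind when choosing times.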
Value
A data.frame/tibble of class crosstable.
About percent_pattern
The percent_pattern argument is very powerful but can be difficult to understand at first.

It is usually a single string that uses the glue syntax, where variables are put in curly braces ({x}).

Counts are expressed as {n}, {n_row}, {n_col}, and {n_tot}, and proportions as {p_row}, {p_col}, and {p_cell}, depending on the margin on which they are calculated.

For each variable, a version including missing values in the total is proposed as {n_xxx_na} or {p_xxx_na}.

For each proportion, a confidence interval is also calculated using the Wilson score method and can be expressed as {p_xxx_inf} and {p_xxx_sup}. See the examples for practical applications.

Alternatively, percent_pattern can be a list of characters with names body, total_row, total_col, and total_all, to also control the pattern in parts of the crosstable other than the body.
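The quantities named in percent_pattern can be reproduced with base R on a small contingency table. This is a minimal sketch under two assumptions: n_row, n_col, and n_tot are the row, column, and grand totals around the cell count n, and the Wilson interval is the usual 95% score interval without continuity correction (the interval stats::prop.test(correct = FALSE) reports). The helper wilson_ci() is hypothetical.

```r
# Cell counts: vs (rows) by am (columns), as in the percent_pattern examples
tab <- table(vs = mtcars$vs, am = mtcars$am)

# addmargins() appends row and column totals ("Sum")
addmargins(tab)

# Proportions relative to each margin
p_row  <- prop.table(tab, margin = 1)  # n / n_row: each row sums to 1
p_col  <- prop.table(tab, margin = 2)  # n / n_col: each column sums to 1
p_cell <- prop.table(tab)              # n / n_tot: all cells sum to 1

# Hypothetical helper: Wilson score interval for x successes out of n trials
wilson_ci <- function(x, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- x / n
  center <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(inf = center - half, sup = center + half)
}

# Interval around p_col for the cell vs = 0, am = 0
x0 <- tab["0", "0"]
n0 <- sum(tab[, "0"])
wilson_ci(x0, n0)
```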
Author(s)
Dan Chaltiel
See Also
https://danchaltiel.github.io/crosstable/, as_flextable, import_labels
Examples
#whole table
crosstable(iris)
crosstable(mtcars)
crosstable(mtcars2)
#tidyselection, custom functions
library(dplyr)
crosstable(mtcars2, c(ends_with("t"), starts_with("c")), by=vs,
           funs=c(mean, quantile), funs_arg=list(probs=c(.25,.75)))
#margin and totals, multiple by
crosstable(mtcars2, c(disp, cyl), by=c(am, vs),
           margin=c("row", "col"), total = "both")
#predicate selection, correlation, effect calculation
crosstable(mtcars2, where(is.numeric), by=hp, effect=TRUE)
#lambda selection & statistical tests
crosstable(mtcars2, ~is.numeric(.x) && mean(.x)>50, by=vs, test=TRUE)
#Dates (hp is reused as a number of days after the origin)
mtcars2$my_date = as.Date(mtcars2$hp, origin="2010-01-01") %>% set_label("Some nonsense date")
crosstable(mtcars2, my_date, by=vs, date_format="%d/%m/%Y")
#Survival data (using formula syntax)
library(survival)
crosstable(aml, Surv(time, status) ~ x, times=c(0,15,30,150), followup=TRUE)
#Patterns
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="{n} ({p_col} / {p_row})")
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="N={n} \np[95%CI] = {p_col} [{p_col_inf}; {p_col_sup}]")
#Labels for small and large counts, chosen with ifelse() inside the pattern
str_high="n>5"; str_lo="n<=5"
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="col={p_col}, row={p_row} ({ifelse(n<=5, str_lo, str_high)})")
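Because percent_pattern uses the glue syntax, any R expression inside braces is evaluated, which is what lets the last example call ifelse(). The following base R imitation of that mechanism is for illustration only: fill_pattern() is a hypothetical helper, not part of crosstable or glue, and it only handles braces that contain no nested braces. The cell values (12 cars, 63% of the column, 67% of the row) match the vs = 0, am = 0 cell of mtcars with percent_digits=0.

```r
# Hypothetical helper: replace each {expr} in `pattern` by the value of expr,
# evaluated with the entries of `values` as variables.
fill_pattern <- function(pattern, values) {
  env <- list2env(values, parent = baseenv())
  m <- gregexpr("\\{[^{}]+\\}", pattern)
  exprs <- regmatches(pattern, m)[[1]]
  filled <- vapply(exprs, function(e) {
    code <- substr(e, 2, nchar(e) - 1)  # drop the surrounding braces
    format(eval(parse(text = code), envir = env))
  }, character(1), USE.NAMES = FALSE)
  regmatches(pattern, m) <- list(filled)
  pattern
}

cell <- list(n = 12, p_col = "63%", p_row = "67%",
             str_lo = "n<=5", str_high = "n>5")
fill_pattern("{n} ({p_col} / {p_row})", cell)
fill_pattern("col={p_col}, row={p_row} ({ifelse(n<=5, str_lo, str_high)})", cell)
```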