crosstable {crosstable}    R Documentation

Easily describe datasets

Description

Generate a descriptive table of all chosen columns: contingency tables for categorical variables and summary statistics for numeric variables. If the by argument points to one or several categorical variables, crosstable will describe all columns for each of their levels. If, instead, it points to a numeric variable, crosstable will calculate correlation coefficients with all other selected numeric columns. Finally, if it points to a Surv object, crosstable will describe the survival at different times.

Can be formatted as an HTML table using as_flextable().
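A minimal sketch of this conversion, assuming the crosstable package is installed (as_flextable() relies on the flextable package, which crosstable pulls in). mtcars2 is the example dataset used in the Examples section; the choice of columns is arbitrary.

```r
library(crosstable)

# Describe two columns of mtcars2 by transmission type
ct = crosstable(mtcars2, c(disp, vs), by=am)

# Convert to a flextable object, which renders as an HTML table in the
# RStudio viewer or in an HTML R Markdown document
ft = as_flextable(ct)
ft
```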

Usage

crosstable(
  data,
  cols = everything(),
  ...,
  by = NULL,
  total = c("none", "row", "column", "both"),
  percent_pattern = "{n} ({p_row})",
  percent_digits = 2,
  num_digits = 1,
  showNA = c("ifany", "always", "no"),
  label = TRUE,
  funs = c(` ` = cross_summary),
  funs_arg = list(),
  cor_method = c("pearson", "kendall", "spearman"),
  drop_levels = FALSE,
  unique_numeric = 3,
  date_format = NULL,
  times = NULL,
  followup = FALSE,
  test = FALSE,
  test_args = crosstable_test_args(),
  effect = FALSE,
  effect_args = crosstable_effect_args(),
  margin = deprecated(),
  .vars = deprecated()
)

Arguments

data

A data.frame

cols

<tidy-select> Columns to describe, defaults to everything(). See examples or vignette("crosstable-selection") for more details.

...

Unused. All parameters after this one must be named.

by

The variable(s) to group on. Character or name. Several variables can be given with c(), as in the examples.

total

One of ["none", "row", "column", "both"], indicating whether to add total rows and/or columns. Defaults to "none".

percent_pattern

Pattern used to describe proportions in categorical data. The syntax is a glue::glue() specification; see the section "About percent_pattern" below for more details. Defaults to "{n} ({p_col})" if by is NULL and "{n} ({p_row})" otherwise.

percent_digits

Number of digits for percentages.

num_digits

Number of digits for numeric summaries.

showNA

Whether to show NA in categorical variables (one of c("ifany", "always", "no"), like in table()).
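Because showNA mirrors the useNA argument of table(), the base R call is a quick way to compare the three options. The vector x and the data frame df below are made-up illustration data.

```r
library(crosstable)

x = c("a", "b", NA, "a")

# Base R behaviour that showNA mirrors
table(x, useNA="no")      # counts for "a" and "b" only
table(x, useNA="ifany")   # adds an <NA> count because x has a missing value
table(x, useNA="always")  # adds an <NA> count even when nothing is missing

# The same choice in crosstable()
df = data.frame(x=x)
crosstable(df, x, showNA="no")
crosstable(df, x, showNA="always")
```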

label

Whether to show labels. See import_labels() or set_label() for how to add labels to the dataset columns.
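A short sketch of labelling one column and toggling the display of labels. It assumes set_label() stores the label in the "label" attribute of the vector; the copy df of the base mtcars dataset is only for illustration.

```r
library(crosstable)

# Copy a base dataset and label one of its columns
df = mtcars
df$mpg = set_label(df$mpg, "Miles per gallon")

crosstable(df, mpg)               # the label column shows "Miles per gallon"
crosstable(df, mpg, label=FALSE)  # the label column shows the name "mpg"
```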

funs

Functions to apply to numeric variables. Defaults to cross_summary().

funs_arg

Additional parameters for funs, e.g. digits (the number of decimal places) for the default cross_summary(). Ultimately, these arguments are passed to format_fixed().
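A sketch of named custom functions. It assumes, as the quantile() example suggests, that every element of funs_arg is passed to every function in funs, and that the names given in funs are used as labels in the output; check the printed result before relying on either.

```r
library(crosstable)

# Named functions: the names are expected to appear as labels in the output.
# Assumption: funs_arg is passed to every function in funs, so only pass
# arguments that all of them accept (both mean() and sd() take na.rm).
ct = crosstable(mtcars2, c(mpg, hp), funs=c(Mean=mean, SD=sd),
                funs_arg=list(na.rm=TRUE))
ct
```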

cor_method

One of c("pearson", "kendall", "spearman") to indicate which correlation coefficient is to be used.
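A sketch of a Spearman correlation, reusing the numeric by=hp setup from the Examples section; the choice of disp and mpg as described columns is arbitrary.

```r
library(crosstable)

# A numeric `by` switches crosstable() to correlation coefficients with the
# other numeric columns; cor_method selects which coefficient is computed
ct = crosstable(mtcars2, c(disp, mpg), by=hp, cor_method="spearman")
ct
```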

drop_levels

Whether to drop unused levels of factor variables. Defaults to FALSE.
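A sketch contrasting the default with drop_levels=TRUE on a made-up factor that declares a level never observed in the data.

```r
library(crosstable)

# A factor with a level ("c") that never occurs in the data
df = data.frame(x=factor(c("a", "b", "a"), levels=c("a", "b", "c")))

ct_all  = crosstable(df, x)                    # default: keeps the empty level "c"
ct_drop = crosstable(df, x, drop_levels=TRUE)  # removes the empty level
ct_all
ct_drop
```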

unique_numeric

The number of distinct non-missing values a numeric variable should have to be described as numeric rather than as categorical.
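A sketch using the base mtcars dataset, where cyl takes 3 distinct values and carb takes 6. With the threshold raised to 5, cyl should be described as categorical (counts per value) while carb is still summarized as numeric.

```r
library(crosstable)

# cyl: 3 distinct values (4, 6, 8); carb: 6 distinct values (1, 2, 3, 4, 6, 8)
length(unique(mtcars$cyl))
length(unique(mtcars$carb))

# Raise the threshold so that low-cardinality numeric columns are treated
# as categorical
ct = crosstable(mtcars, c(cyl, carb), unique_numeric=5)
ct
```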

date_format

If a column is of class Date or POSIXt, the format used to display it (see strptime() for the format codes).

times

When using a formula with survival::Surv() objects, the times at which to summarize survival.

followup

When using a formula with survival::Surv() objects, whether to display the follow-up time.

test

Whether to perform tests.

test_args

See crosstable_test_args to override default testing behaviour.

effect

Whether to compute an effect measure.

effect_args

See crosstable_effect_args to override default behaviour.
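A sketch turning on both options with their default settings, for a by variable with two levels. Overriding those defaults goes through crosstable_test_args() and crosstable_effect_args(), whose individual arguments are not shown here.

```r
library(crosstable)

# Statistical tests and effect measures for a two-level `by` (am),
# with the default behaviour of crosstable_test_args() and
# crosstable_effect_args()
ct = crosstable(mtcars2, c(disp, vs), by=am, test=TRUE, effect=TRUE)
ct
```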

margin

Deprecated in favor of percent_pattern. One of ["row", "column", "cell", "none", or "all"]. Defaults to "row".

.vars

Deprecated in favor of cols.

Value

A data.frame/tibble of class crosstable

About percent_pattern

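The percent_pattern argument is very powerful but can be difficult to understand at first. It is a glue::glue() string in which variables are put in curly braces: {n} stands for the counts, {p_row} and {p_col} for the row and column proportions, and suffixed versions such as {p_col_inf} and {p_col_sup} give the bounds of the confidence interval of a proportion. Any R expression can also be used inside the braces, as in the ifelse() pattern of the examples below.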
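To make the placeholders more concrete, the sketch below recomputes {n}, {p_row} and {p_col} by hand with base R on a 2x2 table of vs by am from the base mtcars dataset. It only illustrates what these quantities mean; it is not crosstable's own implementation, and the formatting (two decimals, percent sign) is an approximation of the default output. The helper fmt() is hypothetical.

```r
# Contingency table: rows are the levels of the described variable (vs),
# columns are the levels of the grouping variable (am)
tab = table(vs=mtcars$vs, am=mtcars$am)

n     = tab                 # {n}: cell counts
p_row = prop.table(tab, 1)  # {p_row}: each cell divided by its row total
p_col = prop.table(tab, 2)  # {p_col}: each cell divided by its column total

# Hypothetical helper mimicking "{n} ({p})" with 2 digits
fmt = function(n, p) sprintf("%d (%.2f%%)", as.vector(n), 100 * as.vector(p))

# Rebuild the 2x2 layout (values are filled column by column, like the table)
cells_row = matrix(fmt(n, p_row), nrow=nrow(tab), dimnames=dimnames(tab))
cells_col = matrix(fmt(n, p_col), nrow=nrow(tab), dimnames=dimnames(tab))
cells_row  # equivalent of "{n} ({p_row})"
cells_col  # equivalent of "{n} ({p_col})"
```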

Author(s)

Dan Chaltiel

See Also

https://danchaltiel.github.io/crosstable/, as_flextable, import_labels

Examples

#whole table
crosstable(iris)
crosstable(mtcars)
crosstable(mtcars2)

#tidyselection, custom functions
library(dplyr)
crosstable(mtcars2, c(ends_with("t"), starts_with("c")), by=vs,
           funs=c(mean, quantile), funs_arg=list(probs=c(.25,.75)))

#percent pattern and totals, multiple by
crosstable(mtcars2, c(disp, cyl), by=c(am, vs),
           percent_pattern="{n} ({p_row} / {p_col})", total = "both")

#predicate selection, correlation, effect calculation
crosstable(mtcars2, where(is.numeric), by=hp, effect=TRUE)

#lambda selection & statistical tests
crosstable(mtcars2, ~is.numeric(.x) && mean(.x)>50, by=vs, test=TRUE)

#Dates
mtcars2$my_date = as.Date(mtcars2$hp, origin="2010-01-01") %>% set_label("Some nonsense date")
crosstable(mtcars2, my_date, by=vs, date_format="%d/%m/%Y")

#Survival data (using formula syntax)
library(survival)
crosstable(aml, Surv(time, status) ~ x, times=c(0,15,30,150), followup=TRUE)

#Patterns
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="{n} ({p_col} / {p_row})")
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="N={n} \np[95%CI] = {p_col} [{p_col_inf}; {p_col_sup}]")
str_high="n>5"; str_lo="n<=5"
crosstable(mtcars2, vs, by=am, percent_digits=0,
           percent_pattern="col={p_col}, row={p_row} ({ifelse(n<=5, str_lo, str_high)})")

[Package crosstable version 0.7.0 Index]