analyze_variables {tern}R Documentation

Analyze variables

Description

[Stable]

The analyze function analyze_vars() generates a summary of one or more variables, using the S3 generic function s_summary() to calculate a list of summary statistics. A list of all available statistics for numeric variables can be viewed by running get_stats("analyze_vars_numeric") and for non-numeric variables by running get_stats("analyze_vars_counts"). Use the .stats parameter to specify the statistics to include in your output summary table.

Usage

analyze_vars(
  lyt,
  vars,
  var_labels = vars,
  na_str = default_na_str(),
  nested = TRUE,
  ...,
  na.rm = TRUE,
  show_labels = "default",
  table_names = vars,
  section_div = NA_character_,
  .stats = c("n", "mean_sd", "median", "range", "count_fraction"),
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL
)

s_summary(x, na.rm = TRUE, denom, .N_row, .N_col, .var, ...)

## S3 method for class 'numeric'
s_summary(
  x,
  na.rm = TRUE,
  denom,
  .N_row,
  .N_col,
  .var,
  control = control_analyze_vars(),
  ...
)

## S3 method for class 'factor'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  ...
)

## S3 method for class 'character'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  .var,
  verbose = TRUE,
  ...
)

## S3 method for class 'logical'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  ...
)

a_summary(
  x,
  .N_col,
  .N_row,
  .var = NULL,
  .df_row = NULL,
  .ref_group = NULL,
  .in_ref_col = FALSE,
  compare = FALSE,
  .stats = NULL,
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL,
  na.rm = TRUE,
  na_str = default_na_str(),
  ...
)

Arguments

lyt

(PreDataTableLayouts)
layout that analyses will be added to.

vars

(character)
variable names for the primary analysis variable to be iterated over.

var_labels

(character)
variable labels.

na_str

(string)
string used to replace all NA or empty values in the output.

nested

(flag)
whether this layout instruction should be applied within the existing layout structure _if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split. underneath analyses, which is not allowed.

...

arguments passed to s_summary().

na.rm

(flag)
whether NA values should be removed from x prior to analysis.

show_labels

(string)
label visibility: one of "default", "visible" and "hidden".

table_names

(character)
this can be customized in the case that the same vars are analyzed multiple times, to avoid warnings from rtables.

section_div

(string)
string which should be repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.

.stats

(character)
statistics to select for the table. Run get_stats("analyze_vars_numeric") to see statistics available for numeric variables, and get_stats("analyze_vars_counts") for statistics available for non-numeric variables.

.formats

(named character or list)
formats for the statistics. See Details in analyze_vars for more information on the "auto" setting.

.labels

(named character)
labels for the statistics (without indent).

.indent_mods

(named integer)
indent modifiers for the labels. Each element of the vector should be a name-value pair with name corresponding to a statistic specified in .stats and value the indentation for that statistic's row label.

x

(numeric)
vector of numbers we want to analyze.

denom

(string)
choice of denominator for proportion. Options are:

  • n: number of values in this row and column intersection.

  • N_row: total number of values in this row across columns.

  • N_col: total number of values in this column across rows.

.N_row

(integer(1))
row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting) that is typically passed by rtables.

.N_col

(integer(1))
column-wise N (column count) for the full column being analyzed that is typically passed by rtables.

.var

(string)
single variable name that is passed by rtables when requested by a statistics function.

control

(list)
parameters for descriptive statistics details, specified by using the helper function control_analyze_vars(). Some possible parameter options are:

  • conf_level (proportion)
    confidence level of the interval for mean and median.

  • quantiles (numeric(2))
    vector of length two to specify the quantiles.

  • quantile_type (numeric(1))
    between 1 and 9 selecting quantile algorithms to be used. See more about type in stats::quantile().

  • test_mean (numeric(1))
    value to test against the mean under the null hypothesis when calculating p-value.

verbose

(flag)
defaults to TRUE, which prints out warnings and messages. It is mainly used to print out information about factor casting.

.df_row

(data.frame)
data frame across all of the columns for the given row split.

.ref_group

(data.frame or vector)
the data corresponding to the reference group.

.in_ref_col

(flag)
TRUE when working with the reference level, FALSE otherwise.

compare

(flag)
whether comparison statistics should be analyzed instead of summary statistics (compare = TRUE adds pval statistic comparing against reference group).

Details

Automatic digit formatting: The number of digits to display can be automatically determined from the analyzed variable(s) (vars) for certain statistics by setting the statistic format to "auto" in .formats. This utilizes the format_auto() formatting function. Note that only data for the current row & variable (for all columns) will be considered (.df_row[[.var]], see rtables::additional_fun_params) and not the whole dataset.

Value

Functions

Note

Examples

## Fabricated dataset.
dta_test <- data.frame(
  USUBJID = rep(1:6, each = 3),
  PARAMCD = rep("lab", 6 * 3),
  AVISIT  = rep(paste0("V", 1:3), 6),
  ARM     = rep(LETTERS[1:3], rep(6, 3)),
  AVAL    = c(9:1, rep(NA, 9))
)

# `analyze_vars()` in `rtables` pipelines
## Default output within a `rtables` pipeline.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(vars = "AVAL")

build_table(l, df = dta_test)

## Select and format statistics output.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(
    vars = "AVAL",
    .stats = c("n", "mean_sd", "quantiles"),
    .formats = c("mean_sd" = "xx.x, xx.x"),
    .labels = c(n = "n", mean_sd = "Mean, SD", quantiles = c("Q1 - Q3"))
  )

build_table(l, df = dta_test)

## Use arguments interpreted by `s_summary`.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(vars = "AVAL", na.rm = FALSE)

build_table(l, df = dta_test)

## Handle `NA` levels first when summarizing factors.
dta_test$AVISIT <- NA_character_
dta_test <- df_explicit_na(dta_test)
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  analyze_vars(vars = "AVISIT", na.rm = FALSE)

build_table(l, df = dta_test)

# auto format
dt <- data.frame("VAR" = c(0.001, 0.2, 0.0011000, 3, 4))
basic_table() %>%
  analyze_vars(
    vars = "VAR",
    .stats = c("n", "mean", "mean_sd", "range"),
    .formats = c("mean_sd" = "auto", "range" = "auto")
  ) %>%
  build_table(dt)

# `s_summary.numeric`

## Basic usage: empty numeric returns NA-filled items.
s_summary(numeric())

## Management of NA values.
x <- c(NA_real_, 1)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)

x <- c(NA_real_, 1, 2)
s_summary(x, stats = NULL)

## Benefits in `rtables` contructions:
dta_test <- data.frame(
  Group = rep(LETTERS[1:3], each = 2),
  sub_group = rep(letters[1:2], each = 3),
  x = 1:6
)

## The summary obtained in with `rtables`:
basic_table() %>%
  split_cols_by(var = "Group") %>%
  split_rows_by(var = "sub_group") %>%
  analyze(vars = "x", afun = s_summary) %>%
  build_table(df = dta_test)

## By comparison with `lapply`:
X <- split(dta_test, f = with(dta_test, interaction(Group, sub_group)))
lapply(X, function(x) s_summary(x$x))

# `s_summary.factor`

## Basic usage:
s_summary(factor(c("a", "a", "b", "c", "a")))

# Empty factor returns zero-filled items.
s_summary(factor(levels = c("a", "b", "c")))

## Management of NA values.
x <- factor(c(NA, "Female"))
x <- explicit_na(x)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)

## Different denominators.
x <- factor(c("a", "a", "b", "c", "a"))
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)

# `s_summary.character`

## Basic usage:
s_summary(c("a", "a", "b", "c", "a"), .var = "x", verbose = FALSE)
s_summary(c("a", "a", "b", "c", "a", ""), .var = "x", na.rm = FALSE, verbose = FALSE)

# `s_summary.logical`

## Basic usage:
s_summary(c(TRUE, FALSE, TRUE, TRUE))

# Empty factor returns zero-filled items.
s_summary(as.logical(c()))

## Management of NA values.
x <- c(NA, TRUE, FALSE)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)

## Different denominators.
x <- c(TRUE, FALSE, TRUE, TRUE)
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)

a_summary(factor(c("a", "a", "b", "c", "a")), .N_row = 10, .N_col = 10)
a_summary(
  factor(c("a", "a", "b", "c", "a")),
  .ref_group = factor(c("a", "a", "b", "c")), compare = TRUE
)

a_summary(c("A", "B", "A", "C"), .var = "x", .N_col = 10, .N_row = 10, verbose = FALSE)
a_summary(
  c("A", "B", "A", "C"),
  .ref_group = c("B", "A", "C"), .var = "x", compare = TRUE, verbose = FALSE
)

a_summary(c(TRUE, FALSE, FALSE, TRUE, TRUE), .N_row = 10, .N_col = 10)
a_summary(
  c(TRUE, FALSE, FALSE, TRUE, TRUE),
  .ref_group = c(TRUE, FALSE), .in_ref_col = TRUE, compare = TRUE
)

a_summary(rnorm(10), .N_col = 10, .N_row = 20, .var = "bla")
a_summary(rnorm(10, 5, 1), .ref_group = rnorm(20, -5, 1), .var = "bla", compare = TRUE)


[Package tern version 0.9.4 Index]