analyze_variables {tern} | R Documentation
Analyze variables
Description
The analyze function analyze_vars() generates a summary of one or more variables, using the S3 generic function s_summary() to calculate a list of summary statistics. A list of all available statistics for numeric variables can be viewed by running get_stats("analyze_vars_numeric"), and for non-numeric variables by running get_stats("analyze_vars_counts"). Use the .stats parameter to specify the statistics to include in your output summary table.
Usage
analyze_vars(
lyt,
vars,
var_labels = vars,
na_str = default_na_str(),
nested = TRUE,
...,
na.rm = TRUE,
show_labels = "default",
table_names = vars,
section_div = NA_character_,
.stats = c("n", "mean_sd", "median", "range", "count_fraction"),
.formats = NULL,
.labels = NULL,
.indent_mods = NULL
)
s_summary(x, na.rm = TRUE, denom, .N_row, .N_col, .var, ...)
## S3 method for class 'numeric'
s_summary(
x,
na.rm = TRUE,
denom,
.N_row,
.N_col,
.var,
control = control_analyze_vars(),
...
)
## S3 method for class 'factor'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
## S3 method for class 'character'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
.var,
verbose = TRUE,
...
)
## S3 method for class 'logical'
s_summary(
x,
na.rm = TRUE,
denom = c("n", "N_row", "N_col"),
.N_row,
.N_col,
...
)
a_summary(
x,
.N_col,
.N_row,
.var = NULL,
.df_row = NULL,
.ref_group = NULL,
.in_ref_col = FALSE,
compare = FALSE,
.stats = NULL,
.formats = NULL,
.labels = NULL,
.indent_mods = NULL,
na.rm = TRUE,
na_str = default_na_str(),
...
)
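Arguments
lyt | (PreDataTableLayouts) layout that analyses will be added to.
vars | (character) variable names for the primary analysis variable to be iterated over.
var_labels | (character) variable labels.
na_str | (string) string used to replace all NA or empty values in the output.
nested | (flag) whether this layout instruction should be applied within the existing layout structure if possible.
... | arguments passed to s_summary().
na.rm | (flag) whether NA values should be removed from x prior to analysis.
show_labels | (string) label visibility: one of "default", "visible" and "hidden".
table_names | (character) names for the tables representing each analysis; can be customized when the same vars are analyzed multiple times, to avoid warnings from rtables.
section_div | (string) string repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.
.stats | (character) statistics to select for the table (see get_stats()).
.formats | (named character or list) formats for the statistics; "auto" is available for some statistics (see Details).
.labels | (named character) labels for the statistics (without indent).
.indent_mods | (named integer) indent modifiers for the labels.
x | (numeric, factor, character or logical) vector to be summarized; its class determines which s_summary() method is used.
denom | (string) choice of denominator for proportions. Options are: n, the number of values in this row and column intersection; N_row, the total number of values in this row across columns; N_col, the total number of values in this column across rows.
.N_row | (integer) row-wise N (row group count) for the group of observations being analyzed, typically passed by rtables.
.N_col | (integer) column-wise N (column count) for the full column being analyzed, typically passed by rtables.
.var | (string) single variable name that is passed by rtables when requested by a statistics function.
control | (list) parameters for descriptive statistics details, specified by using the helper function control_analyze_vars().
verbose | (flag) whether to print warnings and messages, mainly about the automatic conversion of character to factor. Defaults to TRUE.
.df_row | (data.frame) data frame across all of the columns for the given row split.
.ref_group | (data.frame or vector) the data corresponding to the reference group.
.in_ref_col | (flag) TRUE when working with the reference level, FALSE otherwise.
compare | (flag) whether comparison statistics should be analyzed instead of summary statistics; compare = TRUE adds a p-value statistic comparing against the reference group.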
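To make the three denom choices concrete, here is a minimal base-R sketch of a per-level count and proportion for a factor. The helper count_fraction_sketch() is hypothetical and is not tern's implementation; it only assumes the rule stated on this page that the proportion is NA when the denominator is zero, and it takes .N_row / .N_col by hand where rtables would normally supply them.

```r
# Hypothetical helper (not part of tern): count and proportion per factor
# level, divided by the denominator selected through `denom`.
count_fraction_sketch <- function(x, denom = c("n", "N_row", "N_col"),
                                  .N_row = NA_integer_, .N_col = NA_integer_) {
  denom <- match.arg(denom)
  x <- x[!is.na(x)] # mimic na.rm = TRUE
  d <- switch(denom,
    n = length(x),
    N_row = .N_row,
    N_col = .N_col
  )
  tab <- table(x) # one entry per factor level, zero counts included
  counts <- setNames(as.vector(tab), names(tab))
  lapply(counts, function(k) {
    # NA proportion when the denominator is zero (or was not supplied)
    frac <- if (is.na(d) || d == 0) NA_real_ else k / d
    c(count = k, fraction = frac)
  })
}

x <- factor(c("a", "a", "b", "c", "a"))
count_fraction_sketch(x)$a                        # count 3, fraction 0.6
count_fraction_sketch(x, "N_row", .N_row = 10L)$a # count 3, fraction 0.3
```

Only the divisor changes between the options; the counts themselves are identical.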
Details
Automatic digit formatting: The number of digits to display can be automatically determined from the analyzed variable(s) (vars) for certain statistics by setting the statistic format to "auto" in .formats. This uses the format_auto() formatting function. Note that only data for the current row and variable (for all columns) will be considered (.df_row[[.var]], see rtables::additional_fun_params) and not the whole dataset.
Value
- analyze_vars() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing the statistics from s_summary() to the table layout.
- s_summary() returns different statistics depending on the class of x.
  - If x is of class numeric, returns a list with the following named numeric items:
    - n: The length() of x.
    - sum: The sum() of x.
    - mean: The mean() of x.
    - sd: The stats::sd() of x.
    - se: The standard error of the mean of x, i.e.: (sd(x) / sqrt(length(x))).
    - mean_sd: The mean() and stats::sd() of x.
    - mean_se: The mean() of x and its standard error (see above).
    - mean_ci: The CI for the mean of x (from stat_mean_ci()).
    - mean_sei: The SE interval for the mean of x, i.e.: (mean() -/+ stats::sd() / sqrt(length(x))).
    - mean_sdi: The SD interval for the mean of x, i.e.: (mean() -/+ stats::sd()).
    - mean_pval: The two-sided p-value of the mean of x (from stat_mean_pval()).
    - median: The stats::median() of x.
    - mad: The median absolute deviation of x, i.e.: (stats::median() of xc, where xc = x - stats::median()).
    - median_ci: The CI for the median of x (from stat_median_ci()).
    - quantiles: Two sample quantiles of x (from stats::quantile()).
    - iqr: The stats::IQR() of x.
    - range: The range_noinf() of x.
    - min: The min() of x.
    - max: The max() of x.
    - median_range: The median() and range_noinf() of x.
    - cv: The coefficient of variation of x, i.e.: (stats::sd() / mean() * 100).
    - geom_mean: The geometric mean of x, i.e.: (exp(mean(log(x)))).
    - geom_cv: The geometric coefficient of variation of x, i.e.: (sqrt(exp(sd(log(x)) ^ 2) - 1) * 100).
  - If x is of class factor or converted from character, returns a list with named numeric items:
    - n: The length() of x.
    - count: A list with the number of cases for each level of the factor x.
    - count_fraction: Similar to count but also includes the proportion of cases for each level of the factor x relative to the denominator, or NA if the denominator is zero.
  - If x is of class logical, returns a list with named numeric items:
    - n: The length() of x (possibly after removing NAs).
    - count: Count of TRUE in x.
    - count_fraction: Count and proportion of TRUE in x relative to the denominator, or NA if the denominator is zero. Note that NAs in x are never counted and never lead to NA here.
- a_summary() returns the corresponding list with formatted rtables::CellValue().
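Several of the numeric statistics listed above are simple closed-form expressions. The following base-R sketch evaluates them directly from the formulas given in this section. numeric_stats_sketch() is a hypothetical name, not part of tern, and it makes no attempt to reproduce how s_summary() handles edge cases such as empty input or non-positive values in the geometric statistics.

```r
# Hypothetical sketch (not tern's source): selected numeric statistics,
# computed from the formulas given in the Value section.
numeric_stats_sketch <- function(x) {
  x <- x[!is.na(x)] # mimic na.rm = TRUE
  n <- length(x)
  m <- mean(x)
  s <- stats::sd(x)
  se <- s / sqrt(n) # standard error of the mean
  list(
    n = n,
    mean = m,
    sd = s,
    se = se,
    mean_sei = c(lower = m - se, upper = m + se), # SE interval
    mean_sdi = c(lower = m - s, upper = m + s),   # SD interval
    min = min(x),
    max = max(x),
    cv = s / m * 100,
    geom_mean = exp(mean(log(x))),
    geom_cv = sqrt(exp(stats::sd(log(x))^2) - 1) * 100
  )
}

res <- numeric_stats_sketch(c(1, 2, 4, NA))
res$geom_mean # 2, since log(1), log(2), log(4) average to log(2)
```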
Functions
- analyze_vars(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().
- s_summary(): S3 generic function to produce a variable summary.
- s_summary(numeric): Method for numeric class.
- s_summary(factor): Method for factor class.
- s_summary(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
- s_summary(logical): Method for logical class.
- a_summary(): Formatted analysis function which is used as afun in analyze_vars() and compare_vars() and as cfun in summarize_colvars().
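The method structure described above follows R's standard S3 dispatch pattern. This minimal sketch uses a hypothetical generic, summarize_sketch(), with toy methods; it only illustrates the shape of the pattern, including a character method that warns and then forwards to the factor method, and it does not reproduce the statistics computed by s_summary().

```r
# Hypothetical generic (not tern's s_summary()): dispatches on class(x).
summarize_sketch <- function(x, ...) UseMethod("summarize_sketch")

summarize_sketch.numeric <- function(x, ...) {
  list(n = length(x), mean = mean(x))
}

summarize_sketch.factor <- function(x, ...) {
  tab <- table(x) # one count per level
  list(n = length(x), count = as.list(setNames(as.vector(tab), names(tab))))
}

summarize_sketch.character <- function(x, ...) {
  warning("automatic conversion of character to factor")
  summarize_sketch(factor(x), ...) # forward to the factor method
}

summarize_sketch.logical <- function(x, ...) {
  list(n = length(x), count = sum(x)) # number of TRUE values
}

summarize_sketch(c(1, 2, 3))$mean                            # 2
suppressWarnings(summarize_sketch(c("a", "b", "a")))$count$a # 2
```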
Note
If x is an empty vector, NA is returned. This is intended: it lets rtables return rcell content when the intersection of a column and a row delimits an empty data selection.
When the mean function is applied to an empty vector, NA will be returned instead of NaN, the latter being standard behavior in R.
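The difference from base R can be checked directly: mean() of an empty vector is NaN. The wrapper mean_or_na() below is a hypothetical illustration of the documented NA behavior, not tern code.

```r
# Base R: the mean of an empty numeric vector is NaN.
mean(numeric())

# Hypothetical wrapper mirroring the documented behavior: return NA
# (not NaN) when there is nothing to summarize.
mean_or_na <- function(x) {
  if (length(x) == 0) NA_real_ else mean(x)
}
mean_or_na(numeric()) # NA
mean_or_na(c(1, 3))   # 2
```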
If x is an empty factor, a list is still returned for counts with one element per factor level. If there are no levels in x, the function fails.
If factor variables contain NA, these NA values are excluded by default. To include NA values, set na.rm = FALSE and missing values will be displayed as an NA level. Alternatively, an explicit factor level can be defined for NA values during pre-processing via df_explicit_na(); the default na_level ("<Missing>") will also be excluded when na.rm is set to TRUE.
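The three NA-handling options above have rough base-R analogues, sketched below with table(). This is an illustration only: addNA() plus a level rename stands in for df_explicit_na(), and the sketch does not call tern.

```r
x <- factor(c("F", NA, "M", "F"))

# Analogue of na.rm = TRUE: NA values are not counted.
table(x)
# Analogue of na.rm = FALSE: NA is shown as its own level.
table(x, useNA = "ifany")

# Analogue of an explicit NA level: recode NA to "<Missing>" up front ...
x_explicit <- addNA(x)
levels(x_explicit)[is.na(levels(x_explicit))] <- "<Missing>"
# ... and exclude that level again when NA values should not be shown,
# keeping every other level (including empty ones).
kept <- table(factor(x_explicit, levels = setdiff(levels(x_explicit), "<Missing>")))
kept
```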
Automatic conversion of character to factor does not guarantee that the table can be generated correctly. In particular, for sparse tables this is likely to fail. It is therefore better to always pre-process the dataset such that factors are manually created from character variables before passing the dataset to rtables::build_table().
To use for comparison (with an additional p-value statistic), the parameter compare must be set to TRUE.
Ensure that either all NA values are converted to an explicit NA level or all NA values are left as is.
Examples
library(tern)
library(rtables)
## Fabricated dataset.
dta_test <- data.frame(
USUBJID = rep(1:6, each = 3),
PARAMCD = rep("lab", 6 * 3),
AVISIT = rep(paste0("V", 1:3), 6),
ARM = rep(LETTERS[1:3], rep(6, 3)),
AVAL = c(9:1, rep(NA, 9))
)
# `analyze_vars()` in `rtables` pipelines
## Default output within an `rtables` pipeline.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(vars = "AVAL")
build_table(l, df = dta_test)
## Select and format statistics output.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(
vars = "AVAL",
.stats = c("n", "mean_sd", "quantiles"),
.formats = c("mean_sd" = "xx.x, xx.x"),
.labels = c(n = "n", mean_sd = "Mean, SD", quantiles = c("Q1 - Q3"))
)
build_table(l, df = dta_test)
## Use arguments interpreted by `s_summary`.
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
split_rows_by(var = "AVISIT") %>%
analyze_vars(vars = "AVAL", na.rm = FALSE)
build_table(l, df = dta_test)
## Handle `NA` levels first when summarizing factors.
dta_test$AVISIT <- NA_character_
dta_test <- df_explicit_na(dta_test)
l <- basic_table() %>%
split_cols_by(var = "ARM") %>%
analyze_vars(vars = "AVISIT", na.rm = FALSE)
build_table(l, df = dta_test)
# auto format
dt <- data.frame("VAR" = c(0.001, 0.2, 0.0011000, 3, 4))
basic_table() %>%
analyze_vars(
vars = "VAR",
.stats = c("n", "mean", "mean_sd", "range"),
.formats = c("mean_sd" = "auto", "range" = "auto")
) %>%
build_table(dt)
# `s_summary.numeric`
## Basic usage: empty numeric returns NA-filled items.
s_summary(numeric())
## Management of NA values.
x <- c(NA_real_, 1)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
x <- c(NA_real_, 1, 2)
s_summary(x, stats = NULL)
## Benefits in `rtables` constructions:
dta_test <- data.frame(
Group = rep(LETTERS[1:3], each = 2),
sub_group = rep(letters[1:2], each = 3),
x = 1:6
)
## The summary obtained with `rtables`:
basic_table() %>%
split_cols_by(var = "Group") %>%
split_rows_by(var = "sub_group") %>%
analyze(vars = "x", afun = s_summary) %>%
build_table(df = dta_test)
## By comparison with `lapply`:
X <- split(dta_test, f = with(dta_test, interaction(Group, sub_group)))
lapply(X, function(x) s_summary(x$x))
# `s_summary.factor`
## Basic usage:
s_summary(factor(c("a", "a", "b", "c", "a")))
# Empty factor returns zero-filled items.
s_summary(factor(levels = c("a", "b", "c")))
## Management of NA values.
x <- factor(c(NA, "Female"))
x <- explicit_na(x)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- factor(c("a", "a", "b", "c", "a"))
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
# `s_summary.character`
## Basic usage:
s_summary(c("a", "a", "b", "c", "a"), .var = "x", verbose = FALSE)
s_summary(c("a", "a", "b", "c", "a", ""), .var = "x", na.rm = FALSE, verbose = FALSE)
# `s_summary.logical`
## Basic usage:
s_summary(c(TRUE, FALSE, TRUE, TRUE))
# Empty logical returns zero-filled items.
s_summary(as.logical(c()))
## Management of NA values.
x <- c(NA, TRUE, FALSE)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- c(TRUE, FALSE, TRUE, TRUE)
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
a_summary(factor(c("a", "a", "b", "c", "a")), .N_row = 10, .N_col = 10)
a_summary(
factor(c("a", "a", "b", "c", "a")),
.ref_group = factor(c("a", "a", "b", "c")), compare = TRUE
)
a_summary(c("A", "B", "A", "C"), .var = "x", .N_col = 10, .N_row = 10, verbose = FALSE)
a_summary(
c("A", "B", "A", "C"),
.ref_group = c("B", "A", "C"), .var = "x", compare = TRUE, verbose = FALSE
)
a_summary(c(TRUE, FALSE, FALSE, TRUE, TRUE), .N_row = 10, .N_col = 10)
a_summary(
c(TRUE, FALSE, FALSE, TRUE, TRUE),
.ref_group = c(TRUE, FALSE), .in_ref_col = TRUE, compare = TRUE
)
a_summary(rnorm(10), .N_col = 10, .N_row = 20, .var = "bla")
a_summary(rnorm(10, 5, 1), .ref_group = rnorm(20, -5, 1), .var = "bla", compare = TRUE)