| analyze_variables {tern} | R Documentation |
Analyze variables
Description
The analyze function analyze_vars() generates a summary of one or more variables, using the S3 generic function
s_summary() to calculate a list of summary statistics. A list of all available statistics for numeric
variables can be viewed by running get_stats("analyze_vars_numeric") and for non-numeric variables by running
get_stats("analyze_vars_counts"). Use the .stats parameter to specify the statistics to include in your output
summary table.
Usage
analyze_vars(
  lyt,
  vars,
  var_labels = vars,
  na_str = default_na_str(),
  nested = TRUE,
  ...,
  na.rm = TRUE,
  show_labels = "default",
  table_names = vars,
  section_div = NA_character_,
  .stats = c("n", "mean_sd", "median", "range", "count_fraction"),
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL
)
s_summary(x, na.rm = TRUE, denom, .N_row, .N_col, .var, ...)
## S3 method for class 'numeric'
s_summary(
  x,
  na.rm = TRUE,
  denom,
  .N_row,
  .N_col,
  .var,
  control = control_analyze_vars(),
  ...
)
## S3 method for class 'factor'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  ...
)
## S3 method for class 'character'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  .var,
  verbose = TRUE,
  ...
)
## S3 method for class 'logical'
s_summary(
  x,
  na.rm = TRUE,
  denom = c("n", "N_row", "N_col"),
  .N_row,
  .N_col,
  ...
)
a_summary(
  x,
  .N_col,
  .N_row,
  .var = NULL,
  .df_row = NULL,
  .ref_group = NULL,
  .in_ref_col = FALSE,
  compare = FALSE,
  .stats = NULL,
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL,
  na.rm = TRUE,
  na_str = default_na_str(),
  ...
)
Arguments
lyt |
( |
vars |
( |
var_labels |
( |
na_str |
( |
nested |
( |
... |
arguments passed to |
na.rm |
( |
show_labels |
( |
table_names |
( |
section_div |
( |
.stats |
( |
.formats |
(named |
.labels |
(named |
.indent_mods |
(named |
x |
( |
denom |
(
|
.N_row |
( |
.N_col |
( |
.var |
( |
control |
(
|
verbose |
( |
.df_row |
( |
.ref_group |
( |
.in_ref_col |
( |
compare |
( |
Details
Automatic digit formatting: The number of digits to display can be automatically determined from the analyzed
variable(s) (vars) for certain statistics by setting the statistic format to "auto" in .formats.
This utilizes the format_auto() formatting function. Note that only data for the current row & variable (for all
columns) will be considered (.df_row[[.var]], see rtables::additional_fun_params) and not the whole dataset.
Value
- analyze_vars() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing the statistics from s_summary() to the table layout.
- s_summary() returns different statistics depending on the class of x.
  If x is of class numeric, returns a list with the following named numeric items:
  - n: The length() of x.
  - sum: The sum() of x.
  - mean: The mean() of x.
  - sd: The stats::sd() of x.
  - se: The standard error of the mean of x, i.e.: (sd(x) / sqrt(length(x))).
  - mean_sd: The mean() and stats::sd() of x.
  - mean_se: The mean() of x and its standard error (see above).
  - mean_ci: The CI for the mean of x (from stat_mean_ci()).
  - mean_sei: The SE interval for the mean of x, i.e.: (mean() -/+ stats::sd() / sqrt(length(x))).
  - mean_sdi: The SD interval for the mean of x, i.e.: (mean() -/+ stats::sd()).
  - mean_pval: The two-sided p-value of the mean of x (from stat_mean_pval()).
  - median: The stats::median() of x.
  - mad: The median absolute deviation of x, i.e.: (stats::median() of xc, where xc = x - stats::median()).
  - median_ci: The CI for the median of x (from stat_median_ci()).
  - quantiles: Two sample quantiles of x (from stats::quantile()).
  - iqr: The stats::IQR() of x.
  - range: The range_noinf() of x.
  - min: The min() of x.
  - max: The max() of x.
  - median_range: The median() and range_noinf() of x.
  - cv: The coefficient of variation of x, i.e.: (stats::sd() / mean() * 100).
  - geom_mean: The geometric mean of x, i.e.: (exp(mean(log(x)))).
  - geom_cv: The geometric coefficient of variation of x, i.e.: (sqrt(exp(sd(log(x)) ^ 2) - 1) * 100).
  If x is of class factor or converted from character, returns a list with named numeric items:
  - n: The length() of x.
  - count: A list with the number of cases for each level of the factor x.
  - count_fraction: Similar to count but also includes the proportion of cases for each level of the factor x relative to the denominator, or NA if the denominator is zero.
  If x is of class logical, returns a list with named numeric items:
  - n: The length() of x (possibly after removing NAs).
  - count: Count of TRUE in x.
  - count_fraction: Count and proportion of TRUE in x relative to the denominator, or NA if the denominator is zero. Note that NAs in x are never counted and never lead to NA here.
- a_summary() returns the corresponding list with formatted rtables::CellValue().
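The formulas quoted in the list above can be checked directly in base R. The following is a minimal sketch, not tern's implementation: the sample vector and the `count_fraction_sketch()` helper are assumptions made for illustration, and the geometric statistics assume strictly positive values.

```r
# Illustrative data (assumed); all values are positive so log() is defined.
x <- c(2, 4, 4, 5, 10)

n <- length(x)
se <- sd(x) / sqrt(n)                         # standard error of the mean
mean_sei <- mean(x) + c(-1, 1) * se           # SE interval: mean -/+ sd / sqrt(n)
mean_sdi <- mean(x) + c(-1, 1) * sd(x)        # SD interval: mean -/+ sd
cv <- sd(x) / mean(x) * 100                   # coefficient of variation, in percent
geom_mean <- exp(mean(log(x)))                # geometric mean
geom_cv <- sqrt(exp(sd(log(x))^2) - 1) * 100  # geometric CV, in percent

# Hypothetical helper (not a tern function): count plus fraction,
# with NA as the fraction when the denominator is zero.
count_fraction_sketch <- function(count, denom) {
  c(count = count, fraction = if (denom > 0) count / denom else NA_real_)
}

f <- factor(c("a", "a", "b", "c", "a"))
count_fraction_sketch(sum(f == "a"), length(f))  # denominator: number of values
count_fraction_sketch(sum(f == "a"), 10L)        # denominator: e.g. a row total
count_fraction_sketch(0, 0)                      # fraction is NA
```

Changing the second argument of the helper plays the role that denom plays in s_summary(): the count stays the same while the fraction is taken relative to a different total.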
Functions
- analyze_vars(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().
- s_summary(): S3 generic function to produce a variable summary.
- s_summary(numeric): Method for numeric class.
- s_summary(factor): Method for factor class.
- s_summary(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
- s_summary(logical): Method for logical class.
- a_summary(): Formatted analysis function which is used as afun in analyze_vars() and compare_vars() and as cfun in summarize_colvars().
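The dispatch pattern described above (one S3 generic, one method per class, with the character method converting to factor and forwarding) can be sketched in base R. Everything named `toy_summary` here is hypothetical and only mirrors the structure; it is not tern's code and returns far fewer statistics.

```r
# Hypothetical generic: dispatches on the class of x.
toy_summary <- function(x, ...) UseMethod("toy_summary")

toy_summary.numeric <- function(x, na.rm = TRUE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  # Return NA rather than NaN when there is nothing to average.
  list(n = length(x), mean = if (length(x) > 0) mean(x) else NA_real_)
}

toy_summary.factor <- function(x, na.rm = TRUE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  # One count per factor level, including levels with zero cases.
  list(n = length(x), count = lapply(split(x, x), length))
}

toy_summary.character <- function(x, verbose = TRUE, ...) {
  if (verbose) warning("converting character to factor")
  # Forward to the factor method after conversion.
  toy_summary(factor(x), ...)
}

toy_summary.logical <- function(x, na.rm = TRUE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  list(n = length(x), count = sum(x))
}

toy_summary(c(1.5, NA, 2.5))
toy_summary(c("a", "a", "b"), verbose = FALSE)
toy_summary(c(TRUE, NA, FALSE))
```

Keeping the conversion inside the character method means callers get a result for character input, while the warning signals that building factors during pre-processing is the safer route.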
Note
- If x is an empty vector, NA is returned. This is the intended behavior, so that rcell content can be returned in rtables when the intersection of a column and a row delimits an empty data selection.
- When the mean function is applied to an empty vector, NA will be returned instead of NaN, the latter being standard behavior in R.
- If x is an empty factor, a list is still returned for counts with one element per factor level. If there are no levels in x, the function fails.
- If factor variables contain NA, these NA values are excluded by default. To include NA values set na.rm = FALSE and missing values will be displayed as an NA level. Alternatively, an explicit factor level can be defined for NA values during pre-processing via df_explicit_na(); note that the default na_level ("<Missing>") will also be excluded when na.rm is set to TRUE.
- Automatic conversion of character to factor does not guarantee that the table can be generated correctly. In particular, for sparse tables this is very likely to fail. It is therefore better to always pre-process the dataset such that factors are manually created from character variables before passing the dataset to rtables::build_table().
- To use for comparison (with additional p-value statistic), parameter compare must be set to TRUE.
- Ensure that either all NA values are converted to an explicit NA level or all NA values are left as is.
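Two of the points above can be illustrated with base R alone. This is a sketch under assumptions: the data frame and the `mean_or_na()` helper are made up for illustration, and the explicit missing level is created by hand here rather than with df_explicit_na().

```r
# Base R returns NaN for the mean of an empty vector.
mean(numeric())

# Hypothetical helper that returns NA for empty input instead.
mean_or_na <- function(x) if (length(x) == 0) NA_real_ else mean(x)
mean_or_na(numeric())

# Illustrative data with character columns and one missing value.
df <- data.frame(
  ARM = c("A", "B", "A", "B"),
  SEX = c("F", NA, "M", "F"),
  stringsAsFactors = FALSE
)

# Create factors manually so the levels do not depend on which rows
# happen to be present in a given table cell.
df$ARM <- factor(df$ARM, levels = c("A", "B"))

# Turn missing values into an explicit level before building the factor.
sex <- df$SEX
sex[is.na(sex)] <- "<Missing>"
df$SEX <- factor(sex, levels = c("F", "M", "<Missing>"))

table(df$SEX)
```

With the levels fixed up front, every subset of the data carries the same set of levels, which is what sparse tables need.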
Examples
## Fabricated dataset.
dta_test <- data.frame(
  USUBJID = rep(1:6, each = 3),
  PARAMCD = rep("lab", 6 * 3),
  AVISIT = rep(paste0("V", 1:3), 6),
  ARM = rep(LETTERS[1:3], rep(6, 3)),
  AVAL = c(9:1, rep(NA, 9))
)
# `analyze_vars()` in `rtables` pipelines
## Default output within a `rtables` pipeline.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(vars = "AVAL")
build_table(l, df = dta_test)
## Select and format statistics output.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(
    vars = "AVAL",
    .stats = c("n", "mean_sd", "quantiles"),
    .formats = c("mean_sd" = "xx.x, xx.x"),
    .labels = c(n = "n", mean_sd = "Mean, SD", quantiles = c("Q1 - Q3"))
  )
build_table(l, df = dta_test)
## Use arguments interpreted by `s_summary`.
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze_vars(vars = "AVAL", na.rm = FALSE)
build_table(l, df = dta_test)
## Handle `NA` levels first when summarizing factors.
dta_test$AVISIT <- NA_character_
dta_test <- df_explicit_na(dta_test)
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  analyze_vars(vars = "AVISIT", na.rm = FALSE)
build_table(l, df = dta_test)
# auto format
dt <- data.frame("VAR" = c(0.001, 0.2, 0.0011000, 3, 4))
basic_table() %>%
  analyze_vars(
    vars = "VAR",
    .stats = c("n", "mean", "mean_sd", "range"),
    .formats = c("mean_sd" = "auto", "range" = "auto")
  ) %>%
  build_table(dt)
# `s_summary.numeric`
## Basic usage: empty numeric returns NA-filled items.
s_summary(numeric())
## Management of NA values.
x <- c(NA_real_, 1)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
x <- c(NA_real_, 1, 2)
s_summary(x, stats = NULL)
## Benefits in `rtables` constructions:
dta_test <- data.frame(
  Group = rep(LETTERS[1:3], each = 2),
  sub_group = rep(letters[1:2], each = 3),
  x = 1:6
)
## The summary obtained with `rtables`:
basic_table() %>%
  split_cols_by(var = "Group") %>%
  split_rows_by(var = "sub_group") %>%
  analyze(vars = "x", afun = s_summary) %>%
  build_table(df = dta_test)
## By comparison with `lapply`:
X <- split(dta_test, f = with(dta_test, interaction(Group, sub_group)))
lapply(X, function(x) s_summary(x$x))
# `s_summary.factor`
## Basic usage:
s_summary(factor(c("a", "a", "b", "c", "a")))
# Empty factor returns zero-filled items.
s_summary(factor(levels = c("a", "b", "c")))
## Management of NA values.
x <- factor(c(NA, "Female"))
x <- explicit_na(x)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- factor(c("a", "a", "b", "c", "a"))
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
# `s_summary.character`
## Basic usage:
s_summary(c("a", "a", "b", "c", "a"), .var = "x", verbose = FALSE)
s_summary(c("a", "a", "b", "c", "a", ""), .var = "x", na.rm = FALSE, verbose = FALSE)
# `s_summary.logical`
## Basic usage:
s_summary(c(TRUE, FALSE, TRUE, TRUE))
# Empty logical vector returns zero-filled items.
s_summary(as.logical(c()))
## Management of NA values.
x <- c(NA, TRUE, FALSE)
s_summary(x, na.rm = TRUE)
s_summary(x, na.rm = FALSE)
## Different denominators.
x <- c(TRUE, FALSE, TRUE, TRUE)
s_summary(x, denom = "N_row", .N_row = 10L)
s_summary(x, denom = "N_col", .N_col = 20L)
a_summary(factor(c("a", "a", "b", "c", "a")), .N_row = 10, .N_col = 10)
a_summary(
  factor(c("a", "a", "b", "c", "a")),
  .ref_group = factor(c("a", "a", "b", "c")), compare = TRUE
)
a_summary(c("A", "B", "A", "C"), .var = "x", .N_col = 10, .N_row = 10, verbose = FALSE)
a_summary(
  c("A", "B", "A", "C"),
  .ref_group = c("B", "A", "C"), .var = "x", compare = TRUE, verbose = FALSE
)
a_summary(c(TRUE, FALSE, FALSE, TRUE, TRUE), .N_row = 10, .N_col = 10)
a_summary(
  c(TRUE, FALSE, FALSE, TRUE, TRUE),
  .ref_group = c(TRUE, FALSE), .in_ref_col = TRUE, compare = TRUE
)
a_summary(rnorm(10), .N_col = 10, .N_row = 20, .var = "bla")
a_summary(rnorm(10, 5, 1), .ref_group = rnorm(20, -5, 1), .var = "bla", compare = TRUE)