freq {tidytlg} | R Documentation |
Frequency counts and percentages
Description
Frequency counts and percentages for a variable by treatment and/or group.
Usage
freq(
  df,
  denom_df = df,
  colvar = NULL,
  tablebyvar = NULL,
  rowvar = NULL,
  rowbyvar = NULL,
  statlist = getOption("tidytlg.freq.statlist.default"),
  decimal = 1,
  nested = FALSE,
  cutoff = NULL,
  cutoff_stat = "pct",
  subset = TRUE,
  descending_by = NULL,
  display_missing = FALSE,
  rowtext = NULL,
  row_header = NULL,
  .keep = TRUE,
  .ord = FALSE,
  pad = TRUE,
  ...
)
Arguments
df |
(required) dataframe containing records to summarize by treatment |
denom_df |
(optional) dataframe used for population-based denominators (default = df) |
colvar |
(required) treatment variable within df to use to summarize |
tablebyvar |
(optional) repeat entire table by variable within df |
rowvar |
(required) character vector of variables to summarize within the dataframe |
rowbyvar |
(optional) repeat |
statlist |
(optional) statlist object of stats to keep, of length 1 or 2, specifying the list of statistics and the format desired (e.g. statlist(c("N", "n (x.x%)"))) |
decimal |
(optional) root-level default for decimal precision (default = 1) |
nested |
(optional) INTERNAL USE ONLY. The default should not be changed.
Switch on when this function is called by |
cutoff |
(optional) percentage cutoff threshold. When passed as a number, rows
with a value greater than or equal to the cutoff in any column are kept and
all other rows are dropped. To base the cutoff on a single column,
pass a character value of the form
|
cutoff_stat |
(optional) The statistic the cutoff applies to, n or pct (default =
'pct'). Multiple columns can be combined with & or | ex. |
subset |
(optional) An R expression that will be passed to a
|
descending_by |
(optional) The column or columns to sort by, in descending order of
counts. A named character vector can also be supplied to set the direction per column, e.g.
c("VarName1" = "asc", "VarName2" = "desc") would sort by VarName1 in
ascending order and VarName2 in descending order. In case of a tie in count
or |
display_missing |
(optional) Should the "missing" values be displayed? If missing values are displayed, denominators will include missing values. (default = FALSE) |
rowtext |
(optional) A character vector used to rename the |
row_header |
(optional) A character vector to be added to the table. |
.keep |
(optional) Should the |
.ord |
(optional) Should the ordering columns be output with the table? This is useful if a table needs to be merged or reordered in any way after build. (default = FALSE) |
pad |
(optional) A boolean that controls if levels with zero records should be included in the final table. (default = TRUE) |
... |
(optional) Named arguments to be included as columns on the table. |
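The two forms of cutoff described above (any column versus one named column) can be pictured with a small base R sketch. This is only an illustration of the filtering rule applied to counts (the cutoff_stat = "n" case); the toy data and object names are hypothetical, and it is not how tidytlg implements the argument.

```r
# Toy data: one row per subject (hypothetical values, not package data)
toy <- data.frame(
  TRT    = c("Placebo", "Placebo", "Placebo", "Active", "Active", "Active"),
  ETHNIC = c("A", "A", "B", "B", "B", "C")
)

# Counts of each ETHNIC level (rows) within each treatment (columns)
counts <- table(toy$ETHNIC, toy$TRT)

# Numeric cutoff: keep a row if ANY column reaches the threshold
cutoff <- 2
keep_any <- apply(counts >= cutoff, 1, any)
counts[keep_any, , drop = FALSE]

# Single-column cutoff: only the Placebo column decides which rows stay
keep_placebo <- counts[, "Placebo"] >= cutoff
counts[keep_placebo, , drop = FALSE]
```

With these toy values, levels A and B survive the any-column rule, while only A survives when Placebo alone is tested.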
Value
A dataframe of results
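As a rough picture of what a single "n (x.x%)" cell holds, the base R sketch below computes counts per treatment column, divides by a column-level denominator, and formats the result with one decimal. It assumes the denominator is simply the number of subjects per treatment (the role the documentation assigns to denom_df); the data and names are hypothetical, and tidytlg's own rounding and layout may differ.

```r
# Toy subject-level data (hypothetical values)
adsl_toy <- data.frame(
  USUBJID = c("DEMO-101", "DEMO-102", "DEMO-103", "DEMO-104"),
  SEX     = c("F", "M", "F", "F"),
  TRT     = c("Placebo", "Placebo", "High", "High")
)

# Numerator: subjects per SEX (rows) within each treatment (columns)
n <- table(adsl_toy$SEX, adsl_toy$TRT)

# Denominator: subjects per treatment, aligned to the columns of n
big_n <- as.vector(table(adsl_toy$TRT)[colnames(n)])

# Percent of the column total
pct <- sweep(n, 2, big_n, "/") * 100

# Format each cell as "n (x.x%)" with one decimal place
cell <- matrix(
  sprintf("%d (%.1f%%)", as.vector(n), as.vector(pct)),
  nrow = nrow(n),
  dimnames = dimnames(n)
)
cell
```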
Sorting a 'freq' table
By default, a frequency table is sorted based on the factor level of the
rowvar variable. If the rowvar variable isn't a factor, it will be
sorted alphabetically. This behavior can be modified in two ways. The first
is the char2factor() function, which offers an interface for releveling a
variable based on a numeric variable, like VISITN. The second is the
descending_by argument, which sorts based on counts on a variable.
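The difference between alphabetical order and factor-level order can be seen in plain base R. The sketch below relevels a character variable by a numeric companion, which is the idea behind releveling by VISITN; it uses only base functions on hypothetical data and does not call char2factor() itself.

```r
# Toy visit data (hypothetical values)
visits <- data.frame(
  VISIT  = c("Week 2", "Baseline", "Week 12", "Week 2", "Baseline"),
  VISITN = c(2, 0, 12, 2, 0)
)

# Character variable: unique values sorted alphabetically
alphabetical <- sort(unique(visits$VISIT))

# Factor releveled so that levels follow the numeric variable
lvl <- unique(visits$VISIT[order(visits$VISITN)])
visits$VISIT_F <- factor(visits$VISIT, levels = lvl)
levels(visits$VISIT_F)
```

Alphabetically, "Week 12" sorts before "Week 2"; after releveling, the levels follow the numeric visit order instead.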
Examples
adsl <- data.frame(
USUBJID = c("DEMO-101", "DEMO-102", "DEMO-103"),
RACE = c("WHITE", "BLACK", "ASIAN"),
SEX = c("F", "M", "F"),
colnbr = factor(c("Placebo", "Low", "High"))
)
# Unique subject count of a single variable
freq(adsl
,colvar = "colnbr"
,rowvar = "RACE"
,statlist = statlist("n"))
# Unique subject count and percent of a single variable
freq(adsl
,colvar = "colnbr"
,rowvar = "RACE"
,statlist = statlist(c("N", "n (x.x%)")))
# Unique subject count of a variable by another variable
freq(adsl
,colvar = "colnbr"
,rowvar = "RACE"
,rowbyvar = "SEX"
,statlist = statlist("n"))
# Unique subject count of a variable by another variable using colvar and
# group to define the denominator
freq(adsl
,colvar = "colnbr"
,rowvar = "RACE"
,rowbyvar = "SEX"
,statlist = statlist("n (x.x%)", denoms_by = c("colnbr", "SEX")))
# Cut records where count meets threshold for any column
freq(cdisc_adsl
,rowvar = "ETHNIC"
,colvar = "TRT01P"
,statlist = statlist("n (x.x%)")
,cutoff = 5
,cutoff_stat = "n")
# Cut records where count meets threshold for a specific column
freq(cdisc_adsl
,rowvar = "ETHNIC"
,colvar = "TRT01P"
,statlist = statlist("n (x.x%)")
,cutoff = "Placebo >= 3"
,cutoff_stat = "n")
# Below illustrates how to make the same calls to freq() as above, using
# table and column metadata.
# Unique subject count of a single variable
table_metadata <- tibble::tribble(
~anbr, ~func, ~df, ~rowvar, ~statlist, ~colvar,
1, "freq", "cdisc_adsl", "ETHNIC", statlist("n"), "TRT01PN"
)
generate_results(table_metadata,
column_metadata = column_metadata,
tbltype = "type1")
# Unique subject count and percent of a single variable
table_metadata <- tibble::tribble(
~anbr, ~func, ~df, ~rowvar, ~statlist, ~colvar,
"1", "freq", "cdisc_adsl", "ETHNIC", statlist(c("N", "n (x.x%)")),"TRT01PN"
)
generate_results(table_metadata,
column_metadata = column_metadata,
tbltype = "type1")
# Cut records where count meets threshold for any column
table_metadata <- tibble::tibble(
anbr= "1", func = "freq", df = "cdisc_adsl", rowvar = "ETHNIC",
statlist = statlist("n (x.x%)"), colvar = "TRT01PN", cutoff = 5,
cutoff_stat = "n")
generate_results(table_metadata,
column_metadata = column_metadata,
tbltype = "type1")
# Cut records where count meets threshold for a specific column
table_metadata <- tibble::tibble(
anbr= 1, func = "freq", df = "cdisc_adsl", rowvar = "ETHNIC",
statlist = statlist("n (x.x%)"), colvar = "TRT01PN",
cutoff = 'col1 >= 3', cutoff_stat = "n")
generate_results(table_metadata,
column_metadata = column_metadata,
tbltype = "type1")