getDescriptionStatsBy {Gmisc} | R Documentation |
Creating descriptive statistics
Description
A function that returns descriptive statistics that can be used for creating a publication "Table 1" split by groups. The function identifies whether the variable is continuous, binary or a factor. The format is inspired by NEJM, Lancet & BMJ.
Usage
getDescriptionStatsBy(
x,
...,
by,
digits = 1,
digits.nonzero = NA,
html = TRUE,
numbers_first = TRUE,
statistics = FALSE,
statistics.sig_lim = 10^-4,
statistics.two_dec_lim = 10^-2,
statistics.suppress_warnings = TRUE,
useNA = c("ifany", "no", "always"),
useNA.digits = digits,
continuous_fn = describeMean,
prop_fn = describeProp,
factor_fn = describeFactors,
show_all_values = FALSE,
hrzl_prop = FALSE,
add_total_col,
total_col_show_perc = TRUE,
use_units = FALSE,
units_column_name = "Units",
default_ref = NULL,
NEJMstyle = FALSE,
percentage_sign = TRUE,
header_count = NULL,
missing_value = "-",
names_of_missing = NULL
)
## S3 method for class 'Gmisc_getDescriptionStatsBy'
htmlTable(x, ...)
## S3 method for class 'Gmisc_getDescriptionStatsBy'
print(x, ...)
## S3 method for class 'Gmisc_getDescriptionStatsBy'
knit_print(x, ...)
## S3 method for class 'Gmisc_getDescriptionStatsBy'
length(x)
Arguments
x |
If a data.frame it will be used as the data source for the variables in the |
... |
The variables that you want statistics for. In the print method, all these parameters are passed on as htmlTable::htmlTable arguments. |
by |
The variable that you want to split into different columns |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
html |
If HTML compatible output should be used. If |
numbers_first |
Whether the number or the percentage should be presented first. The second is enclosed in parentheses (). |
statistics |
Add statistics: Fisher's exact test for proportions and the Wilcoxon test for continuous variables. See details below for more customization. |
statistics.sig_lim |
The significance limit for < sign, i.e. p-value 0.0000312 should be < 0.0001 with the default setting. |
statistics.two_dec_lim |
The limit for showing two decimals. E.g.
the p-value may be 0.056 and we may want to keep the two decimals in order
to emphasize the proximity to the all-mighty 0.05 p-value and set this to
|
statistics.suppress_warnings |
Hide warnings from the statistics function. |
useNA |
This indicates if missing values should be added as a separate
row below all others. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
continuous_fn |
The method to describe continuous variables. The
default is |
prop_fn |
The method used to describe proportions, see |
factor_fn |
The method used to describe factors, see |
show_all_values |
Show all values in proportions. For factors with only two values
it is usually most sensible to show only one option, as the other is just the complement
of the first, i.e. we want to convey a proportion. Take sex as an example: if you know
the percentage of one sex, you automatically know the other, as it is 100 % minus the first.
To choose which one you want to show then set the |
hrzl_prop |
Defaults to FALSE, which means that the proportions are interpreted vertically, i.e. within each group. If you want the proportions to be horizontal, i.e. the total is shown and then how it is distributed across the different groups, set this to TRUE. |
add_total_col |
This adds a total column to the resulting table. You can also specify if you want the total column "first" or "last" in the column order. |
total_col_show_perc |
Defaults to TRUE. Set to FALSE to suppress the percentages in the total column, as these can sometimes be confusing. |
use_units |
If the Hmisc package's units() function has been employed
it may be interesting to have a column at the far right that indicates the
unit measurement. If this column is specified then the total column will
appear before the units (if specified as last). You can also set the value to
|
units_column_name |
The name of the units column. Used if use_units = TRUE |
default_ref |
The default reference when dealing with proportions. When using 'dplyr' syntax ('tidyselect') you can specify a named vector/list for each column name. |
NEJMstyle |
Adds - no (%) at the end to proportions |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something other than the default % by setting this variable to the string you prefer. |
header_count |
Set to |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
names_of_missing |
Optional character vector containing the names of returned statistics,
in case all returned values for a given |
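As a minimal sketch of how some of the arguments above combine, the call below asks for two decimals, a row for missing values, a trailing total column and counts in the header. The seed and the five injected missing values are assumptions made purely for illustration.

```r
library(Gmisc)

data(mtcars)
mtcars$am <- factor(mtcars$am, levels = 0:1,
                    labels = c("Automatic", "Manual"))

# Assumption for illustration: inject a few missing values so that
# the row for missing values has something to report
set.seed(1)
mtcars$wt[sample(nrow(mtcars), size = 5)] <- NA

stats <- getDescriptionStatsBy(mtcars$wt,
                               by = mtcars$am,
                               digits = 2,             # two decimals
                               useNA = "always",       # always add a missing row
                               add_total_col = "last", # total column placed last
                               header_count = TRUE)    # group counts in the header
stats
```

Placing the total column last keeps the group columns next to the row labels, which tends to read well when the groups are the main comparison.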
Value
Returns a matrix if a single value was provided, otherwise a list of matrices with the class "Gmisc_getDescriptionStatsBy".
Customizing statistics
You can specify which function to use for the statistics by providing a function that takes two arguments, x and by, and returns a p-value. A few functions are already prepared for this: getPvalAnova, getPvalChiSq, getPvalFisher, getPvalKruskal and getPvalWilcox. The default functions are getPvalFisher and getPvalWilcox (unless the by argument has more than three unique levels, in which case it defaults to getPvalAnova).
If you want the function to be selected depending on the type of input, you can provide a list with the names 'continuous', 'proportion' and 'factor', and the function will choose accordingly. If you leave a category undefined, it falls back to the defaults above.
You can also use a custom function that returns a string with the attribute 'colname' set; it will then be appended to the results instead of the p-value column.
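The two customization routes described above might be sketched as follows. The helper pval_ttest is a hypothetical name introduced here for illustration and is not part of Gmisc; it simply wraps base R's Welch t-test and returns its p-value.

```r
library(Gmisc)

data(mtcars)
mtcars$am <- factor(mtcars$am, levels = 0:1,
                    labels = c("Automatic", "Manual"))
mtcars$gear <- factor(mtcars$gear)

# Hypothetical helper (not part of Gmisc): takes x and by and
# returns the p-value of a Welch two-sample t-test
pval_ttest <- function(x, by) {
  t.test(x ~ by)$p.value
}

# Route 1: a single function used for the statistics column
getDescriptionStatsBy(mtcars$mpg,
                      by = mtcars$am,
                      statistics = pval_ttest)

# Route 2: a named list, so the test depends on the variable type;
# categories left out (here 'proportion') fall back to the defaults
getDescriptionStatsBy(mtcars$gear,
                      by = mtcars$am,
                      statistics = list(continuous = pval_ttest,
                                        factor = getPvalChiSq))
```

A t-test is only meaningful here because am has exactly two levels; with more groups one of the prepared functions such as getPvalAnova or getPvalKruskal would be the more natural choice.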
See Also
Other descriptive functions: describeFactors(), describeMean(), describeMedian(), describeProp(), getPvalWilcox()
Examples
library(Gmisc)
library(magrittr)
library(dplyr)
library(htmlTable)
data(mtcars)
mtcars %<>%
mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")),
vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")),
drat_prop = drat > median(drat),
drat_prop = factor(drat_prop,
levels = c(FALSE, TRUE),
labels = c("Low ratio", "High ratio")),
carb_prop = carb > 2,
carb_prop = factor(carb_prop,
levels = c(FALSE, TRUE),
labels = c("≤ 2", "> 2")),
across(c(gear, carb, cyl), factor))
# A simple bare-bone example
mtcars %>%
getDescriptionStatsBy(`Miles per gallon` = mpg,
Weight = wt,
`Carburetors ≤ 2` = carb_prop,
by = am) %>%
htmlTable(caption = "Basic continuous stats from the mtcars dataset")
invisible(readline(prompt = "Press [enter] to continue"))
# For labeling & units we use set_column_labels/set_column_unit that use
# the Hmisc package annotation functions
mtcars %<>%
set_column_labels(am = "Transmission",
mpg = "Gas",
wt = "Weight",
gear = "Gears",
disp = "Displacement",
vs = "Engine type",
drat_prop = "Rear axle ratio",
carb_prop = "Carburetors") %>%
set_column_units(mpg = "Miles/(US) gallon",
wt = "10<sup>3</sup> lbs",
disp = "cu.in.")
mtcars %>%
getDescriptionStatsBy(mpg,
wt,
`Gear†` = gear,
drat_prop,
carb_prop,
vs,
by = am,
header_count = TRUE,
use_units = TRUE,
show_all_values = TRUE) %>%
addHtmlTableStyle(pos.caption = "bottom") %>%
htmlTable(caption = "Stats from the mtcars dataset",
tfoot = "† Number of forward gears")
invisible(readline(prompt = "Press [enter] to continue"))
# Using the default_ref parameter we can choose which level is shown for proportions
mtcars %>%
getDescriptionStatsBy(mpg,
wt,
`Gear†` = gear,
drat_prop,
carb_prop,
vs,
by = am,
header_count = TRUE,
use_units = TRUE,
default_ref = c(drat_prop = "Low ratio",
carb_prop = "> 2")) %>%
addHtmlTableStyle(pos.caption = "bottom") %>%
htmlTable(caption = "Stats from the mtcars dataset",
tfoot = "† Number of forward gears")
invisible(readline(prompt = "Press [enter] to continue"))
# We can also use lists
tll <- list()
tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am)
tll <- c(tll,
list(getDescriptionStatsBy(mtcars$disp, mtcars$am)))
mergeDesc(tll,
htmlTable_args = list(caption = "Factored variables")) %>%
htmlTable::addHtmlTableStyle(css.rgroup = "")
invisible(readline(prompt = "Press [enter] to continue"))
tl_no_units <- list()
tl_no_units[["Gas (miles/gallon)"]] <-
getDescriptionStatsBy(mtcars$mpg, mtcars$am,
header_count = TRUE)
tl_no_units[["Weight (10<sup>3</sup> lbs)"]] <-
getDescriptionStatsBy(mtcars$wt, mtcars$am,
header_count = TRUE)
mergeDesc(tl_no_units,
tll) %>%
htmlTable::addHtmlTableStyle(css.rgroup = "")
invisible(readline(prompt = "Press [enter] to continue"))
# Other settings
mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA
getDescriptionStatsBy(mtcars$mpg,
mtcars$am,
statistics = TRUE)
invisible(readline(prompt = "Press [enter] to continue"))
# Do the horizontal version
getDescriptionStatsBy(mtcars$gear,
mtcars$am,
statistics = TRUE,
hrzl_prop = TRUE)
invisible(readline(prompt = "Press [enter] to continue"))
mtcars$wt_with_missing <- mtcars$wt
mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA
getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE,
hrzl_prop = TRUE, total_col_show_perc = FALSE)
invisible(readline(prompt = "Press [enter] to continue"))
## Not run:
## There is also a LaTeX wrapper
tll <- list(
getDescriptionStatsBy(mtcars$gear, mtcars$am),
getDescriptionStatsBy(mtcars$cyl, mtcars$am))
latex(mergeDesc(tll),
caption = "Factored variables",
file = "")
## End(Not run)