| tables {expss} | R Documentation |
Functions for custom tables construction
Description
Table construction consists of at least three functions chained with the
magrittr pipe operator. First, we specify the variables for which
statistics will be computed with tab_cells. Second, we calculate
statistics with one of the tab_stat_* functions. Finally, we complete
table creation with tab_pivot: dataset %>% tab_cells(variable) %>%
tab_stat_cases() %>% tab_pivot(). After that we can optionally sort the table
with tab_sort_asc, drop empty rows/columns with drop_rc, and
transpose it with tab_transpose. The resulting table is just a data.frame, so
we can apply arbitrary operations to it. A statistic is always calculated with
the most recently specified cell, column and row variables, weight, missing
values and subgroup. To define new cell/column/row variables, we can call the
appropriate function one more time.
tab_pivot defines how different statistics are combined and where
statistic labels appear: inside or outside rows/columns. See examples.
For significance testing see significance.
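The three-step chain described above can be sketched as follows. This is a minimal illustration on the built-in mtcars data, assuming the expss package is installed and attached; the object name tbl is only a placeholder chosen for this sketch.

```r
library(expss)

data(mtcars)

# Step 1: choose the cell variable.
# Step 2: compute a statistic (here, counts).
# Step 3: finalize the table.
tbl = mtcars %>%
    tab_cells(cyl) %>%
    tab_stat_cases() %>%
    tab_pivot()

# The result behaves like an ordinary data.frame.
names(tbl)   # the first column holds the row labels
tbl[[1]]     # row labels as a plain vector
tbl
```

Because the finalized table is a data.frame, base subsetting such as tbl[[1]] works without any special accessor.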
Usage
tab_cols(data, ...)
tab_cells(data, ...)
tab_rows(data, ...)
tab_weight(data, weight = NULL)
tab_mis_val(data, ...)
tab_total_label(data, ...)
tab_total_statistic(data, ...)
tab_total_row_position(data, total_row_position = c("below", "above", "none"))
tab_subgroup(data, subgroup = NULL)
tab_row_label(data, ..., label = NULL)
tab_stat_fun(data, ..., label = NULL, unsafe = FALSE)
tab_stat_mean_sd_n(
data,
weighted_valid_n = FALSE,
labels = c("Mean", "Std. dev.", ifelse(weighted_valid_n, "Valid N", "Unw. valid N")),
label = NULL
)
tab_stat_mean(data, label = "Mean")
tab_stat_median(data, label = "Median")
tab_stat_se(data, label = "S. E.")
tab_stat_sum(data, label = "Sum")
tab_stat_min(data, label = "Min.")
tab_stat_max(data, label = "Max.")
tab_stat_sd(data, label = "Std. dev.")
tab_stat_valid_n(data, label = "Valid N")
tab_stat_unweighted_valid_n(data, label = "Unw. valid N")
tab_stat_fun_df(data, ..., label = NULL, unsafe = FALSE)
tab_stat_cases(
data,
total_label = NULL,
total_statistic = "u_cases",
total_row_position = c("below", "above", "none"),
label = NULL
)
tab_stat_cpct(
data,
total_label = NULL,
total_statistic = "u_cases",
total_row_position = c("below", "above", "none"),
label = NULL
)
tab_stat_cpct_responses(
data,
total_label = NULL,
total_statistic = "u_responses",
total_row_position = c("below", "above", "none"),
label = NULL
)
tab_stat_tpct(
data,
total_label = NULL,
total_statistic = "u_cases",
total_row_position = c("below", "above", "none"),
label = NULL
)
tab_stat_rpct(
data,
total_label = NULL,
total_statistic = "u_cases",
total_row_position = c("below", "above", "none"),
label = NULL
)
tab_last_vstack(
data,
stat_position = c("outside_rows", "inside_rows"),
stat_label = c("inside", "outside"),
label = NULL
)
tab_last_hstack(
data,
stat_position = c("outside_columns", "inside_columns"),
stat_label = c("inside", "outside"),
label = NULL
)
tab_pivot(
data,
stat_position = c("outside_rows", "inside_rows", "outside_columns", "inside_columns"),
stat_label = c("inside", "outside")
)
tab_transpose(data)
tab_caption(data, ...)
Arguments
data |
data.frame/intermediate_table |
... |
vector/data.frame/list. Variables for tables. Use mrset/mdset for multiple-response variables. |
weight |
numeric vector in |
total_row_position |
Position of total row in the resulting table. Can be one of "below", "above", "none". |
subgroup |
logical vector in |
label |
character. Label for the statistic in the |
unsafe |
logical. If TRUE then |
weighted_valid_n |
logical. Should we show weighted valid N in
|
labels |
character vector of length 3. Labels for mean, standard
deviation and valid N in |
total_label |
By default "#Total". You can provide several names, one for each total statistic. |
total_statistic |
By default it is "u_cases" (unweighted cases). Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics. |
stat_position |
character one of the values |
stat_label |
character one of the values |
Details
tab_cells: variables on which percentage/cases/summary functions will be computed. Use mrset/mdset for multiple-response variables.
tab_cols: optional variables which break the table by columns. Use mrset/mdset for multiple-response variables.
tab_rows: optional variables which break the table by rows. Use mrset/mdset for multiple-response variables.
tab_weight: optional weight for the statistic.
tab_mis_val: optional missing values for the statistic. They are applied to the variables specified by tab_cells. It works in the same manner as na_if.
tab_subgroup: optional logical vector/expression which specifies the subset of data for the table.
tab_row_label: adds an empty row with the specified row labels to the table. It is useful for making section headings, etc.
tab_total_row_position: default value for the total_row_position argument in tab_stat_cases, etc. Can be one of "below", "above", "none".
tab_total_label: default value for the total_label argument in tab_stat_cases, etc. You can provide several names, one for each total statistic.
tab_total_statistic: default value for the total_statistic argument in tab_stat_cases, etc. You can provide several values. Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.
tab_stat_fun, tab_stat_fun_df: tab_stat_fun applies a function to each variable in cells separately; tab_stat_fun_df passes each data.frame in cells to the function as a whole data.table with all names converted to variable labels (if labels exist). So it is not recommended to rely on the original variable names in your fun. For details see cross_fun. You can provide several functions as arguments. They will be combined as with combine_functions, so you can use the method argument. For details see the documentation for combine_functions.
tab_stat_cases: calculates counts.
tab_stat_cpct, tab_stat_cpct_responses: calculate column percent. These functions give different results only for multiple-response variables. For tab_stat_cpct the base of the percent is the number of valid cases. A case is considered valid if it has at least one non-NA value, so for multiple-response variables the sum of percents may be greater than 100. For tab_stat_cpct_responses the base of the percent is the number of valid responses. Multiple-response variables can have several responses for a single case. The sum of percents for tab_stat_cpct_responses always equals 100%.
tab_stat_rpct: calculates row percent. The base for the percent is the number of valid cases.
tab_stat_tpct: calculates table percent. The base for the percent is the number of valid cases.
tab_stat_mean, tab_stat_median, tab_stat_se, tab_stat_sum, tab_stat_min, tab_stat_max, tab_stat_sd, tab_stat_valid_n, tab_stat_unweighted_valid_n: different summary statistics. NAs are always omitted.
tab_pivot: finalizes table creation and defines how the different tab_stat_* results will be combined.
tab_caption: sets a caption on the table. Should be used after tab_pivot.
tab_transpose: transposes the final table after tab_pivot or the last statistic.
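The difference between the two column-percent bases described for tab_stat_cpct and tab_stat_cpct_responses can be checked by hand on a tiny data set. The sketch below uses hypothetical toy data (the answers data.frame and its q1_1/q1_2 columns are invented for illustration) and assumes expss is attached. It first computes both bases with base R, then builds the corresponding tables.

```r
library(expss)

# Hypothetical toy data: up to two answers per respondent.
# Respondent 4 gave no answer, so that row is not a valid case.
answers = data.frame(
    q1_1 = c(1, 1, 2, NA),
    q1_2 = c(2, NA, 3, NA)
)

# Bases computed by hand
valid_cases = sum(rowSums(!is.na(answers)) > 0)  # respondents with >= 1 answer
valid_responses = sum(!is.na(answers))           # total number of answers
counts = table(unlist(answers))                  # how often each code was chosen

pct_cases = 100 * counts / valid_cases           # base: valid cases
pct_responses = 100 * counts / valid_responses   # base: valid responses

sum(pct_cases)      # exceeds 100, respondents can give several answers
sum(pct_responses)  # adds up to 100

# The same comparison with the table functions
by_cases = answers %>%
    tab_cells(mrset(q1_1, q1_2)) %>%
    tab_stat_cpct() %>%
    tab_pivot()

by_responses = answers %>%
    tab_cells(mrset(q1_1, q1_2)) %>%
    tab_stat_cpct_responses() %>%
    tab_pivot()

by_cases
by_responses
```

For single-response variables each valid case contributes exactly one response, so the two bases coincide and both functions give the same percentages.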
Value
All of these functions return an object of class
intermediate_table, except tab_pivot, which returns the final
result: an object of class etable. Basically it is a data.frame, but the
class is needed for custom methods.
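The two classes mentioned above can be inspected directly. This is a small sketch on mtcars, assuming expss is attached; the names step and final are placeholders chosen for this illustration.

```r
library(expss)

data(mtcars)

# Everything before tab_pivot() is still an intermediate object.
step = mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(am) %>%
    tab_stat_cpct()

inherits(step, "intermediate_table")

# tab_pivot() turns the intermediate object into the final table.
final = tab_pivot(step)

inherits(final, "etable")
is.data.frame(final)
```

Keeping the intermediate object in a variable like this is also how statistics can be accumulated step by step, for example inside a loop, before a single call to tab_pivot.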
See Also
fre, cross_cases, cross_fun, tab_sort_asc, drop_empty_rows, significance.
Examples
## Not run:
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual" = 1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)
# some examples from 'cro'
# simple example - generally with 'cro' it can be made with less typing
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(vs) %>%
    tab_stat_cpct() %>%
    tab_pivot()
# split rows
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(vs) %>%
    tab_rows(am) %>%
    tab_stat_cpct() %>%
    tab_pivot()
# multiple banners
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs, am) %>%
    tab_stat_cpct() %>%
    tab_pivot()
# nested banners
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs %nest% am) %>%
    tab_stat_cpct() %>%
    tab_pivot()
# summary statistics
mtcars %>%
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>%
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot()
# summary statistics - labels in columns
mtcars %>%
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>%
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()
# subgroup with dropping empty columns
mtcars %>%
    tab_subgroup(am == 0) %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs %nest% am) %>%
    tab_stat_cpct() %>%
    tab_pivot() %>%
    drop_empty_columns()
# total position at the top of the table
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs) %>%
    tab_rows(am) %>%
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>%
    tab_pivot()
# this example cannot be made easily with 'cro'
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>%
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows")
# statistic labels inside columns
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>%
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns")
# stacked statistics
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_stat_cpct() %>%
    tab_pivot()
# stacked statistics with section headings
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_row_label("#Summary statistics") %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_row_label("#Column percent") %>%
    tab_stat_cpct() %>%
    tab_pivot()
# stacked statistics with different variables
mtcars %>%
    tab_cols(total(), am) %>%
    tab_cells(mpg, hp, qsec) %>%
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>%
    tab_stat_cpct() %>%
    tab_pivot()
# stacked statistics - label position outside row labels
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_stat_cpct(label = "Col %") %>%
    tab_pivot(stat_label = "outside")
# example from 'cross_fun_df' - linear regression by groups with sorting
mtcars %>%
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>%
    tab_cols(total(), am) %>%
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model),
                  confint(model)
            )
        }
    ) %>%
    tab_pivot() %>%
    tab_sort_desc()
# multiple-response variables and weight
data(product_test)
codeframe_likes = num_lab("
1 Liked everything
2 Disliked everything
3 Chocolate
4 Appearance
5 Taste
6 Stuffing
7 Nuts
8 Consistency
98 Other
99 Hard to answer
")
set.seed(1)
product_test = product_test %>%
    let(
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
        wgt = runif(.N, 0.25, 4),
        wgt = wgt/sum(wgt)*.N
    ) %>%
    apply_labels(
        age_cat = "Age",
        age_cat = c("18 - 25" = 1, "26 - 35" = 2),
        a1_1 = "Likes. VSX123",
        b1_1 = "Likes. SDF456",
        a1_1 = codeframe_likes,
        b1_1 = codeframe_likes
    )
product_test %>%
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>%
    tab_cols(total(), age_cat) %>%
    tab_weight(wgt) %>%
    tab_stat_cpct() %>%
    tab_sort_desc() %>%
    tab_pivot()
# trick to place cell variables labels inside columns
# useful to compare two variables
# '|' is needed to prevent automatic labels creation from argument
# alternatively we can use list(...) to avoid this
product_test %>%
    tab_cols(total(), age_cat) %>%
    tab_weight(wgt) %>%
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>%
    tab_stat_cpct(label = var_lab(a1_1)) %>%
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>%
    tab_stat_cpct(label = var_lab(b1_1)) %>%
    tab_pivot(stat_position = "inside_columns")
# if you need standard evaluation, use 'vars'
tables = mtcars %>%
    tab_cols(total(), am %nest% vs)
for(each in c("mpg", "disp", "hp", "qsec")){
    tables = tables %>% tab_cells(vars(each)) %>%
        tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n)
}
tables %>% tab_pivot()
## End(Not run)