tables {expss}R Documentation

Functions for custom tables construction

Description

Table construction consists of at least of three functions chained with magrittr pipe operator. At first we need to specify variables for which statistics will be computed with tab_cells. Secondary, we calculate statistics with one of tab_stat_* functions. And last, we finalize table creation with tab_pivot: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot(). After that we can optionally sort table with tab_sort_asc, drop empty rows/columns with drop_rc and transpose with tab_transpose. Generally, table is just a data.frame so we can use arbitrary operations on it. Statistic is always calculated with the last cell, column/row variables, weight, missing values and subgroup. To define new cell/column/row variables we can call appropriate function one more time. tab_pivot defines how we combine different statistics and where statistic labels will appear - inside/outside rows/columns. See examples. For significance testing see significance.

Usage

tab_cols(data, ...)

tab_cells(data, ...)

tab_rows(data, ...)

tab_weight(data, weight = NULL)

tab_mis_val(data, ...)

tab_total_label(data, ...)

tab_total_statistic(data, ...)

tab_total_row_position(data, total_row_position = c("below", "above", "none"))

tab_subgroup(data, subgroup = NULL)

tab_row_label(data, ..., label = NULL)

tab_stat_fun(data, ..., label = NULL, unsafe = FALSE)

tab_stat_mean_sd_n(
  data,
  weighted_valid_n = FALSE,
  labels = c("Mean", "Std. dev.", ifelse(weighted_valid_n, "Valid N", "Unw. valid N")),
  label = NULL
)

tab_stat_mean(data, label = "Mean")

tab_stat_median(data, label = "Median")

tab_stat_se(data, label = "S. E.")

tab_stat_sum(data, label = "Sum")

tab_stat_min(data, label = "Min.")

tab_stat_max(data, label = "Max.")

tab_stat_sd(data, label = "Std. dev.")

tab_stat_valid_n(data, label = "Valid N")

tab_stat_unweighted_valid_n(data, label = "Unw. valid N")

tab_stat_fun_df(data, ..., label = NULL, unsafe = FALSE)

tab_stat_cases(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)

tab_stat_cpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)

tab_stat_cpct_responses(
  data,
  total_label = NULL,
  total_statistic = "u_responses",
  total_row_position = c("below", "above", "none"),
  label = NULL
)

tab_stat_tpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)

tab_stat_rpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)

tab_last_vstack(
  data,
  stat_position = c("outside_rows", "inside_rows"),
  stat_label = c("inside", "outside"),
  label = NULL
)

tab_last_hstack(
  data,
  stat_position = c("outside_columns", "inside_columns"),
  stat_label = c("inside", "outside"),
  label = NULL
)

tab_pivot(
  data,
  stat_position = c("outside_rows", "inside_rows", "outside_columns", "inside_columns"),
  stat_label = c("inside", "outside")
)

tab_transpose(data)

tab_caption(data, ...)

Arguments

data

data.frame/intermediate_table

...

vector/data.frame/list. Variables for tables. Use mrset/mdset for multiple-response variables.

weight

numeric vector in tab_weight. Cases with NA's, negative and zero weights are removed before calculations.

total_row_position

Position of total row in the resulting table. Can be one of "below", "above", "none".

subgroup

logical vector in tab_subgroup. You can specify subgroup on which table will be computed.

label

character. Label for the statistic in the tab_stat_*.

unsafe

logical If TRUE than fun will be evaluated as is. It can lead to significant increase in the performance. But there are some limitations. For tab_stat_fun it means that your function fun should return vector of length one. Also there will be no attempts to make labels for statistic. For tab_stat_fun_df your function should return vector of length one or list/data.frame (optionally with 'row_labels' element - statistic labels). If unsafe is TRUE then further arguments (...) for fun will be ignored.

weighted_valid_n

logical. Sould we show weighted valid N in tab_stat_mean_sd_n? By default it is FALSE.

labels

character vector of length 3. Labels for mean, standard deviation and valid N in tab_stat_mean_sd_n.

total_label

By default "#Total". You can provide several names - each name for each total statistics.

total_statistic

By default it is "u_cases" (unweighted cases). Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.

stat_position

character one of the values "outside_rows", "inside_rows", "outside_columns" or "inside_columns". It defines how we will combine statistics in the table.

stat_label

character one of the values "inside" or "outside". Where will be placed labels for the statistics relative to column names/row labels? See examples.

Details

Value

All of these functions return object of class intermediate_table except tab_pivot which returns final result - object of class etable. Basically it's a data.frame but class is needed for custom methods.

See Also

fre, cross_cases, cross_fun, tab_sort_asc, drop_empty_rows, significance.

Examples

## Not run: 
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)
# some examples from 'cro'
# simple example - generally with 'cro' it can be made with less typing
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# split rows
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# multiple banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs, am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# nested banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# summary statistics
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot()

# summary statistics - labels in columns
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()

# subgroup with droping empty columns
mtcars %>% 
    tab_subgroup(am == 0) %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    drop_empty_columns()

# total position at the top of the table
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot()

# this example cannot be made easily with 'cro'             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows")

# statistic labels inside columns             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns")

# stacked statistics
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct() %>% 
    tab_pivot()
    
# stacked statistics with section headings
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_row_label("#Summary statistics") %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_row_label("#Column percent") %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics with different variables
mtcars %>% 
    tab_cols(total(), am) %>% 
    tab_cells(mpg, hp, qsec) %>% 
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics - label position outside row labels
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct(label = "Col %") %>% 
    tab_pivot(stat_label = "outside")
    
# example from 'cross_fun_df' - linear regression by groups with sorting 
mtcars %>% 
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() %>% 
    tab_sort_desc()

# multiple-response variables and weight
data(product_test)
codeframe_likes = num_lab("
                          1 Liked everything
                          2 Disliked everything
                          3 Chocolate
                          4 Appearance
                          5 Taste
                          6 Stuffing
                          7 Nuts
                          8 Consistency
                          98 Other
                          99 Hard to answer
                          ")

set.seed(1)
product_test = product_test %>% 
    let(
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
        wgt = runif(.N, 0.25, 4),
        wgt = wgt/sum(wgt)*.N
    ) %>% 
    apply_labels(
        age_cat = "Age",
        age_cat = c("18 - 25" = 1, "26 - 35" = 2),
        
        a1_1 = "Likes. VSX123",
        b1_1 = "Likes. SDF456",
        a1_1 = codeframe_likes,
        b1_1 = codeframe_likes
    )

product_test %>% 
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_stat_cpct() %>% 
    tab_sort_desc() %>% 
    tab_pivot()
    
# trick to place cell variables labels inside columns
# useful to compare two variables
# '|' is needed to prevent automatic labels creation from argument
# alternatively we can use list(...) to avoid this
product_test %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>% 
    tab_stat_cpct(label = var_lab(a1_1)) %>% 
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>% 
    tab_stat_cpct(label = var_lab(b1_1)) %>% 
    tab_pivot(stat_position = "inside_columns")

# if you need standard evaluation, use 'vars'
tables = mtcars %>%
      tab_cols(total(), am %nest% vs)

for(each in c("mpg", "disp", "hp", "qsec")){
    tables = tables %>% tab_cells(vars(each)) %>%
        tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) 
}
tables %>% tab_pivot()

## End(Not run) 

[Package expss version 0.11.6 Index]