R: Cross-tabulation with custom summary function.

cross_fun {expss}

R Documentation

Cross-tabulation with custom summary function.

Description

cross_mean, cross_sum, cross_median calculate mean/sum/median by groups. NA's are always omitted.
cross_mean_sd_n calculates mean, standard deviation and N simultaneously. Mainly intended for usage with significance_means.
cross_pearson, cross_spearman calculate the correlation of the first variable in each data.frame in cell_vars with the other variables. NA's are removed pairwise.
cross_fun, cross_fun_df return a table with custom summary statistics defined by the fun argument. Treatment of NA's depends on the behavior of your fun. To use weights, fun must have a formal weight argument and some logic for processing it inside. Several functions with weight support are provided - see w_mean.
cross_fun applies fun to each variable in cell_vars separately, while cross_fun_df passes each data.frame in cell_vars to fun as a whole. So cro_fun(iris[, -5], iris$Species, fun = mean) gives the same result as cro_fun_df(iris[, -5], iris$Species, fun = colMeans). For cross_fun_df, names of cell_vars are converted to labels, if they are available, before fun is applied.
It is generally recommended that fun always return an object of the same form. Row names/vector names of the fun result appear in the row labels of the table, and column names/names of a list appear in the column labels. If your fun returns a data.frame/matrix/list with an element named 'row_labels', then this element is used as row labels and takes precedence over rownames.
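The difference between handing fun one variable at a time and handing it a whole data.frame can be sketched in base R, without expss. This only illustrates the idea for a single group of rows; it is not how the package builds its tables.

```r
# Base-R sketch for one group of rows; expss itself is not used here.
setosa <- iris[iris$Species == "setosa", -5]

# cross_fun-style: the function sees one variable (a vector) at a time
per_variable <- sapply(setosa, mean)

# cross_fun_df-style: the function sees the whole data.frame at once
whole_frame <- colMeans(setosa)

# Both approaches yield the same named vector of means
all.equal(per_variable, whole_frame)
```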
cross_* functions evaluate their arguments in the context of the first argument, data.
cro_* functions use standard evaluation, e.g. 'cro(mtcars$am, mtcars$vs)'.
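A short sketch of the two calling styles, assuming expss is installed in a version that provides the cross_* functions documented on this page. The object names t_cross and t_cro are illustrative only.

```r
# Assumes expss with the cross_* family is installed; skipped otherwise.
if (requireNamespace("expss", quietly = TRUE)) {
  library(expss)
  data(mtcars)

  # cross_*: bare column names are looked up in the 'data' argument
  t_cross <- cross_mean(mtcars, mpg, col_vars = am)

  # cro_*: every argument is an ordinary R expression
  t_cro <- cro_mean(mtcars$mpg, col_vars = mtcars$am)
}
```

Both calls summarize the same data. Labels in the two tables may differ when the variables carry no labels, so the results should not be assumed to be identical.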
combine_functions is an auxiliary function for combining several functions into one function for use with cro_fun/cro_fun_df. Names of arguments are used as statistic labels. By default, results of each function are combined with c, but you can supply your own combining function via the method argument. It is applied as in the expression do.call(method, list_of_functions_results). A particularly useful method is list: when it is used, statistic labels appear in the column labels. See examples. The data.frame, rbind and cbind methods may also be of interest.
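The do.call(method, list_of_functions_results) step can be sketched in base R. The function combine_sketch below is a hypothetical stand-in written for illustration; it is not the expss implementation and ignores details such as weights.

```r
# Hypothetical stand-in for illustration only; not the expss implementation.
combine_sketch <- function(..., method = c) {
  funs <- list(...)
  function(x) {
    # Named list holding the result of each function applied to x
    results <- lapply(funs, function(f) f(x))
    do.call(method, results)
  }
}

stats_vec  <- combine_sketch(Mean = mean, "Std. dev." = sd)
stats_list <- combine_sketch(Mean = mean, "Std. dev." = sd, method = list)

x <- c(1, 2, 3, 4)
res_vec  <- stats_vec(x)   # named numeric vector (default method = c)
res_list <- stats_list(x)  # named list (method = list)
```

With the default method the argument names end up as vector names, which the table uses as row labels. With method = list they become list names, which the table uses as column labels.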

Usage

cross_fun(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  fun,
  ...,
  unsafe = FALSE
)

cross_fun_df(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  fun,
  ...,
  unsafe = FALSE
)

cross_mean(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cross_mean_sd_n(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  weighted_valid_n = FALSE,
  labels = NULL
)

cross_sum(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cross_median(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cross_pearson(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cross_spearman(
  data,
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cro_fun(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  fun,
  ...,
  unsafe = FALSE
)

cro_fun_df(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  fun,
  ...,
  unsafe = FALSE
)

cro_mean(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cro_mean_sd_n(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL,
  weighted_valid_n = FALSE,
  labels = NULL
)

cro_sum(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cro_median(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cro_pearson(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

cro_spearman(
  cell_vars,
  col_vars = total(),
  row_vars = total(label = ""),
  weight = NULL,
  subgroup = NULL
)

combine_functions(..., method = c)

Arguments

`data`	data.frame in whose context all other arguments will be evaluated (for `cross_*`).
`cell_vars`	vector/data.frame/list. Variables on which summary function will be computed.
`col_vars`	vector/data.frame/list. Variables that break the table by columns. Use mrset/mdset for multiple-response variables.
`row_vars`	vector/data.frame/list. Variables that break the table by rows. Use mrset/mdset for multiple-response variables.
`weight`	numeric vector. Optional case weights. Cases with NA, negative or zero weights are removed before calculations.
`subgroup`	logical vector. You can specify a subgroup on which the table will be computed.
`fun`	custom summary function. It is generally recommended that `fun` always return an object of the same form. Rownames/vector names of the `fun` result appear in the row labels of the table, and column names/names of a list appear in the column labels. To use weights, `fun` must have a formal `weight` argument and some logic for processing it inside. For `cro_fun_df`, `fun` will receive a data.table with all names converted to variable labels (if labels exist), so it is not recommended to rely on the original variable names in your `fun`.
`...`	further arguments for `fun` in `cross_fun`/`cross_fun_df` or functions for `combine_functions`. Ignored in `cross_fun`/`cross_fun_df` if `unsafe` is TRUE.
`unsafe`	logical/character. If not FALSE, then `fun` will be evaluated as is. This can significantly improve performance, but there are some limitations. For `cross_fun`, your function `fun` should return a vector. If the length of this vector is greater than one, then you should supply, via the `unsafe` argument, a vector of unique labels for each element of this vector. No attempt is made to automatically create labels for the results of `fun`. For `cross_fun_df`, your function should return a vector or list/data.frame (optionally with a 'row_labels' element containing statistic labels). If `unsafe` is TRUE or not logical, then further arguments (`...`) for `fun` are ignored.
`weighted_valid_n`	logical. Should we show weighted valid N in `cro_mean_sd_n`? By default it is FALSE.
`labels`	character vector of length 3. Labels for mean, standard deviation and valid N in `cro_mean_sd_n`.
`method`	function which will combine results of multiple functions in `combine_functions`. It will be applied as in the expression `do.call(method, list_of_functions_results)`. By default it is `c`.
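The requirement that `fun` have a formal `weight` argument can be sketched as follows. The function wmean_sketch and the case weights case_w are made up for illustration; the package ships its own weighted functions (see w_mean). The expss call is skipped if the package is not installed.

```r
# Hypothetical summary function with a formal 'weight' argument.
wmean_sketch <- function(x, weight = NULL) {
  if (is.null(weight)) {
    return(mean(x, na.rm = TRUE))
  }
  keep <- !is.na(x)
  sum(x[keep] * weight[keep]) / sum(weight[keep])
}

# Direct call: (1 * 1 + 3 * 3) / (1 + 3)
direct <- wmean_sketch(c(1, 3), weight = c(1, 3))

if (requireNamespace("expss", quietly = TRUE)) {
  library(expss)
  data(mtcars)
  # Made-up case weights, one per row of mtcars
  case_w <- rep(c(1, 2), length.out = nrow(mtcars))
  tbl_w <- cro_fun(mtcars$mpg, col_vars = mtcars$am,
                   weight = case_w, fun = wmean_sketch)
}
```

Giving `weight` a default of NULL lets the same function serve both weighted and unweighted tables.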

Value

Object of class 'etable'. Basically it is a data.frame, but the class is needed for custom methods.
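A quick way to inspect the returned object, assuming expss is installed. The name tbl is illustrative.

```r
# Assumes expss is installed; skipped otherwise.
if (requireNamespace("expss", quietly = TRUE)) {
  library(expss)
  data(mtcars)
  tbl <- cro_mean(mtcars$mpg, col_vars = mtcars$am)

  class(tbl)          # class vector of the result
  is.data.frame(tbl)  # the table can be treated as a data.frame
}
```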

Examples

library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)


# Simple example - there is special shortcut for it - 'cross_mean'
cross_fun(mtcars, 
          list(mpg, disp, hp, wt, qsec), 
          col_vars = list(total(), am), 
          row_vars = vs, 
          fun = mean)



# The same example with 'subgroup'
cross_fun(mtcars, 
          list(mpg, disp, hp, wt, qsec), 
          col_vars = list(total(), am), 
          row_vars = vs, 
          subgroup = vs == 0, 
          fun = mean)

# 'combine_functions' usage  
cross_fun(mtcars, 
          list(mpg, disp, hp, wt, qsec), 
          col_vars = list(total(), am), 
          row_vars = vs, 
          fun = combine_functions(Mean = mean, 
                                  'Std. dev.' = sd,
                                  'Valid N' = valid_n)
)

# 'combine_functions' usage - statistic labels in columns
cross_fun(mtcars, 
          list(mpg, disp, hp, wt, qsec), 
          col_vars = list(total(), am), 
          row_vars = vs, 
          fun = combine_functions(Mean = mean, 
                                  'Std. dev.' = sd,
                                  'Valid N' = valid_n,
                                  method = list
                                  )
)

# 'summary' function
cross_fun(mtcars, 
          list(mpg, disp, hp, wt, qsec), 
          col_vars = list(total(), am), 
          row_vars = list(total(), vs), 
          fun = summary
) 
                          
# comparison 'cross_fun' and 'cross_fun_df'
cross_fun(mtcars,
          data.frame(mpg, disp, hp, wt, qsec), 
          col_vars = am,
          fun = mean
)


# same result
cross_fun_df(mtcars,
             data.frame(mpg, disp, hp, wt, qsec), 
             col_vars = am, 
             fun = colMeans
             )

# usage for 'cross_fun_df' which is not possible for 'cross_fun'
# linear regression by groups
cross_fun_df(mtcars,
             data.frame(mpg, disp, hp, wt, qsec), 
             col_vars = am,
             fun = function(x){
                 frm = reformulate(".", response = as.name(names(x)[1]))
                 model = lm(frm, data = x)
                 cbind('Coef.' = coef(model), 
                       confint(model)
                 )
             } 
)

[Package expss version 0.11.6 Index]
