R: Apply a function to groups of columns

calc_group_stat {metacoder}

R Documentation

Apply a function to groups of columns

Description

For a given table in a taxmap object, apply a function to each row within groups of columns. The result of the function is used to create new columns. This is equivalent to splitting the columns of a table by a factor and using apply on each group.
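
The equivalence mentioned above can be sketched in base R without metacoder. This is only an illustration of the split-then-apply idea, not the package's implementation; the toy table `counts`, the labels in `groups`, and the choice of `mean` are all made up for the example.

```r
# Toy table: 3 rows (e.g. taxa) by 4 sample columns. Values are invented.
counts <- data.frame(
  s1 = c(1, 0, 5),
  s2 = c(3, 2, 7),
  s3 = c(10, 4, 1),
  s4 = c(20, 6, 3)
)

# One hypothetical group label per column, in column order.
groups <- c("a", "a", "b", "b")

# Split the column names by group, keeping groups in order of first appearance.
cols_by_group <- split(colnames(counts), factor(groups, levels = unique(groups)))

# For each group, apply the function across every row of that group's columns.
func <- mean
result <- as.data.frame(lapply(cols_by_group, function(cols) {
  apply(counts[, cols, drop = FALSE], MARGIN = 1, FUN = func)
}))

print(result)
#    a  b
# 1  2 15
# 2  1  5
# 3  6  2
```

Each output column holds one value per input row, which matches the "one column per unique value in `groups`" behavior described for the `groups` argument.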

Usage

calc_group_stat(
  obj,
  data,
  func,
  groups = NULL,
  cols = NULL,
  other_cols = FALSE,
  out_names = NULL,
  dataset = NULL
)

Arguments

`obj`	A `taxmap` object
`data`	The name of a table in `obj$data`.
`func`	The function to apply. It should take a vector and return a single value. For example, `max` or `mean` could be used.
`groups`	Group multiple columns per treatment/group. This should be a vector of group IDs (e.g. character, integer) the same length as `cols` that defines which samples go in which group. When used, there will be one column in the output for each unique value in `groups`.
`cols`	The columns in `data` to use. By default, all numeric columns are used. Takes one of the following inputs: TRUE/FALSE: All/No columns will be used. Character vector: The names of columns to use. Numeric vector: The indexes of columns to use. Vector of TRUE/FALSE of length equal to the number of columns: Use the columns corresponding to `TRUE` values.
`other_cols`	Preserve non-target columns from the input data in the output. New columns will always be at the end. The "taxon_id" column will be preserved at the front. Takes one of the following inputs: NULL: No columns will be added back, not even the taxon id column. TRUE/FALSE: All/None of the non-target columns will be preserved. Character vector: The names of columns to preserve. Numeric vector: The indexes of columns to preserve. Vector of TRUE/FALSE of length equal to the number of columns: Preserve the columns corresponding to `TRUE` values.
`out_names`	The names of count columns in the output. Must be the same length and order as `cols` (or `unique(groups)`, if `groups` is used).
`dataset`	DEPRECATED. Use `data` instead.
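
The selector forms accepted by `cols` and `other_cols` (single TRUE/FALSE, names, indexes, or a logical vector) can be pictured with a small base R helper. This is a sketch for illustration only: `resolve_cols` is a hypothetical function, not part of metacoder, and the package may resolve selectors differently.

```r
# Hypothetical helper (not a metacoder function): turn a `cols`-style
# selector into a character vector of column names.
resolve_cols <- function(tbl, cols = NULL) {
  all_names <- colnames(tbl)
  if (is.null(cols)) {
    # Assumed default, following the `cols` description: all numeric columns.
    return(all_names[vapply(tbl, is.numeric, logical(1))])
  }
  if (is.character(cols)) return(cols)           # names of columns
  if (is.numeric(cols)) return(all_names[cols])  # indexes of columns
  if (is.logical(cols)) {
    # A single TRUE/FALSE means all/no columns.
    if (length(cols) == 1) cols <- rep(cols, length(all_names))
    return(all_names[cols])
  }
  stop("Unsupported selector type")
}

# Toy table with two ID columns and two numeric sample columns.
tbl <- data.frame(
  taxon_id = c("a", "b"),
  otu_id = c("o1", "o2"),
  s1 = c(1, 2),
  s2 = c(3, 4),
  stringsAsFactors = FALSE
)

resolve_cols(tbl)                             # "s1" "s2"
resolve_cols(tbl, 3:4)                        # "s1" "s2"
resolve_cols(tbl, c(FALSE, FALSE, TRUE, TRUE)) # "s1" "s2"
resolve_cols(tbl, "otu_id")                   # "otu_id"
```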

Value

A tibble

Examples

## Not run: 
# Parse data for examples
x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

# Apply a function to every value without grouping 
calc_group_stat(x, "tax_data", function(v) v > 3)

# Calculate the means for each group
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex)

# Calculate the variation for each group
calc_group_stat(x, "tax_data", sd, groups = hmp_samples$body_site)

# Different ways to use only some columns
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = c("700035949", "700097855", "700100489"))
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = 4:6)
calc_group_stat(x, "tax_data", function(v) v > 3,
                cols = startsWith(colnames(x$data$tax_data), "70001"))

# Including all other columns in output
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = TRUE)

# Including specific columns in output
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = 2)
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                other_cols = "otu_id")

# Rename output columns
calc_group_stat(x, "tax_data", mean, groups = hmp_samples$sex,
                out_names = c("Women", "Men"))


## End(Not run)