dt.calculate {DTwrappers}R Documentation

dt.calculate

Description

This function allows a user to apply one or more functions to all of the specified variables in a data.frame or data.table object. It is built as a wrapper function of data.table's method of applying functions to variables while allowing for filtering and grouping steps. This allows a user to easily calculate many results, e.g. the.functions = c("mean", "median", "sd") on multiple columns, e.g. the.variables = c("Age", "Income") while also filtering and grouping the data. Options also exist to return a data.table coding statement (result = "code") for educational purposes or both the result and the code together (result = "all"). For examples, please see the vignette.

Usage

dt.calculate(
  dt.name,
  the.functions,
  the.variables = ".",
  the.filter = NULL,
  grouping.variables = NULL,
  sortby.group = TRUE,
  other.params = "",
  table.format = "long",
  add.function.name = TRUE,
  individual.variables = TRUE,
  output.as.table = TRUE,
  return.as = "result",
  envir = .GlobalEnv,
  ...
)

Arguments

dt.name

a character value specifying the name of a data.frame or data.table object to select data from. A variable called dat should be referred to with dt.name = "dat" when using the function.

the.functions

A character vector specifying the name of the functions to apply to the.variables. Each function included in the.functions will be separately applied to each variable in the.variables.

the.variables

A character or numeric vector specifying the variables to perform the calculations on. For character vectors, only values that exist in the names of the data will be used. For numeric vectors, only the values of unique(floor(sorting.variables)) that are in 1:ncol() of your data will be used. Then these indices will be used to select column names from the data. Other values in sorting.variables that do not correspond to a defined column will be excluded from the calculation. When the.variables includes ".", then all values in names(dat) will be selected. Values of the.variables that also exist in grouping.variables will be excluded from the.variables (but grouped by these values).

the.filter

a character value, numeric vector, logical vector, or expression stating the logical operations used to filter the data.  The filtering step will be applied prior to generating the counts.  Defaults to NULL unless otherwise specified. Logical vectors will be converted to a numeric filter, e.g. c(TRUE, TRUE, FALSE) will become 1:2 to signify which rows should be selected.

grouping.variables

A character or numeric vector specifying the variables to perform the calculations on. For character vectors, the values may be either column names of the data or calculations based upon them (see the vignette for examples). For numeric vectors, only the values of unique(floor(grouping.variables)) that are in 1:ncol() of your data will be used. Then these indices will be mapped to the corresponding column names from the data. When NULL, no grouping will be performed.

sortby.group

A logical value specifying whether the grouping should be sorted (TRUE, the default value) or as is (FALSE).

other.params

A character value specifying any additional parameters needed to call the.functions. For instance, if the.functions = "mean", and you would like to remove missing values, then specifying other.params = "na.rm = TRUE" as a character would suffice. Multiple parameters can be specified with comma separation, e.g. other.params = "trim = 1, na.rm = TRUE". Note that all of the parameters supplied must apply to all of the.functions

table.format

specify the format of the table depending on the desired output i.e. "long" or "wide"

add.function.name

A logical value specifying whether the name of the function applied should be appended to the column names in the resulting table. Only applies if the.functions is of length 1.

individual.variables

a logical variable specifying if variables are grouped or individual

output.as.table

a logical variable to specify if output should be a table or not

return.as

a character value specifying what output should be returned. return.as = "result" provides the table of counts. return.as = "code" provides a data.table coding statement that can generate the table of counts. return.as = "all" provides a list containing both the resulting table and the code.

envir

the environment in which the code would be evaluated; .GlobalEnv by default.

...

additional arguments to be passed

Value

Depending on the value of return.as, the output will be a) a character value (return.as = 'code'), b) a coding output, typically a data.table (return.as = 'result'), or c) a list containing both the code and output (return.as = 'all')

Source

DTwrappers::create.dt.statement

DTwrappers::eval.dt.statement

DTwrappers::add.backtick

Examples

n <- nrow(iris)
dat <- data.table::as.data.table(x = iris[sample(x = 1:n, size = n, replace = FALSE),])
dt.calculate(dt.name = "dat", the.variables = c("Sepal.Length"),
the.functions = c("mean", "sd"), return.as = "all")


[Package DTwrappers version 0.0.2 Index]