dt.choose.cols {DTwrappers}R Documentation

dt.choose.cols

Description

This function selects columns from a data.frame or data.table. It is built as a wrapper function of data.table's selection step (using .SD in the j step while specifying the .SDcols argument). Selections may also be supplied to limit the rows to consider, with options for the first or last k rows or a subset based upon a vector like c(3:5, 9:10). Filtering of the rows (e.g. Age < 50) may also be applied using the.filter. Grouped operations may be used to make these selections of columns and rows in each category. Options also exist to return a data.table coding statement (result = "code") for educational purposes or both the result and the code together (result = "all"). For examples, please see the vignette.

Usage

dt.choose.cols(
  dt.name,
  the.variables = ".",
  the.filter = NULL,
  grouping.variables = NULL,
  sortby.group = TRUE,
  first.k = NULL,
  last.k = NULL,
  row.indices = NULL,
  return.as = "result",
  envir = .GlobalEnv
)

Arguments

dt.name

a character value specifying the name of a data.frame or data.table object to select data from. A variable called dat should be referred to with dt.name = "dat" when using the function.

the.variables

A character or numeric vector specifying the variables that we want to select. For character vectors, only values that exist in the names of the data will be used. For numeric vectors, only the values of unique(floor(sorting.variables)) that are in 1:ncol() of your data will be used. Then these indices will be used to select column names from the data. Only values that exist in the names of the data will be used; other values in the.variables will be excluded from the calculation. When the.variables includes ".", then all of the variables will be selected. Values of the.variables that also exist in grouping.variables will be excluded from the.variables (but grouped by these values).

the.filter

a character value, numeric vector, logical vector, or expression stating the logical operations used to filter the data. The filtering step will be applied prior to generating the counts. Defaults to NULL unless otherwise specified. Character values such as 'Age < 50' or 'c(1:3, 7:10)' may be used. Numeric vectors such as c(1:3, 7:10) that specify the row indices may be used. Logical vectors will be converted to a numeric filter, e.g. c(TRUE, TRUE, FALSE) will become 1:2 to signify which rows should be selected. Expressions may be used to specify a logical operation such as expression(Age < 50) as well. Defaults to NULL to indicate that no filtering of the data should be applied.

grouping.variables

A character or numeric vector specifying the variables to perform the calculations on. For character vectors, the values may be either column names of the data or calculations based upon them (see the vignette for examples). For numeric vectors, only the values of unique(floor(grouping.variables)) that are in 1:ncol() of your data will be used. Then these indices will be mapped to the corresponding column names from the data. When NULL, no grouping will be performed.

sortby.group

A character value specifying whether the grouping should be sorted (keyby) or as is (by). Defaults to keyby unless "by" is specified.

first.k

An integer indicating how many rows to select starting from the first row. Note that grouping statements will select up to this number of rows in each group. Additionally, if first.k is larger than the number of records in a group, then the maximum number of records will be selected. When non-integer or non-positive values of first.k are selected, the algorithm will select first.k = max(c(1, round(first.k))). If first.k is not a numeric or integer value, then by default first.k is set to select all of the rows. Specifying row.indices takes precedence to specifying the parameter first.k; if row.indices is not NULL, then row.indices will be used, and first.k will not. Meanwhile, first.k takes precedence to last.k when both are specified. See below.

last.k

An integer indicating how many rows to select starting from the last row. Note that grouping statements will select up to this number of rows in each group. Additionally, if last.k is larger than the number of records in a group, then the maximum number of records will be selected. When non-integer or non-positive values of last.k are selected, the algorithm will select last.k = max(c(1, round(last.k))). If last.k is not a numeric or integer value, then by default last.k is set to select all of the rows. Specifying row.indices takes precedence to specifying the parameter last.k (see below); if row.indices is not NULL, then it will be used, and last.k will not. Meanwhile, first.k takes precedence to last.k when both are specified.

row.indices

An integer vector specifying the row indices to return. When grouping.variables is specified, these indices will be applied to each group. Note that specifications outside of the range from 1 to the number of rows will be limited to existing rows from the data and group. Specifying row.indices takes precedence to specifying the parameters first.k and last.k. If row.indices is not NULL, it will be used.

return.as

a character value specifying what output should be returned. return.as = "result" provides the table of counts. return.as = "code" provides a data.table coding statement that can generate the table of counts. return.as = "all" provides a list containing both the resulting table and the code.

envir

the environment in which the code would be evaluated; .GlobalEnv by default.

Value

Depending on the value of return.as, the output will be a) a character value (return.as = 'code'), b) a coding output, typically a data.table (return.as = 'result'), or c) a list containing both the code and output (return.as = 'all')

Source

DTwrappers::create.dt.statement

DTwrappers::eval.dt.statement


[Package DTwrappers version 0.0.2 Index]