R: ratioDT

ratioDT {ratios}

R Documentation

ratioDT

Description

The function calculates ratios of corresponding variables and corresponding rows between two data sets, DT1 and DT2. The result is a data set with the same dimensions as DT1. The variables can be specified by vars; if vars is not specified, the subfunction select.VarsElements matches column names with element abbreviations. Which row of DT1 corresponds to which row of DT2 has to be specified by the variable(s) group1.vars (and, optionally, group2.vars). If DT2 has a different number of rows than DT1, a 'new DT2' with the same dimensions as DT1 is prepared by the function preparationDT2. At the moment there are six different options for calculating the ratios:

"simple"
"log"
"ar"
"alr"
"cr"
"clr"

For more details please refer to preparationDT2 and section Details.

Usage

ratioDT(DT1, DT2, vars = NULL, group1.vars, group2.vars = NULL,
  ratio_type = "simple", vars.ref, id.vars, Errors = FALSE,
  Error_method = "gauss", var_subgroup = NULL, use_only_DT2 = FALSE,
  DT2_replace = NULL, STD_DT1, STD_DT2, minNr_DT1 = 50, minNr_DT2 = 50,
  return_all = FALSE, return_as_list = FALSE)

Arguments

`DT1`	data.frame or data.table, samples in rows and variables in columns
`DT2`	data.frame or data.table, samples in rows and variables in columns.
`vars`	optional, character vector of column names present in both DT1 and DT2. If not given, the columns are selected by the function `select.VarsElements`. Please make sure the columns given in `vars` are of class numeric.
`group1.vars`	character vector, column name(s) for subsetting DT1 and DT2
`group2.vars`	optional, column name for subsetting DT1 and DT2 if some entries in `group1.vars` are empty.
`ratio_type`	character vector of "simple", "log", "ar", "alr", "cr" and "clr". Please refer to details for explanations.
`vars.ref`	reference variable, one out of `vars`. Only for `ratio_type` "ar" or "alr".
`id.vars`	column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs). If missing, all columns but `vars` will be assigned to it. Please note: Function is faster and more stable if `id.vars` is provided.
`Errors`	logical, should absolute errors be calculated and appended to the list output? Default is FALSE. If `Errors` is set to TRUE it overrides the option `return_as_list` and a list is always returned.
`Error_method`	method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation.
`var_subgroup`	optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if `Errors` is set to FALSE. If provided, DT1 is split into subsets by `group1.vars` and `var_subgroup`, and the error is calculated for each of these subsets. Please see Details for further information.
`use_only_DT2`	logical, default is FALSE. If there are not enough DT2 data for a location, should the DT2 data of the region be used? If `use_only_DT2` is set to FALSE, the Upper Crust is used for the correction (see `DT2_replace`).
`DT2_replace`	mandatory if `use_only_DT2` is set to FALSE; serves as a substitute for DT2 where DT2 has no rows corresponding to DT1. A named vector or a one-row data.table/data.frame with all `vars` present. A column for `group1.vars` is not necessary.
`STD_DT1`	optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
`STD_DT2`	optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
`minNr_DT1`	minimum number of samples/observations in DT1 for calculating a relative error of observations. If the number of observations of DT1 is smaller than `minNr_DT1`, the error is calculated via the data set `STD_DT1`. Default is 50.
`minNr_DT2`	minimum number of samples/observations in DT2 for calculating a relative error of observations. If the number of observations of DT2 is smaller than `minNr_DT2`, the error is calculated via the data set `STD_DT2`. Default is 50.
`return_all`	logical, should all used data sets be returned as a list? Default is FALSE. If set to TRUE the list contains DT1, DT2, vars, ratios and, if `Errors` is set to TRUE, additionally ratios_error, DT1_error and DT2_error.
`return_as_list`	logical, should the result get returned as list? Default is FALSE. If set to FALSE and `Errors` is set to TRUE a column `type_of_data` is appended. This option is ignored if option 'return_all' is set to TRUE.

Details

To calculate the ratios, the function internally calls preparationDT2 to create a data set 'new DT2' from the variables vars of DT2, which has the same number of rows as DT1. The ratios between the now corresponding data sets are then calculated by the method given in 'ratio_type'.

The method "simple" is a simple division between DT1 and DT2:

\frac{DT1[vars]}{DT2[vars]}

The method "log" is the logarithm of the simple ratio:

ln \left( \frac{DT1[vars]}{DT2[vars]} \right)
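
The two formulas above can be sketched in base R. This is only an illustration, not the package's implementation: the data frames `dt1` and `dt2` and their values are hypothetical, and their rows are assumed to correspond one-to-one already (the step that preparationDT2 takes care of).

```r
# Hypothetical example data; rows of dt1 and dt2 are assumed to correspond already
dt1 <- data.frame(Cu = c(2, 8), Zn = c(10, 40))
dt2 <- data.frame(Cu = c(4, 4), Zn = c(20, 20))
vars <- c("Cu", "Zn")

# "simple": element-wise division of corresponding cells
ratio_simple <- dt1[vars] / dt2[vars]

# "log": natural logarithm of the simple ratio
ratio_log <- log(ratio_simple)
```

The result keeps the dimensions of `dt1[vars]`, which mirrors the statement that the output has the same dimensions as DT1.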

The methods "ar" and "alr" normalize all ratios to one reference column (vars.ref, written as vars_n below). "ar" is calculated by:

\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}_{i=1,\dots, n, \dots, D}

and "alr" is calculated by:

ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}\right)_{i=1,\dots, n, \dots, D}
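
A minimal base R sketch of "ar" and "alr", assuming hypothetical data frames `dt1` and `dt2` with corresponding rows and a hypothetical variable `vars_ref` standing in for vars.ref. It follows the formulas above and is not taken from the package source.

```r
# Hypothetical example data with three variables; "Fe" is the reference column
dt1 <- data.frame(Cu = c(2, 8), Zn = c(10, 40), Fe = c(100, 100))
dt2 <- data.frame(Cu = c(4, 4), Zn = c(20, 20), Fe = c(50, 200))
vars <- c("Cu", "Zn", "Fe")
vars_ref <- "Fe"  # plays the role of vars.ref

# Simple ratios of all variables
simple <- dt1[vars] / dt2[vars]

# "ar": divide each row of simple ratios by that row's reference ratio,
# which equals DT1[i]/DT2[i] * DT2[n]/DT1[n]
ratio_ar <- simple / simple[[vars_ref]]

# "alr": natural logarithm of "ar"
ratio_alr <- log(ratio_ar)
```

In this sketch the reference column itself becomes 1 for "ar" and 0 for "alr"; whether ratioDT keeps or drops that column is not stated here.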

The methods "cr" and "clr" normalize all ratios to the geometric mean of all columns included in vars. "cr" is calculated by:

\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}_{i=1,\dots, D}

where g(x) is the geometric mean over the D columns in vars of the respective data set (DT1 or DT2):

g(x) = \sqrt[D]{DT[vars_1] \cdot DT[vars_2] \cdots DT[vars_D]}

and "clr" is calculated by:

ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}\right)_{i=1,\dots, D}
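
The "cr" and "clr" formulas can be sketched in base R as follows. The helper `geo_mean` and the example data are hypothetical and only illustrate the formulas; all values are assumed to be strictly positive and the rows of `dt1` and `dt2` are assumed to correspond.

```r
# Hypothetical example data; all values must be strictly positive
dt1 <- data.frame(Cu = c(2, 1), Zn = c(8, 9))
dt2 <- data.frame(Cu = c(1, 4), Zn = c(4, 4))
vars <- c("Cu", "Zn")

# Row-wise geometric mean g(x) over the D columns in vars
geo_mean <- function(df) exp(rowMeans(log(as.matrix(df))))
g1 <- geo_mean(dt1[vars])
g2 <- geo_mean(dt2[vars])

# "cr": simple ratio times g(DT2) / g(DT1), applied row by row
ratio_cr <- (dt1[vars] / dt2[vars]) * (g2 / g1)

# "clr": natural logarithm of "cr"
ratio_clr <- log(ratio_cr)
```

A quick sanity check for this sketch is that each row of `ratio_clr` sums to zero, as expected for centered log-ratios.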

The methods "clr" and "alr" should be considered if the data are so-called compositional data as defined by Aitchison, J. (1986): "The statistical analysis of compositional data". The names correspond to the names used in the package compositions by K. Gerald van den Boogaart, Raimon Tolosana and Matevz Bren.

Calculating the absolute error for the ratios requires calculating the absolute errors of DT1 and DT2, too. For calculating the errors of DT1 and DT2 the function relError_dataset is used. Accordingly the options for STD_DT1 and STD_DT2 are passed to the option STD in relError_dataset. If STD_DT1 and/or STD_DT2 are left empty the default of 5.2% relative error is used. Also the options minNr_DT1 and minNr_DT2 are passed to the option minNr in relError_dataset.

The Error_method determines how the absolute error of the ratios is calculated. The error method "gauss" refers to the error propagation after Gauss:

\Delta x = \frac{\Delta DT1}{DT2} - DT1 * \frac{\Delta DT2}{DT2^2}

The error method "biggest" refers to the maximum error after Gauss:

\Delta x = \frac{\Delta DT1}{DT2} + DT1 * \frac{\Delta DT2}{DT2^2}
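
The following base R sketch transcribes the two error formulas exactly as they are written above, for the "simple" ratio. The values and the names `d_dt1` and `d_dt2` (absolute errors of DT1 and DT2) are hypothetical; whether ratioDT evaluates the formulas in precisely this way is not confirmed here.

```r
# Hypothetical values and absolute errors for two corresponding observations
dt1_val <- c(10, 20)
dt2_val <- c(5, 4)
d_dt1 <- c(1, 2)       # absolute error of DT1
d_dt2 <- c(0.25, 0.2)  # absolute error of DT2

# The two terms shared by both formulas
term_dt1 <- d_dt1 / dt2_val
term_dt2 <- dt1_val * d_dt2 / dt2_val^2

# "gauss" as written in the formula above (difference of the two terms)
err_gauss <- term_dt1 - term_dt2

# "biggest" as written in the formula above (sum of the two terms)
err_biggest <- term_dt1 + term_dt2
```

For non-negative inputs the "biggest" error is never smaller than the "gauss" error, since both share the same two terms and differ only in the sign between them.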

For example: if DT1 contains plant samples and group1.vars = "Location", the error function calculates the relative standard deviation for all plants of one location. If the plants within one location are very different, setting var_subgroup = "Species" makes the error function calculate the relative standard deviation for each plant species per location, provided there are more samples per species and location than given in minNr_DT1. Suppose DT2 contains soil data with several samples per location. If group1.vars = "Location", then the function calls preparationDT2 and calculates a mean for each location from the data set. The ratio from plant to soil and the absolute errors of the ratios are then calculated for each plant sample relative to the mean of the soils from its location.
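
The plant/soil example can be sketched in base R. This is an illustration of the idea only, not how preparationDT2 is implemented; the data frames `plants` and `soils` and their values are hypothetical.

```r
# Hypothetical plant samples (role of DT1) and soil samples (role of DT2)
plants <- data.frame(ID = c("p1", "p2", "p3"),
                     Location = c("A", "A", "B"),
                     Cu = c(2, 4, 9))
soils <- data.frame(Location = c("A", "A", "B"),
                    Cu = c(3, 5, 3))

# Mean soil value per location (the grouping given by group1.vars)
soil_means <- aggregate(Cu ~ Location, data = soils, FUN = mean)

# Build a 'new DT2' with one row per plant sample by matching on Location
idx <- match(plants$Location, soil_means$Location)
new_dt2_Cu <- soil_means$Cu[idx]

# "simple" ratio of each plant sample to the mean soil of its location
plants$ratio_Cu <- plants$Cu / new_dt2_Cu
```

Each plant sample is divided by the mean of all soil samples from the same location, so the result has one ratio per row of `plants`.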

Value

The function returns either a data.table, a data.frame or a list, controlled by the option return_as_list. If return_as_list is set to FALSE, a data.frame (or a data.table if DT1 is of class data.table) is returned. If the option Errors is set to TRUE, ratios and errors are combined into one object and a column type_of_data is appended with the entries ratio and ratio_error, respectively. If return_as_list is set to TRUE, the DT1-DT2 ratios are named in the list as "ratios" and, if Errors is set to TRUE, the absolute errors of the ratios are saved in the list as "ratios_error". If return_all is set to TRUE, a list with the following entries is returned:

[[1]] "DT1", [[2]] "DT2", [[3]] "vars", [[4]] "ratios" and if Errors is set to TRUE additionally [[5]] "ratios_error", [[6]] "DT1_error", [[7]] "DT2_error".

Author(s)

Solveig Pospiech