ratioDT {ratios} | R Documentation |
ratioDT
Description
The function calculates ratios of corresponding variables and corresponding rows between two data sets, DT1 and DT2.
The result is a data set with the same dimensions as DT1.
The variables can be specified by vars; if vars is not specified, the subfunction select.VarsElements matches column names with element abbreviations.
Which row of DT1 corresponds to which row of DT2 has to be specified by the variable(s) group1.vars (and optionally group2.vars).
If DT2 has a different number of rows than DT1, a 'new DT2' with the same dimensions as DT1 is prepared by the function preparationDT2.
At the moment there are six options for calculating the ratios:
"simple"
"log"
"ar"
"alr"
"cr"
"clr"
For more details please refer to preparationDT2 and section Details.
Usage
ratioDT(DT1, DT2, vars = NULL, group1.vars, group2.vars = NULL,
ratio_type = "simple", vars.ref, id.vars, Errors = FALSE,
Error_method = "gauss", var_subgroup = NULL, use_only_DT2 = FALSE,
DT2_replace = NULL, STD_DT1, STD_DT2, minNr_DT1 = 50, minNr_DT2 = 50,
return_all = FALSE, return_as_list = FALSE)
Arguments
DT1 |
data.frame or data.table, samples in rows and variables in columns |
DT2 |
data.frame or data.table, samples in rows and variables in columns. |
vars |
optional, character vector of column names of DT1 and DT2, default is function |
group1.vars |
character vector, column name(s) for subsetting DT1 and DT2 |
group2.vars |
optional, column name for subsetting DT1 and DT2 if some entries in |
ratio_type |
character, one of "simple", "log", "ar", "alr", "cr" and "clr". Please refer to Details for explanations. |
vars.ref |
reference variable, one out of |
id.vars |
column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs).
If missing, all columns but |
Errors |
logical, should absolute errors be calculated and appended to the list output? Default is FALSE.
If Errors are set to TRUE it overrides the option |
Error_method |
method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation. |
var_subgroup |
optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if |
use_only_DT2 |
logical, default is FALSE. If there are not enough DT2 data for a location, should the DT2 data of the region be used? If the |
DT2_replace |
mandatory if |
STD_DT1 |
optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used. |
STD_DT2 |
optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used. |
minNr_DT1 |
minimum number of samples/observations in DT1 for calculating a relative error of observations.
If the number of observations of DT1 is smaller than |
minNr_DT2 |
minimum number of samples/observations in DT2 for calculating a relative error of observations.
If the number of observations of DT2 is smaller than |
return_all |
logical, should all used data sets be returned as a list? Default is FALSE. If set to TRUE the list contains DT1, DT2, vars, ratios, and optional additional ratios_error, DT1_error and DT2_error. |
return_as_list |
logical, should the result get returned as list? Default is FALSE.
If set to FALSE and |
Details
To calculate the ratios the function internally calls preparationDT2 to create a data set 'new DT2' from the variables vars of DT2, which has the same number of rows as DT1.
The ratios are then calculated between the corresponding rows of DT1 and 'new DT2' with the method given in ratio_type.
The method "simple" is a simple division between DT1 and DT2:
\frac{DT1[vars]}{DT2[vars]}
The method "log" is the logarithm of the simple ratio:
ln \left( \frac{DT1[vars]}{DT2[vars]} \right)
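The two formulas above can be sketched in base R as follows. This is only an illustration of the arithmetic, assuming DT1 and DT2 already have matching rows; the column names and values are invented and the code does not call ratioDT itself.

```r
# Toy data; column names and values are made up for illustration.
DT1 <- data.frame(Ca = c(10, 20), Mg = c(4, 9))
DT2 <- data.frame(Ca = c(5, 40), Mg = c(2, 3))
vars <- c("Ca", "Mg")

# "simple": element-wise division of corresponding rows and columns
ratio_simple <- DT1[vars] / DT2[vars]

# "log": natural logarithm of the simple ratio
ratio_log <- log(ratio_simple)

print(ratio_simple)
print(ratio_log)
```

Both results keep the dimensions of DT1[vars], in line with the statement that the output has the same dimensions as DT1.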
The methods "ar" and "alr" normalize all ratios to one reference column, given by vars.ref:
ar:
\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}_{i=1,\dots, n, \dots, D}
alr:
ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{DT2[vars_n]}{DT1[vars_n]}\right)_{i=1,\dots, n, \dots, D}
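As a rough base R sketch of "ar" and "alr", each simple ratio can be divided by the simple ratio of the reference column, which is algebraically the same as the product in the formulas above. The data are invented, and using "Ti" as the reference column is an assumption made only for this example.

```r
# Toy data; values and the choice of "Ti" as reference are illustrative only.
DT1 <- data.frame(Ca = c(10, 20), Mg = c(4, 9), Ti = c(2, 4))
DT2 <- data.frame(Ca = c(5, 40), Mg = c(2, 3), Ti = c(1, 8))
vars <- c("Ca", "Mg", "Ti")
vars.ref <- "Ti"  # plays the role of vars_n in the formula

ratio_simple <- DT1[vars] / DT2[vars]

# "ar": divide every ratio column by the ratio of the reference column.
# A vector of length nrow is recycled down each column of the data.frame.
ratio_ar <- ratio_simple / ratio_simple[[vars.ref]]

# "alr": natural logarithm of "ar"
ratio_alr <- log(ratio_ar)

print(ratio_ar)
```

In this sketch the reference column itself becomes 1 under "ar" and 0 under "alr".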
The methods "cr" and "clr" normalize all ratios to the geometric mean of all columns included in vars:
"cr" is calculated by:
\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}_{i=1,\dots, D}
where g(x) is the geometric mean over the D columns in vars of the respective data set DT:
g(x) = \sqrt[D]{DT[vars_1] \cdot DT[vars_2] \cdots DT[vars_D]}
and "clr" is calculated by:
ln \left(\frac{DT1[vars_{i}]}{DT2[vars_{i}]} * \frac{g(x)^{DT2[vars]}}{g(x)^{DT1[vars]}}\right)_{i=1,\dots, D}
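A minimal base R sketch of "cr" and "clr", under the assumption that the geometric mean is taken row-wise over the columns in vars. The helper geo_mean and the data are hypothetical and not part of the package.

```r
# Hypothetical helper: geometric mean of a positive numeric vector.
geo_mean <- function(x) exp(mean(log(x)))

# Toy data chosen so that the row-wise geometric means are simple numbers.
DT1 <- data.frame(Ca = c(1, 3), Mg = c(2, 9), Ti = c(4, 1))
DT2 <- data.frame(Ca = c(1, 2), Mg = c(1, 4), Ti = c(1, 1))
vars <- c("Ca", "Mg", "Ti")

# Row-wise geometric means g(x) for each data set
g1 <- apply(DT1[vars], 1, geo_mean)
g2 <- apply(DT2[vars], 1, geo_mean)

ratio_simple <- DT1[vars] / DT2[vars]

# "cr": simple ratio times g(x) of DT2 divided by g(x) of DT1
ratio_cr <- ratio_simple * (g2 / g1)

# "clr": natural logarithm of "cr"
ratio_clr <- log(ratio_cr)

print(ratio_cr)
print(rowSums(ratio_clr))
```

With this normalization the "clr" values of each row sum to zero (up to floating point error), which is a convenient sanity check.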
The methods "clr" and "alr" should be considered if the data are so-called compositional data as defined by Aitchison, J. (1986): "The statistical analysis of compositional data".
The names correspond to the names used in the package compositions by K. Gerald van den Boogaart, Raimon Tolosana and Matevz Bren.
Calculating the absolute error of the ratios requires calculating the absolute errors of DT1 and DT2, too.
The errors of DT1 and DT2 are calculated with the function relError_dataset.
Accordingly, the options STD_DT1 and STD_DT2 are passed to the option STD in relError_dataset.
If STD_DT1 and/or STD_DT2 are left empty the default of 5.2% relative error is used.
The options minNr_DT1 and minNr_DT2 are likewise passed to the option minNr in relError_dataset.
The option Error_method determines how the absolute error of the ratios is calculated.
The error method "gauss" refers to the error propagation after Gauss:
\Delta x = \frac{\Delta DT1}{DT2} - DT1 * \frac{\Delta DT2}{DT2^2}
The error method "biggest" refers to the maximum error after Gauss:
\Delta x = \frac{\Delta DT1}{DT2} + DT1 * \frac{\Delta DT2}{DT2^2}
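The following base R lines transcribe the two error formulas exactly as printed above, only to show the arithmetic. The values and the relative errors of 5% and 2% are assumptions for this example, and the sketch does not claim to reproduce the internal code of ratioDT or relError_dataset.

```r
# Toy values for one variable; relative errors are assumed for illustration.
DT1 <- c(10, 20)
DT2 <- c(5, 40)
dDT1 <- 0.05 * DT1  # assumed absolute error of DT1 (5% relative)
dDT2 <- 0.02 * DT2  # assumed absolute error of DT2 (2% relative)

# "gauss" as printed on this page: difference of the two terms
err_gauss <- dDT1 / DT2 - DT1 * dDT2 / DT2^2

# "biggest" as printed on this page: sum of the two terms
err_biggest <- dDT1 / DT2 + DT1 * dDT2 / DT2^2

print(err_gauss)
print(err_biggest)
```

As written, "biggest" is never smaller than "gauss" when all values and errors are positive.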
For example:
If DT1 contains plant samples and group1.vars = "Location", the error function would calculate the relative standard deviation for all plants of one location.
But you may have very different plants at one location. With var_subgroup = "Species" the error function calculates the relative standard deviation for each plant species per location, provided there are more samples per species and location than given in minNr_DT1.
Suppose DT2 contains soil data with several samples per location.
If group1.vars = "Location" then the function calls preparationDT2 and calculates a mean for each location from the data set.
The ratio of plant to soil and the absolute errors of the ratios are then calculated for each plant sample relative to the mean of the soils from the same location.
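The plant and soil example can be imitated in base R roughly like this. The sketch uses aggregate and match to mimic the idea of a 'new DT2' with one row per DT1 row; it is an assumption about the general approach, not the actual implementation of preparationDT2, and all data are invented.

```r
# Invented plant (DT1) and soil (DT2) data, grouped by "Location".
plants <- data.frame(ID = c("p1", "p2", "p3"),
                     Location = c("A", "A", "B"),
                     Ca = c(10, 20, 30),
                     stringsAsFactors = FALSE)
soils <- data.frame(Location = c("A", "A", "B", "B"),
                    Ca = c(4, 6, 10, 20),
                    stringsAsFactors = FALSE)

# Mean soil value per location
soil_mean <- aggregate(Ca ~ Location, data = soils, FUN = mean)

# Expand the means so that each plant row has a corresponding soil row
idx <- match(plants$Location, soil_mean$Location)
new_DT2 <- soil_mean[idx, ]

# Simple plant-to-soil ratio for each plant sample
ratio <- plants$Ca / new_DT2$Ca
print(data.frame(ID = plants$ID, ratio = ratio))
```

Here both plants from location A are divided by the same soil mean, which is the behaviour described in the example above.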
Value
The function returns either a data.table, a data.frame or a list, controlled by the option return_as_list.
If return_as_list is set to FALSE a data.frame (or data.table if DT1 is of class data.table) is returned.
If the option Errors is set to TRUE, ratios and errors are combined into one object and a column type_of_data is appended with the entries ratio and ratio_error respectively.
If return_as_list is set to TRUE the DT1-DT2 ratios are named in the list as "ratios" and, if Errors is set to TRUE, the absolute errors of the ratios are saved in the list as "ratios_error".
If return_all is set to TRUE a list with the following entries is returned:
[[1]] "DT1", [[2]] "DT2", [[3]] "vars", [[4]] "ratios" and, if Errors is set to TRUE, additionally [[5]] "ratios_error", [[6]] "DT1_error", [[7]] "DT2_error".
Author(s)
Solveig Pospiech
See Also
Other ratio functions: Correction.AdheringParticles, preparationDT2, ratio_append_smallest