R: Correction.AdheringParticles

Correction.AdheringParticles {ratios}

R Documentation

Correction.AdheringParticles

Description

Suppose the element data of one data set (DT1) are biased because the measured concentrations result from a mixture of two substances, one of which has the element concentrations of DT2. To correct DT1 to DT_{corrected}, a fraction of DT2 has to be subtracted from DT1. The basic equation for the correction is:

DT_{corrected}=\frac{DT1 - x * DT2}{1 - x}

where x is the fraction of DT2 to be subtracted.
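The basic equation can be tried out with a few lines of R. This is only a minimal sketch for a known x: the concentrations, the value of x and the helper name `correct_mixture` are made up for illustration and are not part of the package.

```r
# Hypothetical concentrations (e.g. mg/kg) of one plant sample (DT1)
# and of the adhering material (DT2); values are invented.
dt1 <- c(Ti = 10, Cu = 8, Zn = 40)
dt2 <- c(Ti = 4000, Cu = 20, Zn = 60)

# Assumed, known fraction of DT2 in the mixture
x <- 0.002

# Basic correction equation: (DT1 - x * DT2) / (1 - x)
# (illustrative helper, not a function of the package)
correct_mixture <- function(dt1, dt2, x) {
  (dt1 - x * dt2) / (1 - x)
}

dt_corrected <- correct_mixture(dt1, dt2, x)

# Mixing the corrected sample with the fraction x of DT2 again
# reproduces the measured DT1.
remixed <- (1 - x) * dt_corrected + x * dt2
```

The division by 1 - x rescales the remainder so that the corrected concentrations refer to the pure substance rather than to the mixture.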

The function is written for the case that x is unknown. To calculate x, at least one element concentration in DT_{corrected} must be zero or known. Suppose vars_{i} has a very low concentration, close to zero, in DT_{corrected}, i.e. DT_{corrected}[vars_{i}]=0, then:

x = \frac{DT1[vars_{i}]}{DT2[vars_{i}]}
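As a small illustration of this step, x can be derived from one reference element that is assumed to be absent from the clean sample. The numbers and the choice of Ti as reference are assumptions made for this sketch only.

```r
# Hypothetical concentrations; Ti is assumed to be (close to) zero
# in the corrected sample.
dt1 <- c(Ti = 10, Cu = 8, Zn = 40)
dt2 <- c(Ti = 4000, Cu = 20, Zn = 60)

ref <- "Ti"

# x = DT1[vars_i] / DT2[vars_i]
x <- dt1[[ref]] / dt2[[ref]]

# Apply the basic correction equation with this x
dt_corrected <- (dt1 - x * dt2) / (1 - x)
```

By construction the reference element ends up at zero (up to floating point rounding) in the corrected data.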

The function was developed to correct element concentrations in plants for adhering particles. Exact and reproducible analysis of element concentrations in plant tissue is the basis for many research fields such as environmental, health, phytomining, agricultural or provenance studies. Unfortunately, plant samples collected in the field will always carry particles on their tissue surfaces, such as airborne dust or soil particles. If not removed, these particles may bias the element concentrations measured in plant samples.

For a full description of the calculations and the background of correcting plants for adhering particles, please refer to the section Details and to:

Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles–Methods of correction. Chemosphere, 182, 501-508.

Usage

Correction.AdheringParticles(DT1, DT2 = NULL, vars = NULL,
  vars_ignore = c("As", "Se", "Sn", "V", "Be", "Ge", "Pt"), method, element,
  id.vars, group1.vars, group2.vars, var_subgroup, offset = 0,
  use_only_DT2 = TRUE, DT2_replace = NULL, Errors = FALSE,
  return_as_list = FALSE, negative_values = FALSE,
  set_statistical_0 = FALSE, Error_method = "gauss", STD_DT1 = STD_Plant,
  STD_DT2 = STD_Soil, minNr_DT1 = 50, minNr_DT2 = 50)

Arguments

`DT1`	data.frame or data.table, samples in rows and variables in columns
`DT2`	data.frame or data.table, samples in rows and variables in columns.
`vars`	optional, character vector of column names of DT1 and DT2, default is function `select.VarsElements`. Please make sure the columns given in `vars` are of class numeric.
`vars_ignore`	character vector of column names, only for 'method 3'. These variables are ignored when calculating the median of the amount of DT2 (x) in 'method 3'. Please note: the function still returns corrected values for these columns, because they are only ignored for calculating the median of x. Default is "As", "Se", "Sn", "V", "Be", "Ge" and "Pt". Please see Details for further explanation.
`method`	unquoted name denoting the method (not a character string: please give m3 instead of "m3"). Options are m1, m2, m3 and subtr. Default is m3. Please see Details.
`element`	string, only for method 1. Denotes the column from which the amount of DT2 (x) is calculated.
`id.vars`	column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs). If missing, all columns but `vars` will be assigned to it. Please note: Function is faster and more stable if `id.vars` is provided.
`group1.vars`	character vector, column name(s) for subsetting DT1 and DT2
`group2.vars`	optional, column name for subsetting DT1 and DT2 if some entries in `group1.vars` are empty.
`var_subgroup`	optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if `Errors` is set to FALSE. If provided, DT1 is split into subsets by `group1.vars` and `var_subgroup` and the error is calculated for each of these subsets. Please see Details for further information.
`offset`	numeric, default is 0. The offset reduces the subtracted amount of DT2 (x): x = x - offset. If used with m2, all concentrations stay > 0. A reasonable offset is e.g. offset = 0.0001.
`use_only_DT2`	logical, default is TRUE. If there are not enough DT2 data for the location, should the DT2 data of the region be used? If `use_only_DT2` is set to FALSE, the Upper Crust is used for the correction.
`DT2_replace`	optional. If a DT1 sample does not have DT2 data for the corresponding location, this option defines which data are used as DT2 instead. Default is the built-in data set UpperCrust (geochemical composition of the earth's upper crust). If you would like to use something else, please provide a named vector or a one-row data.table with the values to be used instead of DT2.
`Errors`	logical, should absolute errors be calculated and appended to the list output? Default is FALSE. If `Errors` is set to TRUE it overrides the option `return_as_list` and the function always returns a list.
`return_as_list`	logical, should the result be returned as a list? Default is FALSE.
`negative_values`	logical, should negative values be returned? If set to FALSE negative values are set to 0. Default is FALSE.
`set_statistical_0`	logical, only for method 3. Should all values of the variables contributing to the median of x be set to 0? Default is FALSE.
`Error_method`	method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation.
`STD_DT1`	optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
`STD_DT2`	optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
`minNr_DT1`	minimum number of samples/observations in DT1 for calculating a relative error of observations. If the number of observations in DT1 is smaller than `minNr_DT1`, the error is calculated via the data set `STD_DT1`. Default is 50.
`minNr_DT2`	minimum number of samples/observations in DT2 for calculating a relative error of observations. If the number of observations in DT2 is smaller than `minNr_DT2`, the error is calculated via the data set `STD_DT2`. Default is 50.

Details

The main option of this function is method, which determines how x, the amount of DT2 to be subtracted, is calculated. There are four options:

Method 1: calculate x via a fixed element
Method 2: calculate x via the element with the smallest ratio between DT1[vars] and DT2[vars]
Method 3: calculate x via the median of several, very small ratios between DT1[vars] and DT2[vars]
Method subtr: calculate the concentrations for x * DT2[vars]

To Method 1: For example, with Ti as element, DT_{corrected} is calculated with x = DT1[Ti]/DT2[Ti]. Typical elements for the option element are e.g. Ti, Al, Zr, Sc, ... This may lead to negative concentrations for some elements.
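The following sketch shows why a fixed element can produce negative values, and what setting negative values to 0 (as described for negative_values = FALSE) amounts to. All numbers are invented, and the use of `pmax` is an assumption about how such clipping could be done, not a statement about the package internals.

```r
# Hypothetical concentrations. Al has a smaller DT1/DT2 ratio than Ti,
# so correcting via Ti subtracts more Al than the sample contains.
dt1 <- c(Ti = 10, Al = 50, Cu = 8)
dt2 <- c(Ti = 4000, Al = 60000, Cu = 20)

# Method 1 idea: x from one fixed element, here Ti
x <- dt1[["Ti"]] / dt2[["Ti"]]

dt_corrected <- (dt1 - x * dt2) / (1 - x)

# One possible way to set negative values to 0 (assumption)
dt_clipped <- pmax(dt_corrected, 0)
```

Here Al becomes negative because its ratio 50/60000 is smaller than the x derived from Ti.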

To Method 2: This method subtracts the smallest possible content of DT2 from DT1 (smallest x). For each row/sample, the element with the smallest of all ratios x = DT1[vars]/DT2[vars] is taken as element, hence every sample may be corrected based on a different element. With this method there are no negative concentrations.
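A row-wise sketch of this idea, including the effect of an offset, could look as follows. The data are invented, the helper `correct_rows` is hypothetical, and row i of DT1 is simply paired with row i of DT2, which is a simplification of the grouping done by the function.

```r
vars <- c("Ti", "Al", "Cu")

# Hypothetical data: two plant samples and their corresponding DT2 rows
DT1 <- data.frame(id = c("p1", "p2"), Ti = c(10, 2), Al = c(50, 90), Cu = c(8, 6))
DT2 <- data.frame(id = c("s1", "s2"), Ti = c(4000, 4000),
                  Al = c(60000, 60000), Cu = c(20, 20))

m1 <- as.matrix(DT1[, vars])
m2 <- as.matrix(DT2[, vars])

# All ratios DT1[vars] / DT2[vars]; the smallest one per row is x
ratios <- m1 / m2
x <- apply(ratios, 1, min)
ref_element <- vars[apply(ratios, 1, which.min)]

# Illustrative helper: x has one entry per row and is recycled down
# the columns, so each row is corrected with its own x.
correct_rows <- function(m1, m2, x) (m1 - x * m2) / (1 - x)

corrected <- correct_rows(m1, m2, x)

# With a small offset (x = x - offset) all concentrations stay > 0
corrected_offset <- correct_rows(m1, m2, x - 0.0001)
```

In this example the first sample is corrected via Al and the second via Ti; without an offset the respective reference element ends up at zero.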

To Method 3: Methods 1 and 2 base the content of DT2 in DT1 (x) on a single element. To reduce the resulting uncertainty, x can instead be derived from the x of several elements. With \Delta x being the absolute error of x, the median is calculated from the x of all elements whose values x - \Delta x are smaller than x_{smallest} + \Delta x_{smallest}. The median \bar{x} is then used as x. This may lead to negative concentrations for some elements. Statistically, the x of all elements whose error range overlaps that of the element with the smallest x are indistinguishable. We therefore suggest setting all elements contributing to \bar{x} to zero, because these small values should not be interpreted: set option set_statistical_0 to TRUE.

It is advisable to exclude elements with a large error margin by listing them in the option vars_ignore, because they could severely increase the median \bar{x} by "opening" the window of error ranges to many elements with significantly higher ratios. This could lead to an unnaturally high median \bar{x} and thus to an overcorrection.
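The selection rule of method 3 can be sketched for a single sample as below. This is not the package implementation: the concentrations are invented, and the error \Delta x is assumed here to be a uniform relative error obtained by Gaussian propagation of 5.2% on both DT1 and DT2, whereas the function derives errors from the data or the standards.

```r
# Hypothetical concentrations of one sample (DT1) and its DT2
dt1 <- c(Ti = 10, Al = 140, Zr = 0.26, Sc = 0.03, Cu = 8, As = 0.02)
dt2 <- c(Ti = 4000, Al = 60000, Zr = 100, Sc = 12, Cu = 20, As = 10)

vars_ignore <- "As"

# Assumed relative error of x (Gaussian propagation of 5.2% + 5.2%)
rel_err <- sqrt(0.052^2 + 0.052^2)

# Ratios x per element; ignored elements do not enter the median
x_all <- dt1 / dt2
x_use <- x_all[setdiff(names(x_all), vars_ignore)]
dx <- x_use * rel_err

# Elements whose x - dx lies below x_smallest + dx_smallest contribute
i_min <- which.min(x_use)
upper <- x_use[[i_min]] + dx[[i_min]]
contributing <- names(x_use)[(x_use - dx) < upper]

# The median of the contributing ratios is used as x
x_bar <- median(x_use[contributing])

# All elements, including the ignored ones, are corrected with x_bar
dt_corrected <- (dt1 - x_bar * dt2) / (1 - x_bar)

# Analogue of set_statistical_0 = TRUE: contributing elements set to 0
dt_zeroed <- dt_corrected
dt_zeroed[contributing] <- 0
```

In this example Ti, Al, Zr and Sc contribute to the median, Cu does not, and As is still corrected although it was ignored for the median.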

If option id.vars is provided, the function prints the 'group1.vars' and 'id.vars' of the sample.

For examples and more information please refer to: Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles–Methods of correction. Chemosphere, 182, 501-508.

Value

data.frame (or data.table if DT1 is a data.table) according to method. If return_as_list or Errors is set to TRUE, a list is returned.

Author(s)

Solveig Pospiech