vlda {vdar}R Documentation

Weighted Linear Discriminant Analysis

Description

Extension of the qda() of package 'MASS' (not the lda() function) to calculate a LDA incorporating individual, cell-wise uncertainties, e.g. if the uncertainties are expressed as individual variances for each measurand.

Usage

vlda(x, uncertainties, grouping, prior)

Arguments

x

frame or matrix containing the data to be discriminated

uncertainties

data frame or matrix containing the values for uncertainties per cell. Uncertainties should be relative errors, e.g. the relative standard deviation of the measurand

grouping

a factor or character vector specifying the group for each observation (row).

prior

the prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.

Details

Uncertainties can be considered in a statistical analysis either by each measured variable, by each observation or by using the individual, cell-wise uncertainties. There are several methods for incorporating variable-wise or observation-wise uncertainties into a QDA, most of them using the uncertainties as weights for the variables or observations of the data set. The term 'cell-wise uncertainties' describe a data set of $d$ analysed variables where each observation has an individual uncertainty for each of the $d$ variables conforming it. Hence, a data set of $n \times d$ data values has associated a data set of $n \times d$ individual uncertainties. Instead of weighting the columns or rows of the data set, the vlda() function uses uncertainties to recalculate better estimates of the group variances and group means. It is internally very similar to the vqda function, but with an averaged group variance for all groups. If the presence of uncertainties is not accounted for, the decision rules are based on the group variances calculated by the given data set. But this observed group variance might deviate notably from the group variance, which can be estimated including the uncertainties. This methodological framework does not only allow to incorporate cell-wise uncertainties, but also would largely be valid if the information about the co-dependency between uncertainties within each observation would be reported.

Value

object of class 'vlda' containing the following components: prior the prior probabilities used. counts counts per group. means the group means. generalizedMeans the group means calculated by the function generalized_mean groupVarCorrected the group variances calculated by the function calc_estimate_true_var lev the levels of the grouping factor. grouping the factor specifying the class for each observation.

Author(s)

Solveig Pospiech, package 'MASS'

References

Pospiech, S., R. Tolosana-Delgado and K.G. van den Boogaart (2020) Discriminant Analysis for Compositional Data Incorporating Cell-Wise Uncertainties, Mathematical Geosciences

Examples

# for non-compositional data:
data("dataobs")
data("uncertainties")
mylda = vlda(x = dataobs[, 1:2], uncertainties = uncertainties[, 1:2], grouping = dataobs$Group)
mypred = predict(mylda, newdata = dataobs[, 1:2], newerror = uncertainties[, 1:2])
forplot = cbind(dataobs, LG1 = mypred$posterior[,1])
if (require("ggplot2")) {
  scatter_plot = ggplot(data = forplot, aes(x = Var1, y = Var2)) +
    geom_point(aes(shape = Group, color = LG1))
  if (require("ggthemes")) {
    scatter_plot = scatter_plot +
        scale_color_gradientn(colours = colorblind_pal()(5))
  }
  scatter_plot
}

# for compositional data
data("dataobs_coda")
data("uncertainties_coda")
require(compositions)
# generate ilr-transformation (from package 'compositions')
data_ilr = ilr(dataobs_coda[, 1:3])
uncert_ilr = t(simplify2array(apply(uncertainties_coda[, 1:3],1,
                       function(Delta) clrvar2ilr(diag(Delta)))))
uncert_ilr = compositions::rmult(uncert_ilr) # change class into rmult from package 'compositions'
mylda_coda = vlda(x = data_ilr, uncertainties = uncert_ilr, grouping = dataobs_coda$Group)
mypred_coda = predict(mylda_coda, newdata = data_ilr, newerror = uncert_ilr)
forplot_coda = cbind(dataobs_coda, LG1 = mypred_coda$posterior[,1])
# if 'ggtern' is installed, you can plot via ggtern:
# if (require("ggtern")) {
#   ternary_plot = ggtern(data = forplot_coda, aes(x = Var1, y = Var2, z = Var3)) +
#     geom_point(aes(shape = Group, color = LG1))
#   if (require("ggthemes")) {
#     ternary_plot = ternary_plot +
#         scale_color_gradientn(colours = colorblind_pal()(5))
#   }
#   ternary_plot
# }


[Package vdar version 0.1.3-2 Index]