vqda {vdar}    R Documentation
Weighted Quadratic Discriminant Analysis
Description
Extension of the function qda() from package 'MASS' that computes a QDA incorporating individual, cell-wise uncertainties, e.g. when the uncertainties are expressed as individual variances for each measurand.
Usage
vqda(x, uncertainties, grouping, prior)
Arguments
x
data frame or matrix containing the data to be discriminated.
uncertainties
data frame or matrix containing the uncertainty values for each cell. Uncertainties should be relative errors, e.g. the relative standard deviation of the measurand.
grouping
a factor or character vector specifying the group of each observation (row).
prior
the prior probabilities of class membership. If unspecified, the class proportions of the training set are used. If present, the probabilities should be specified in the order of the factor levels.
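To make the default for prior concrete, the following sketch computes it the way MASS::qda() does, as the class proportions ordered by the factor levels, and checks that the inputs have one row per observation. check_vqda_inputs() is a hypothetical helper written for illustration only; it is not part of 'vdar', and the row check is an assumption based on the Examples, where uncertainties always has as many rows as x (but not always as many columns).

```r
# Hypothetical helper (not part of 'vdar'), for illustration only.
check_vqda_inputs <- function(x, uncertainties, grouping, prior) {
  x <- as.matrix(x)
  uncertainties <- as.matrix(uncertainties)
  grouping <- as.factor(grouping)
  # assumption: one row of uncertainties per observation (row) of x
  if (nrow(uncertainties) != nrow(x))
    stop("'x' and 'uncertainties' must have the same number of rows")
  if (length(grouping) != nrow(x))
    stop("'grouping' must have one entry per row of 'x'")
  counts <- as.vector(table(grouping))   # counts in the order of levels(grouping)
  if (missing(prior))
    prior <- counts / sum(counts)        # default: class proportions, as in MASS::qda()
  if (length(prior) != nlevels(grouping))
    stop("'prior' must have one entry per factor level")
  setNames(prior, levels(grouping))      # label priors by factor level
}

check_vqda_inputs(x = matrix(0, 5, 2), uncertainties = matrix(0.05, 5, 2),
                  grouping = c("a", "a", "b", "b", "b"))
# returns c(a = 0.4, b = 0.6)
```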
Details
Uncertainties can be considered in a statistical analysis per measured variable, per observation, or as individual, cell-wise uncertainties. There are several methods for incorporating variable-wise or observation-wise uncertainties into a QDA, most of which use the uncertainties as weights for the variables or observations of the data set. The term 'cell-wise uncertainties' describes a data set of $d$ analysed variables in which each observation has an individual uncertainty for each of its $d$ variables. Hence, a data set of $n \times d$ data values is accompanied by a data set of $n \times d$ individual uncertainties. Instead of weighting the columns or rows of the data set, the vqda() function uses the uncertainties to compute improved estimates of the group variances and group means. If uncertainties are not accounted for, the decision rules are based on the group variances calculated from the observed data. However, this observed group variance may deviate notably from the group variance estimated with the uncertainties taken into account, e.g. because measurement errors add their own scatter to the observed values. This methodological framework not only allows cell-wise uncertainties to be incorporated, but would also remain largely valid if information about the co-dependence between the uncertainties within each observation were reported.
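The variance correction described above can be illustrated under a simple assumed error model: each cell is observed as its true value plus an independent error whose variance is known. Under that model the observed group covariance is approximately the true covariance plus the average error covariance, so subtracting the latter gives a corrected estimate. The sketch below uses absolute error variances and a hypothetical function name, corrected_group_cov(); it is an illustration of the idea, not the algorithm implemented in calc_estimate_true_var(), which may work differently.

```r
# Hypothetical sketch (assumption: additive, independent errors with known
# absolute variance per cell; NOT vdar's calc_estimate_true_var()).
corrected_group_cov <- function(x, err_var) {
  x <- as.matrix(x)
  err_var <- as.matrix(err_var)                          # n x d error variances
  observed <- cov(x)                                     # observed d x d covariance
  mean_err <- diag(colMeans(err_var), nrow = ncol(x))    # average error covariance
  observed - mean_err                                    # corrected estimate
}

# Simulated group with a known true covariance
set.seed(1)
n <- 5000
true_cov <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
clean <- matrix(rnorm(n * 2), n, 2) %*% chol(true_cov)         # true values
err_var <- matrix(runif(n * 2, 0.1, 1), n, 2)                  # cell-wise error variances
noisy <- clean + matrix(rnorm(n * 2, sd = sqrt(err_var)), n, 2) # observed values

round(cov(noisy), 2)                          # diagonal inflated by about 0.55
round(corrected_group_cov(noisy, err_var), 2) # close to true_cov
```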
Value
An object of class 'vqda' containing the following components:
prior
the prior probabilities used.
counts
the number of observations in each group.
means
the group means.
generalizedMeans
the group means estimated with the function generalized_mean, taking the uncertainties into account.
groupVarCorrected
the group variances corrected for the uncertainties, calculated by the function calc_estimate_true_var.
lev
the levels of the grouping factor.
grouping
the factor specifying the class for each observation.
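One common way to let uncertainties enter a group mean is a generalized least-squares mean, in which observation i receives the weight matrix (Sigma + Psi_i)^(-1), so that noisier observations pull the mean less. The following sketch assumes that reading of generalizedMeans with a given group covariance Sigma and diagonal error covariances Psi_i; gls_mean() is a hypothetical name, and the package's generalized_mean may differ in its weights or iteration.

```r
# Hypothetical illustration (not vdar::generalized_mean): a generalized
# least-squares mean where observation i is weighted by the inverse of its
# total covariance, W_i = (Sigma + Psi_i)^(-1), with Psi_i = diag(err_var[i, ]).
gls_mean <- function(x, err_var, Sigma) {
  x <- as.matrix(x)
  err_var <- as.matrix(err_var)
  d <- ncol(x)
  W_sum <- matrix(0, d, d)      # running sum of weight matrices
  Wx_sum <- numeric(d)          # running sum of weighted observations
  for (i in seq_len(nrow(x))) {
    W_i <- solve(Sigma + diag(err_var[i, ], nrow = d))
    W_sum <- W_sum + W_i
    Wx_sum <- Wx_sum + W_i %*% x[i, ]
  }
  drop(solve(W_sum, Wx_sum))    # (sum W_i)^(-1) sum W_i x_i
}

gls_mean(rbind(c(0, 0), c(3, 3)), rbind(c(0, 0), c(2, 2)), Sigma = diag(2))
# 0.75 0.75 (the plain mean would be 1.5 1.5)
```

When all observations share the same error variances, the weights are equal and the result reduces to the ordinary arithmetic mean.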
Author(s)
Solveig Pospiech, package 'MASS'
References
Pospiech, S., R. Tolosana-Delgado and K.G. van den Boogaart (2020). Discriminant Analysis for Compositional Data Incorporating Cell-Wise Uncertainties. Mathematical Geosciences.
Examples
# for non-compositional data:
data("dataobs")
data("uncertainties")
myqda = vqda(x = dataobs[, 1:2], uncertainties = uncertainties[, 1:2], grouping = dataobs$Group)
mypred = predict(myqda, newdata = dataobs[, 1:2], newerror = uncertainties[, 1:2])
forplot = cbind(dataobs, LG1 = mypred$posterior[,1])
if (require("ggplot2")) {
  scatter_plot = ggplot(data = forplot, aes(x = Var1, y = Var2)) +
    geom_point(aes(shape = Group, color = LG1))
  if (require("ggthemes")) {
    scatter_plot = scatter_plot +
      scale_color_gradientn(colours = colorblind_pal()(5))
  }
  scatter_plot
}
# for compositional data
data("dataobs_coda")
data("uncertainties_coda")
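library(compositions)  # provides ilr(), clrvar2ilr() and rmult()
# ilr-transform the data (ilr() from package 'compositions')
data_ilr = ilr(dataobs_coda[, 1:3])
# convert each row of clr uncertainties (as a diagonal clr variance matrix)
# into an ilr variance matrix, stored as one row per observation
uncert_ilr = t(simplify2array(apply(uncertainties_coda[, 1:3], 1,
  function(Delta) clrvar2ilr(diag(Delta)))))
uncert_ilr = compositions::rmult(uncert_ilr) # change class into rmult from package 'compositions'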
myqda_coda = vqda(x = data_ilr, uncertainties = uncert_ilr, grouping = dataobs_coda$Group)
mypred_coda = predict(myqda_coda, newdata = data_ilr, newerror = uncert_ilr)
forplot_coda = cbind(dataobs_coda, LG1 = mypred_coda$posterior[,1])
# if 'ggtern' is installed, you can plot via ggtern:
# if (require("ggtern")) {
#   ternary_plot = ggtern(data = forplot_coda, aes(x = Var1, y = Var2, z = Var3)) +
#     geom_point(aes(shape = Group, color = LG1))
#   if (require("ggthemes")) {
#     ternary_plot = ternary_plot +
#       scale_color_gradientn(colours = colorblind_pal()(5))
#   }
#   ternary_plot
# }
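The Examples pass newerror to predict(). One plausible way a prediction can use an observation's own uncertainty, stated here as an assumption rather than as the behaviour of vdar's predict method, is to add the observation's error covariance Psi to every group covariance before evaluating the Gaussian QDA score. The hypothetical function qda_posterior_with_error() below computes the resulting posterior probabilities for a single observation.

```r
# Hypothetical sketch (assumption, not vdar's predict method): QDA posterior
# for one observation x whose own error covariance Psi is added to each group
# covariance, i.e. x | group k ~ N(mu_k, Sigma_k + Psi).
qda_posterior_with_error <- function(x, Psi, means, covs, prior) {
  scores <- vapply(seq_along(prior), function(k) {
    S <- covs[[k]] + Psi                 # total covariance for group k
    r <- x - means[[k]]                  # deviation from the group mean
    log(prior[k]) - 0.5 * as.numeric(determinant(S, logarithm = TRUE)$modulus) -
      0.5 * drop(t(r) %*% solve(S, r))   # Gaussian log-density up to a constant
  }, numeric(1))
  p <- exp(scores - max(scores))         # subtract max for numerical stability
  p / sum(p)
}

means <- list(c(0, 0), c(2, 0))
covs <- list(diag(2), diag(2))
prior <- c(0.5, 0.5)
qda_posterior_with_error(c(0.5, 0), matrix(0, 2, 2), means, covs, prior)  # about 0.731 0.269
qda_posterior_with_error(c(0.5, 0), diag(2), means, covs, prior)          # about 0.622 0.378
```

With a larger error covariance the posterior moves towards the prior, reflecting that a less precise observation carries less information about its group.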