R: Classificatory Discriminant Analysis

classif.lda {MorphoTools2}

R Documentation

Classificatory Discriminant Analysis

Description

These functions compute discriminant functions for classifying observations. A linear discriminant function (classif.lda), a quadratic discriminant function (classif.qda), or the nonparametric k-nearest neighbours classification method (classif.knn) can be used.

Usage

classif.lda(object, crossval = "indiv")

classif.qda(object, crossval = "indiv")

classif.knn(object, k, crossval = "indiv")

Arguments

`object`	an object of class `morphodata`.
`crossval`	cross-validation mode; sets individuals (`"indiv"`; default, leave-one-out method) or whole populations (`"pop"`) as the leave-out unit.
`k`	number of neighbours considered for the k-nearest neighbours method.

Details

The classif.lda and classif.qda functions perform classification using linear and quadratic discriminant functions, respectively, with cross-validation, using the lda and qda functions from the package MASS. The prior probabilities of group membership are equal.
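For orientation, the following is a minimal sketch of leave-one-out cross-validated LDA with equal priors, calling MASS::lda directly. It is not the internal code of classif.lda; the built-in iris data stand in for a morphological data set, and the object names are illustrative.

```r
library(MASS)

# iris stands in for a morphological data set: 4 characters, 3 taxa
x <- iris[, 1:4]
taxon <- iris$Species
g <- nlevels(taxon)

# CV = TRUE requests leave-one-out cross-validation;
# prior sets equal prior probabilities for all groups
fit <- lda(x, grouping = taxon, prior = rep(1 / g, g), CV = TRUE)

# fit$class holds the cross-validated assignments,
# fit$posterior the posterior probabilities for each group
correct <- fit$class == taxon
mean(correct)  # proportion of correctly classified individuals
```

Replacing lda with qda in the call gives the quadratic counterpart under the same assumptions.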

LDA and QDA analyses have some requirements: (1) no character can be a linear combination of any other characters; (2) no pair of characters can be highly correlated; (3) no character can be invariant in any taxon; (4) the number of taxa (g), the number of characters (p), and the total number of samples (n) must satisfy 0 < p < (n - g); and (5) there must be at least two groups (taxa), each containing at least two objects. Violation of some of these requirements may result in warnings or error messages (rank deficiency).
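The requirements above can be checked before running the analysis. The sketch below uses base R on a plain numeric matrix and a grouping factor (iris as a stand-in); it is an illustration, not a function provided by the package, and all variable names are assumptions.

```r
# x: numeric characters in columns; taxon: grouping factor (iris as stand-in)
x <- as.matrix(iris[, 1:4])
taxon <- iris$Species

n <- nrow(x)
p <- ncol(x)
g <- nlevels(taxon)

# (4) 0 < p < n - g
dims_ok <- p > 0 && p < n - g

# (5) at least two groups, each with at least two objects
groups_ok <- g >= 2 && all(table(taxon) >= 2)

# (3) characters with zero variance within at least one taxon
within_var <- apply(x, 2, function(col) tapply(col, taxon, var))
invariant <- colnames(x)[apply(within_var == 0, 2, any, na.rm = TRUE)]

# (1) linear dependence among characters shows up as rank deficiency
full_rank <- qr(scale(x, scale = FALSE))$rank == p

# (2) largest absolute pairwise correlation between characters
r <- abs(cor(x))
diag(r) <- 0
max_cor <- max(r)
```

What counts as "highly correlated" is left to the analyst; the sketch only reports the largest value.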

The nonparametric k-nearest neighbours classification is performed using the knn and knn.cv functions from the package class.
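As a rough illustration of the underlying call, leave-one-out k-nearest neighbours classification can be run directly with class::knn.cv. The sketch uses iris as a stand-in; standardising the characters and the choice k = 5 are assumptions of this example, not documented behaviour of classif.knn.

```r
library(class)

# standardise characters so that no single one dominates the distances
# (an assumption of this sketch)
x <- scale(iris[, 1:4])
taxon <- iris$Species

# knn.cv performs leave-one-out cross-validation;
# prob = TRUE attaches the proportion of votes for the winning class
pred <- knn.cv(train = x, cl = taxon, k = 5, prob = TRUE)
votes <- attr(pred, "prob")

mean(pred == taxon)  # proportion of correctly classified individuals
```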

The mode of cross-validation is set by the parameter crossval. The default "indiv" uses the standard leave-one-out method. However, as some hierarchical structure is usually present in the data (individuals from a population are not completely independent observations, as they are morphologically closer to each other than to individuals from other populations), the value "pop" sets whole populations as leave-out units. The latter method does not allow classification if a taxon is represented by only one population, and it is more sensitive to “atypical” populations, which usually leads to a somewhat lower classification success rate.
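The idea of leaving out whole populations can be sketched as a loop in which each population in turn is withheld, the model is fitted on the rest, and the withheld individuals are classified. The example below uses MASS::lda on iris with made-up population labels (five populations of ten individuals per taxon); it illustrates the scheme and is not the package's own implementation.

```r
library(MASS)

x <- as.matrix(iris[, 1:4])
taxon <- iris$Species
# hypothetical population labels: five populations of ten plants per taxon
population <- factor(paste(taxon, rep(rep(1:5, each = 10), times = 3), sep = "_"))

g <- nlevels(taxon)
pred <- factor(rep(NA, nrow(x)), levels = levels(taxon))

for (pop in levels(population)) {
  test <- population == pop
  # fit on all other populations, with equal priors
  fit <- lda(x[!test, ], grouping = taxon[!test], prior = rep(1 / g, g))
  # classify the individuals of the withheld population
  pred[test] <- predict(fit, x[test, ])$class
}

mean(pred == taxon)  # classification success rate
```

With only one population for a taxon, removing it would leave that taxon without training data, which is why this mode requires at least two populations per taxon.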

The coefficients of the linear discriminant functions (returned in classif.funs) can be applied directly to classify individuals of unknown group membership. For each group, a score is computed as the constant plus the sum of each character value multiplied by its coefficient, and the scores are compared among the groups. The unknown individual is classified into the group with the highest score. If the population leave-out cross-validation mode is selected (crossval = "pop"): (1) each taxon must be represented by at least two populations; (2) the coefficients of the classification functions are computed as averages of the coefficients obtained from each run with one population removed.
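The scoring rule can be illustrated with a small worked example. The coefficient values, the taxon names A and B, and the matrix layout (constant in the first row, one column per taxon) are invented for illustration and do not necessarily match the structure of classif.funs.

```r
# hypothetical classification functions: one column per taxon,
# first row the constant, remaining rows the character coefficients
funs <- rbind(
  Constant = c(A = -10.0, B = -14.0),
  ML       = c(A =   2.0, B =   1.0),
  MW       = c(A =   0.5, B =   3.0)
)

# one unknown individual measured for the same characters
unknown <- c(ML = 4.0, MW = 2.0)

# score = constant + sum(coefficient * character value), per taxon
coefs <- funs[names(unknown), , drop = FALSE]
scores <- funs["Constant", ] + colSums(coefs * unknown)

scores                    # A: -1, B: -4
names(which.max(scores))  # "A", the taxon with the highest score
```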

Value

an object of class classifdata with the following elements:

`ID`	IDs of each row.
`Population`	population membership of each row.
`Taxon`	taxon membership of each row.
`classif.funs`	the classification functions computed for raw characters (descriptors). If `crossval = "pop"`, means of coefficients of classification functions are computed.
`classif`	classification from discriminant analysis.
`prob`	posterior probabilities of classification into each taxon (if calculated by `classif.lda` or `classif.qda`), or proportion of the votes for the winning class (if calculated by `classif.knn`).
`correct`	logical, correctness of classification.

Examples

data(centaurea)

# remove NAs and linearly dependent characters (characters with unique contributions
#                  can be identified by stepwise discriminant analysis.)
centaurea = naMeanSubst(centaurea)
centaurea = removePopulation(centaurea, populationName = c("LIP", "PREL"))
centaurea = keepCharacter(centaurea, c("MLW", "ML", "IW", "LS", "IV", "MW", "MF",
                                    "AP", "IS", "LBA", "LW", "AL", "ILW", "LBS",
                                    "SFT", "CG", "IL", "LM", "ALW", "AW", "SF") )
# add a small constant to characters which are invariant within taxa
centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] =
             centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] =
             centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "st", "LBS"][1] =
             centaurea$data[ centaurea$Taxon == "st", "LBS"][1] + 0.000001



# classification by linear discriminant function
classifRes.lda = classif.lda(centaurea, crossval = "indiv")

# classification by quadratic discriminant function
classifRes.qda = classif.qda(centaurea, crossval = "indiv")

# classification by nonparametric k-nearest neighbour method
# use knn.select to find the optimal k
knn.select(centaurea, crossval = "pop")
classifRes.knn = classif.knn(centaurea, k = 12, crossval = "pop")

# exporting results
classif.matrix(classifRes.lda, level = "taxon")
classif.matrix(classifRes.qda, level = "taxon")
classif.matrix(classifRes.knn, level = "taxon")

[Package MorphoTools2 version 1.0.1.1 Index]