classif.lda {MorphoTools2} | R Documentation |
Classificatory Discriminant Analysis
Description
These functions computes discriminant function for classifying observations. Linear discriminant function (classif.lda
), quadratic discriminant function (classif.qda
), or nonparametric k-nearest neighbours classification method (classif.knn
) can be used.
Usage
classif.lda(object, crossval = "indiv")
classif.qda(object, crossval = "indiv")
classif.knn(object, k, crossval = "indiv")
Arguments
object |
an object of class |
crossval |
crossvalidation mode, sets individual ( |
k |
number of neighbours considered for the k-nearest neighbours method. |
Details
The classif.lda
and classif.qda
performs classification using linear and quadratic discriminant functions with cross-validation using the lda
and qda
functions from the package MASS
. The prior probabilities of group memberships are equal.
LDA and QDA analyses have some requirements: (1) no character can be a linear combination of any other character; (2) no pair of characters can be highly correlated; (3) no character can be invariant in any taxon; (4) for the number of taxa (g), characters (p) and total number of samples (n) should hold: 0 <
p <
(n - g), and (5) there must be at least two groups (taxa), and in each group there must be at least two objects. Violation of some of these assumptions may result in warnings or error messages (rank deficiency).
Nonparametric classification method k-nearest neighbours is performed using the knn
and knn.cv
functions from the package class
.
The mode of crossvalidation is set by the parameter crossval
. The default "indiv"
uses the standard one-leave-out method. However, as some hierarchical structure is usually present in the data (individuals from a population are not completely independent observations, as they are morphologically closer to each other than to individuals from other populations), the value "pop"
sets whole populations as leave-out units. The latter method does not allow classification if there is only one population for a taxon and is more sensitive to “atypical” populations, which usually leads to a somewhat lower classification success rate.
The coefficients of the linear discriminant functions (above) can be directly applied to classify individuals of unknown group membership. The sums of constant and multiples of each character by the corresponding coefficient are compared among the groups. The unknown individual is classified into the group that shows the higher score. If the populations leave-out cross-validation mode is selected (crossval = "pop"
): (1) each taxon must be represented by at least two populations; (2) coefficients of classification functions are computed as averages of coefficients retrieved after each run with one population removed.
Value
an object of class classifdata
with the following elements:
ID |
IDs of each row. |
Population |
population membership of each row. |
Taxon |
taxon membership of each row. |
classif.funs |
the classification functions computed for raw characters (descriptors). If |
classif |
classification from discriminant analysis. |
prob |
posterior probabilities of classification into each taxon (if calculated by |
correct |
logical, correctness of classification. |
See Also
classifSample.lda
,
classif.matrix
,
knn.select
Examples
data(centaurea)
# remove NAs and linearly dependent characters (characters with unique contributions
# can be identified by stepwise discriminant analysis.)
centaurea = naMeanSubst(centaurea)
centaurea = removePopulation(centaurea, populationName = c("LIP", "PREL"))
centaurea = keepCharacter(centaurea, c("MLW", "ML", "IW", "LS", "IV", "MW", "MF",
"AP", "IS", "LBA", "LW", "AL", "ILW", "LBS",
"SFT", "CG", "IL", "LM", "ALW", "AW", "SF") )
# add a small constant to characters witch are invariant within taxa
centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] =
centaurea$data[ centaurea$Taxon == "hybr", "LM" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] =
centaurea$data[ centaurea$Taxon == "ph", "IV" ][1] + 0.000001
centaurea$data[ centaurea$Taxon == "st", "LBS"][1] =
centaurea$data[ centaurea$Taxon == "st", "LBS"][1] + 0.000001
# classification by linear discriminant function
classifRes.lda = classif.lda(centaurea, crossval = "indiv")
# classification by quadratic discriminant function
classifRes.qda = classif.qda(centaurea, crossval = "indiv")
# classification by nonparametric k-nearest neighbour method
# use knn.select to find the optimal K.
knn.select(centaurea, crossval = "pop")
classifRes.knn = classif.knn(centaurea, k = 12, crossval = "pop")
# exporting results
classif.matrix(classifRes.lda, level = "taxon")
classif.matrix(classifRes.qda, level = "taxon")
classif.matrix(classifRes.knn, level = "taxon")