fdiscd.misclass {dad}    R Documentation

Misclassification ratio in functional discriminant analysis of probability densities.

Description

Computes the leave-one-out misclassification ratio of the rule assigning T groups of individuals, one group after another, to the class of groups (among K classes of groups) which achieves the minimum of the distances or divergences between the density function associated with the group to assign and the K density functions associated with the K classes.

Usage

fdiscd.misclass(xf, class.var, gaussiand = TRUE,
           distance =  c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"),
           crit = 1, windowh = NULL)

Arguments

xf

object of class folderh with two data frames:

  • The first one has at least two columns. One column contains the names of the T groups (all the names must be different). Another column is a factor with K levels partitioning the T groups into K classes.

  • The second one has p+1 columns. The first p columns are numeric (otherwise, there is an error). The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

class.var

string. The name of the class variable.

distance

The distance or dissimilarity used to compute the distance matrix between the densities. It can be:

  • "jeffreys" (default) the Jeffreys measure (symmetrised Kullback-Leibler divergence),

  • "hellinger" the Hellinger (Matusita) distance,

  • "wasserstein" the Wasserstein distance,

  • "l2" the L^2 distance,

  • "l2norm" (only available when crit = 1) the densities are normed and the L^2 distance between these normed densities is used.

If gaussiand = FALSE, the densities are estimated by the Gaussian kernel method and the distance is "l2" or "l2norm".

crit

1, 2 or 3. Selects the method used to estimate the densities associated with the classes. See Details.

If distance is "hellinger", "jeffreys" or "wasserstein", crit is necessarily 1 (see Details).

gaussiand

logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.

If distance is "hellinger", "jeffreys" or "wasserstein", gaussiand is necessarily TRUE.

windowh

strictly positive numeric value. If windowh = NULL (default), the bandwidths are computed using the bandwidth.parameter function.

Omitted when distance is "hellinger", "jeffreys" or "wasserstein" (see Details).

Details

The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case (p > 1), the bandwidths are positive-definite matrices.

If the argument windowh is a numerical value h, the bandwidth matrix is of the form h S, where S is either the square root of the covariance matrix (p > 1) or the standard deviation of the estimated density.

If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
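As a rough illustration of a bandwidth of the form h S, the following base R sketch builds such a matrix from a sample. It is not the package's internal code: the data are simulated, the value of h is arbitrary, and taking the symmetric square root through an eigendecomposition is only an assumption about what "square root of the covariance matrix" means here.

```r
# Sketch only: bandwidth matrix of the form h * S for a bivariate sample.
set.seed(1)
x <- cbind(rnorm(50), rnorm(50, sd = 2))   # simulated data, p = 2

h <- 0.5                                   # hypothetical value passed as windowh

# Symmetric square root of the covariance matrix (assumed definition of S)
e <- eigen(cov(x), symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

H <- h * S                                 # bandwidth matrix (positive definite)

# Univariate case: S reduces to the standard deviation
H1 <- h * sd(x[, 1])
```

Under this assumption, S %*% S recovers cov(x), so H scales the spread of the data by the factor h.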

To the class k consisting of T_k groups is associated the density denoted g_k. The crit argument selects the estimation method of the K densities g_k.

  1. The density g_k is estimated using the whole data of this class, that is the rows of x corresponding to the T_k groups of the class k.

    The estimation of the densities g_k uses the same method as the estimation of the f_t.

  2. The T_k densities f_t are estimated using the corresponding data from x. Then they are averaged to obtain an estimation of the density g_k, that is g_k = \frac{1}{T_k} \sum f_t.

  3. Each previous density f_t is weighted by n_t (the number of rows of x corresponding to f_t). Then they are averaged, that is g_k = \frac{1}{\sum n_t} \sum n_t f_t.

The last two methods are only available for the L^2-distance. If the divergences between densities are computed using the Hellinger or Wasserstein distance or Jeffreys measure, only the first of these methods is available.

The distance or dissimilarity between the estimated densities is either the L^2 distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.
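The three ways of building a class density g_k can be compared with a small base R sketch. It uses simulated univariate data, Gaussian (parametric) estimates of the f_t, and densities evaluated on a grid; the group sizes, means and grid are all made up for the illustration, and this is not how the package computes its densities internally.

```r
# Sketch only: class density g_k built from T_k = 3 group densities f_t,
# evaluated on a grid, for crit = 1, 2 and 3.
set.seed(2)
groups <- list(rnorm(10, mean = 0), rnorm(40, mean = 1), rnorm(25, mean = 3))
n_t <- sapply(groups, length)              # group sizes n_t
grid <- seq(-5, 8, by = 0.01)

# Gaussian estimate of each f_t on the grid: one column per group
f <- sapply(groups, function(g) dnorm(grid, mean = mean(g), sd = sd(g)))

# crit = 1: one density fitted to the pooled data of the class
pooled <- unlist(groups)
g_crit1 <- dnorm(grid, mean = mean(pooled), sd = sd(pooled))

# crit = 2: unweighted average, (1 / T_k) * sum f_t
g_crit2 <- rowMeans(f)

# crit = 3: average weighted by group size, (1 / sum n_t) * sum n_t f_t
g_crit3 <- as.vector(f %*% (n_t / sum(n_t)))
```

With crit = 2 and crit = 3 the class density is a mixture of the group densities, so it is generally not Gaussian even when each f_t is.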

Value

Returns an object of class fdiscd.misclass, that is a list including:

classification

data frame with 4 columns:

  • factor giving the group name. The column name is the same as that of the column p+1 of x,

  • the prior class of the group if it is available, or NA if not,

  • alloc: the class allocation computed by the discriminant analysis method,

  • misclassed: logical. TRUE if the group is misclassified, FALSE if it is correctly classified, NA if the prior class of the group is unknown.

confusion.mat

confusion matrix,

misalloc.per.class

the misclassification ratio per class,

misclassed

the misclassification ratio,

distances

matrix with T rows and K columns, of the distances (d_{tk}): d_{tk} is the distance between the group t and the class k, computed with the measure given by argument distance (L^2-distance, Hellinger distance, Wasserstein distance or Jeffreys measure),

proximities

matrix of the proximity indices (in percent) between the groups and the classes. The proximity of the group t to the class k is computed as follows: (1/d_{tk}) / \sum_{l=1}^{K} (1/d_{tl}).
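To make the relation between the returned components concrete, here is a base R sketch that starts from a made-up distance matrix and derives proximities, allocations and a misclassification ratio. The distances, group names, class labels and prior classes are all hypothetical, and the sketch skips the leave-one-out step: it only shows the arithmetic applied once the distances d_{tk} are known.

```r
# Sketch only: hypothetical distances d_tk between T = 4 groups and K = 2 classes.
d <- matrix(c(0.2, 0.8,
              0.4, 0.7,
              0.9, 0.1,
              0.3, 0.6),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("group", 1:4), c("A", "B")))
prior <- c("A", "A", "B", "B")             # hypothetical prior classes

# Proximity of group t to class k, in percent: (1/d_tk) / sum_l (1/d_tl)
prox <- 100 * (1 / d) / rowSums(1 / d)

# Each group is allocated to the class at minimum distance
alloc <- colnames(d)[apply(d, 1, which.min)]

misclassed <- alloc != prior               # TRUE when allocation differs from prior
ratio <- mean(misclassed)                  # overall misclassification ratio
conf <- table(prior, alloc)                # confusion matrix
```

Each row of prox sums to 100, and the class with the highest proximity is the one at minimum distance.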

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

References

Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.

Examples

data(castles.dated)
castles.stones <- castles.dated$stones
castles.periods <- castles.dated$periods
castlesfh <- folderh(castles.periods, "castle", castles.stones)
result <- fdiscd.misclass(castlesfh, "period")
print(result)

[Package dad version 4.1.2 Index]