discdd.misclass {dad}    R Documentation

Misclassification ratio in functional discriminant analysis of discrete probability distributions.

Description

Computes the leave-one-out misclassification ratio of the rule assigning T groups of individuals, one group after another, to the class of groups (among K classes of groups) which achieves the minimum of the distances or divergences between the probability distribution associated with the group to assign and the K probability distributions associated with the K classes.

Usage

discdd.misclass(xf, class.var, distance =  c("l1", "l2", "chisqsym", "hellinger",
           "jeffreys", "jensen", "lp"), crit = 1, p)

Arguments

xf

object of class folderh with two data frames or list of arrays (or tables).

  • If it is a folderh:

    • The first data.frame has at least two columns. One column contains the names of the T groups (all the names must be different). Another column is a factor with K levels partitioning the T groups into K classes.

    • The second one has (q + 1) columns. The first q columns are factors (otherwise, they are coerced into factors). The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

  • If it is a list of arrays or tables, the t-th element (t = 1, ..., T) is the table of the joint distribution (absolute or relative frequencies) of the t-th group. These arrays have the same shape:

    Each array (or table) xf[[i]] has:

    • the same dimension(s). If q = 1 (univariate), dim(xf[[i]]) is an integer. If q > 1 (multivariate), dim(xf[[i]]) is an integer vector of length q.

    • the same dimension names dimnames(xf[[i]]), which must not be NULL. These dimnames are the names of the variables.

class.var

string (if xf is an object of class "folderh") or data.frame with two columns (if xf is a list of arrays).

  • If xf is of class "folderh", class.var is the name of the class variable.

  • If xf is a list of arrays or a list of tables, class.var is a data.frame with at least two columns named "group" and "class". The "group" column contains the names of the T groups (all the names must be different). The "class" column is a factor with K levels partitioning the T groups into K classes.
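The list-of-tables form of xf and the matching class.var can be sketched in base R as follows. The data, the variable name colour and the class labels "A" and "B" are made up for illustration, and the sketch assumes that the names of the list identify the groups.

```r
# Hypothetical raw data: one discrete variable observed on six individuals
# belonging to three groups.
indiv <- data.frame(
  colour = factor(c("red", "blue", "red", "blue", "blue", "red"),
                  levels = c("blue", "red")),
  group  = c("g1", "g1", "g2", "g2", "g3", "g3")
)

# One frequency table per group. Because colour is a factor with fixed levels,
# every table has the same dimension and the same dimnames.
xf <- lapply(split(indiv["colour"], indiv$group), table)

# class.var: one row per group, with the columns "group" and "class".
class.var <- data.frame(group = names(xf),
                        class = factor(c("A", "A", "B")))
```

Here each element of xf is a one-dimensional table (q = 1) whose dimnames carry the variable name colour.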

distance

The distance or dissimilarity used to compute the distance matrix between the densities. It can be:

  • "l1" (default) the L^p distance with p = 1

  • "l2" the L^p distance with p = 2

  • "chisqsym" the symmetric Chi-squared distance

  • "hellinger" the Hellinger metric (Matusita distance)

  • "jeffreys" Jeffreys distance (symmetrised Kullback-Leibler divergence)

  • "jensen" the Jensen-Shannon distance

  • "lp" the L^p distance with p given by the argument p of the function.
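Some of the distances listed above can be illustrated on two small discrete distributions. The sketch below uses the usual textbook definitions in base R. The helper names lp_dist, hellinger_dist and jeffreys_dist are hypothetical, and the dad package may compute these quantities with different normalisation constants.

```r
# Two hypothetical probability distributions on the same three categories.
f1 <- c(0.5, 0.3, 0.2)
f2 <- c(0.2, 0.3, 0.5)

# L^p distance: (sum |f1 - f2|^p)^(1/p).
lp_dist <- function(a, b, p = 1) sum(abs(a - b)^p)^(1 / p)

# Hellinger (Matusita) distance: sqrt(sum (sqrt(f1) - sqrt(f2))^2).
hellinger_dist <- function(a, b) sqrt(sum((sqrt(a) - sqrt(b))^2))

# Jeffreys divergence (symmetrised Kullback-Leibler); needs strictly positive
# probabilities.
jeffreys_dist <- function(a, b) sum((a - b) * log(a / b))

lp_dist(f1, f2, p = 1)
lp_dist(f1, f2, p = 2)
hellinger_dist(f1, f2)
jeffreys_dist(f1, f2)
```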

crit

1 or 2. Criterion used to select the densities associated with the classes. See Details.

p

integer. Optional. When distance = "lp" (L^p distance with p > 2), p is the parameter of the distance.

Details

Value

Returns an object of class discdd.misclass, that is a list including:

classification

data frame with 4 columns:

  • factor giving the group name. The column name is the same as that of the column (q + 1) of xf,

  • the prior class of the group if it is available, or NA if not,

  • alloc: the class allocation computed by the discriminant analysis method,

  • misclassed: boolean. TRUE if the group is misclassed, FALSE if it is well-classed, NA if the prior class of the group is unknown.

confusion.mat

confusion matrix,

misalloc.per.class

the misclassification ratio per class,

misclassed

the misclassification ratio,

distances

matrix with T rows and K columns, of the distances (d_{tk}): d_{tk} is the distance between the group t and the class k,

proximities

matrix of the proximity indices (in percent) between the groups and the classes. The proximity between the group t and the class k is: (1/d_{tk}) / \sum_{l=1}^{K} (1/d_{tl}).
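The proximity formula and the minimum-distance rule can be sketched in base R. The matrix d below is made up and is not output of discdd.misclass. The multiplication by 100 is an assumption based on the "in percent" wording, and the sketch assumes all distances are strictly positive.

```r
# Hypothetical distances between T = 3 groups (rows) and K = 2 classes (columns).
d <- matrix(c(0.2, 0.8,
              0.5, 0.5,
              0.9, 0.1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3"), c("A", "B")))

# Proximity of group t to class k: (1/d_tk) / sum over l of (1/d_tl), in percent.
inv <- 1 / d
prox <- 100 * inv / rowSums(inv)

# Allocation rule: each group goes to the class at minimum distance
# (which.min keeps the first class in case of a tie).
alloc <- colnames(d)[apply(d, 1, which.min)]
```

Each row of prox sums to 100, so a row can be read as the relative closeness of one group to each class.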

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

References

Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens, Centre de Recherches Archéologiques médiévales de Saverne, 5, 5-38.

Examples

# Example 1 with a folderh obtained by converting numeric variables
data("castles.dated")
stones <- castles.dated$stones
periods <- castles.dated$periods
stones$height <- cut(stones$height, breaks = c(19, 27, 40, 71), include.lowest = TRUE)
stones$width <- cut(stones$width, breaks = c(24, 45, 62, 144), include.lowest = TRUE)
stones$edging <- cut(stones$edging, breaks = c(0, 3, 4, 8), include.lowest = TRUE)
stones$boss <- cut(stones$boss, breaks = c(0, 6, 9, 20), include.lowest = TRUE)

castlefh <- folderh(periods, "castle", stones)

# Default: distance = "l1", crit = 1
discdd.misclass(castlefh, "period")

# Hellinger distance, crit=2
discdd.misclass(castlefh, "period", distance = "hellinger", crit = 2)


# Example 2 with a list of 96 arrays
data("dspgd2015")
data("departments")
classes <- departments[, c("coded", "namer")]
names(classes) <- c("group", "class")

# Default: distance = "l1", crit = 1
discdd.misclass(dspgd2015, classes)

# Hellinger distance, crit=2
discdd.misclass(dspgd2015, classes, distance = "hellinger", crit = 2)

[Package dad version 4.1.2 Index]