fdiscd.misclass {dad} | R Documentation |
Misclassification ratio in functional discriminant analysis of probability densities.
Description
Computes the leave-one-out misclassification ratio of the rule that assigns T groups of individuals, one group after another, to the class of groups (among K classes of groups) achieving the minimum of the distances or divergences between the density function associated with the group to assign and the K density functions associated with the K classes.
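The assignment rule can be illustrated with a minimal sketch in base R. The distance matrix `d`, the group names and the vector `true_class` are hypothetical values chosen for illustration; row t of `d` is assumed to have been computed with group t left out of its own class, which is what makes the ratio a leave-one-out estimate.

```r
# Hypothetical distances between 4 groups (rows) and 3 classes (columns).
d <- matrix(c(0.2, 0.9, 0.7,
              0.8, 0.1, 0.6,
              0.5, 0.4, 0.3,
              0.3, 0.6, 0.9),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("g", 1:4), c("A", "B", "C")))

# Hypothetical true classes of the four groups.
true_class <- c("A", "B", "B", "A")

# Each group is assigned to the class achieving the minimum distance.
assigned <- colnames(d)[apply(d, 1, which.min)]

# Share of groups assigned to a class other than their own.
misclass_ratio <- mean(assigned != true_class)
```

Here only the third group is assigned to a class other than its own, so one group out of four is misclassified.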
Usage
fdiscd.misclass(xf, class.var, gaussiand = TRUE,
distance = c("jeffreys", "hellinger", "wasserstein", "l2", "l2norm"),
crit = 1, windowh = NULL)
Arguments
xf |
object of class
|
class.var |
string. The name of the class variable. |
distance |
The distance or dissimilarity used to compute the distance matrix between the densities. It can be:
If |
crit |
1, 2 or 3. In order to select the densities associated to the classes. See Details. If |
gaussiand |
logical. If If |
windowh |
strictly positive numeric value. If Omitted when |
Details
The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Note that in the multivariate case (p > 1), the bandwidths are positive-definite matrices.
When the argument windowh is a numerical value h, the bandwidth matrix is of the form h S, where S is either the square root of the covariance matrix (p > 1) or the standard deviation of the estimated density.
If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
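As an illustration of the form h S, the following base R sketch builds such a bandwidth for one group. The data `x` and the value of `h` are hypothetical, and the symmetric square root obtained by eigendecomposition is only one possible choice of matrix square root; it is not taken from the package source.

```r
# Hypothetical data: 50 observations of p = 2 variables for one group.
set.seed(1)
x <- cbind(rnorm(50, sd = 2), rnorm(50, sd = 0.5))

# Symmetric square root of the sample covariance matrix, obtained from
# its eigendecomposition, so that S %*% S equals cov(x).
e <- eigen(cov(x), symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

h <- 0.5     # illustrative value, not a package default
H <- h * S   # bandwidth matrix of the form h S (positive definite)

# Univariate case: S reduces to the standard deviation.
h_uni <- h * sd(x[, 1])
```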
To the class k consisting of T_k groups is associated the density denoted g_k. The crit argument selects the estimation method of the K densities g_k.
- The density g_k is estimated using the whole data of this class, that is the rows of x corresponding to the T_k groups of the class k. The estimation of the densities g_k uses the same method as the estimation of the f_t.
- The T_k densities f_t are estimated using the corresponding data from x. Then they are averaged to obtain an estimation of the density g_k, that is g_k = \frac{1}{T_k} \sum f_t.
- Each previous density f_t is weighted by n_t (the number of rows of x corresponding to f_t). Then they are averaged, that is g_k = \frac{1}{\sum n_t} \sum n_t f_t.
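The three estimation methods listed above (presumably crit = 1, 2 and 3, in that order) can be compared on a small univariate example. This is a sketch in base R under stated assumptions: the groups, the evaluation grid and the helper `gauss_density` are hypothetical, and the densities are estimated parametrically as Gaussian.

```r
# Hypothetical class k made of three groups of unequal sizes (univariate data).
set.seed(2)
groups <- list(rnorm(10, mean = 0), rnorm(40, mean = 1), rnorm(25, mean = 0.5))
grid <- seq(-5, 6, length.out = 401)

# Parametric (Gaussian) estimate of a density, evaluated on the grid.
gauss_density <- function(x) dnorm(grid, mean = mean(x), sd = sd(x))

# First method: a single density estimated from the pooled data of the class.
g_pooled <- gauss_density(unlist(groups))

# Second method: unweighted average of the group densities f_t.
f <- sapply(groups, gauss_density)   # one column per group
g_mean <- rowMeans(f)

# Third method: average of the f_t weighted by the group sizes n_t.
n <- lengths(groups)
g_weighted <- as.vector(f %*% n) / sum(n)
```

The second and third estimates are mixtures of Gaussian densities rather than a single Gaussian density, so they generally differ from the pooled estimate.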
The last two methods are only available for the L^2-distance. If the divergences between densities are computed using the Hellinger or Wasserstein distance or Jeffreys measure, only the first of these methods is available.
The distance or dissimilarity between the estimated densities is either the L^2 distance, the Hellinger distance, Jeffreys measure (symmetrised Kullback-Leibler divergence) or the Wasserstein distance.
- If it is the L^2 distance (distance="l2" or distance="l2norm"), the densities can be either parametrically estimated or estimated using the Gaussian kernel.
- If it is the Hellinger distance (distance="hellinger"), Jeffreys measure (distance="jeffreys") or the Wasserstein distance (distance="wasserstein"), the densities are considered Gaussian and necessarily parametrically estimated.
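For Gaussian densities, the L^2 distance has a closed form. The sketch below covers the univariate case only, in base R; the helpers `l2_inner` and `l2_dist` are hypothetical names and not functions of the package. It relies on the fact that the integral of the product of the N(m1, s1^2) and N(m2, s2^2) densities equals the N(0, s1^2 + s2^2) density evaluated at m1 - m2.

```r
# L2 inner product of two univariate Gaussian densities.
l2_inner <- function(m1, s1, m2, s2) {
  dnorm(m1 - m2, mean = 0, sd = sqrt(s1^2 + s2^2))
}

# L2 distance: ||f - g||^2 = <f, f> + <g, g> - 2 <f, g>.
l2_dist <- function(m1, s1, m2, s2) {
  sq <- l2_inner(m1, s1, m1, s1) + l2_inner(m2, s2, m2, s2) -
    2 * l2_inner(m1, s1, m2, s2)
  sqrt(max(sq, 0))   # guard against tiny negative rounding errors
}

# Numerical check of the squared distance between N(0, 1) and N(1, 4).
closed_form <- l2_dist(0, 1, 1, 2)^2
numeric <- integrate(function(x) (dnorm(x, 0, 1) - dnorm(x, 1, 2))^2,
                     lower = -Inf, upper = Inf)$value
```

The two values `closed_form` and `numeric` should agree up to the tolerance of the numerical integration.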
Value
Returns an object of class fdiscd.misclass, that is a list including:
classification |
data frame with 4 columns:
|
confusion.mat |
confusion matrix, |
misalloc.per.class |
the misclassification ratio per class, |
misclassed |
the misclassification ratio, |
distances |
matrix with |
proximities |
matrix of the proximity indices (in percents) between the groups and the classes. The proximity of the group |
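The relation between a confusion matrix and the two misclassification ratios can be sketched in base R. The matrix `conf`, its class labels and its orientation (rows for the true class, columns for the allocated class) are assumptions made for illustration; the object returned by the function may be laid out differently.

```r
# Hypothetical confusion matrix: rows = true class, columns = allocated class.
conf <- matrix(c(8, 1, 1,
                 2, 6, 0,
                 0, 3, 9),
               nrow = 3, byrow = TRUE,
               dimnames = list(prior = c("A", "B", "C"),
                               alloc = c("A", "B", "C")))

# Misclassification ratio per class: share of the groups of each class
# that were allocated to another class.
per_class <- 1 - diag(conf) / rowSums(conf)

# Overall misclassification ratio.
overall <- 1 - sum(diag(conf)) / sum(conf)
```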
Author(s)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard
References
Boumaza, R. (2004). Discriminant analysis with independently repeated multivariate measurements: an L^2 approach. Computational Statistics & Data Analysis, 47, 823-843.
Rudrauf, J.M., Boumaza, R. (2001). Contribution à l'étude de l'architecture médiévale: les caractéristiques des pierres à bossage des châteaux forts alsaciens. Centre de Recherches Archéologiques Médiévales de Saverne, 5, 5-38.
Examples
data(castles.dated)
castles.stones <- castles.dated$stones
castles.periods <- castles.dated$periods
castlesfh <- folderh(castles.periods, "castle", castles.stones)
result <- fdiscd.misclass(castlesfh, "period")
print(result)