R: Evaluation of the candidate clustering partition in...

checkBranchLocalIMO {Anthropometry}

R Documentation

Evaluation of the candidate clustering partition in $HIPAM_IMO$

Description

In the HIPAM algorithm, each (parent) cluster P is examined to decide whether it can be divided further into new (child) clusters or whether splitting should stop (in which case P is a terminal node).

In this version of HIPAM, called $HIPAM_IMO$, there are three stopping criteria. First, if $|P| \leq 2$, then P is a terminal node. If not, the second stopping criterion is based on the INCA (Index Number Clusters Atypical) criterion (Irigoien et al. (2008)): if $INCA_k \leq 0.2$ for all k, then P is a terminal node. Finally, the third stopping criterion uses the Mean Split Silhouette. See Vinue et al. (2014) for more details.
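The first two stopping criteria can be illustrated with a minimal sketch. The helper `is_terminal_sketch` and its arguments are hypothetical names introduced here for illustration; they are not part of the package, and the third criterion (Mean Split Silhouette) is deliberately left out. The INCA values are assumed to have been computed elsewhere.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# n_members:   number of elements in the parent cluster P, i.e. |P|.
# inca_values: assumed precomputed INCA_k values, one per candidate k.
is_terminal_sketch <- function(n_members, inca_values, inca_threshold = 0.2) {
  # First criterion: a cluster with at most two members is a terminal node.
  if (n_members <= 2) return(TRUE)
  # Second criterion: stop if INCA_k <= threshold for every candidate k.
  if (all(inca_values <= inca_threshold)) return(TRUE)
  # The third criterion (Mean Split Silhouette) is not modelled in this sketch.
  FALSE
}

is_terminal_sketch(2, c(0.9, 0.8))    # TRUE: |P| <= 2
is_terminal_sketch(10, c(0.1, 0.15))  # TRUE: all INCA_k <= 0.2
is_terminal_sketch(10, c(0.1, 0.6))   # FALSE: some INCA_k > 0.2
```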

The foundation and performance of the HIPAM algorithm are explained in hipamAnthropom.

Usage

checkBranchLocalIMO(tree,data,i,maxsplit,asw.tol,local.const,orness,type,ah,
                    verbose,...)

Arguments

`tree`	The clustering tree being defined.
`data`	Data to be clustered.
`i`	A specific cluster of the clustering partition in a certain level of the tree.
`maxsplit`	The maximum number of clusters into which any cluster can be divided when searching for the best clustering.
`asw.tol`	If this value is given, a tolerance (asw.tol > 0) or a penalty (asw.tol < 0) is introduced in the branch splitting procedure. The default value (0) is maintained. See page 154 of Wit et al. (2004) for more details.
`local.const`	If this value is given (meaningful values are those between -1 and 1), a proposed partition is accepted only if the associated asw is greater than this constant. The default for this argument is maintained, that is, this value is ignored. See page 154 of Wit et al. (2004) for more details.
`orness`	Quantity to measure the degree to which the aggregation is like a min or max operation. See `weightsMixtureUB` and `getDistMatrix`.
`type`	Option 'IMO' for using $HIPAM_IMO$.
`ah`	Constants that define the `ah` slopes of the distance function in `getDistMatrix`. Given the five variables considered, this vector is c(23,28,20,25,25). This vector would be different according to the variables considered.
`verbose`	Boolean variable (TRUE or FALSE) to indicate whether to report information on progress.
`...`	Other arguments that may be supplied.
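The roles of `maxsplit` and `local.const` described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used by the package: the helper `best_split_sketch` is hypothetical, `pam()` from the cluster package with its default dissimilarity stands in for the distance built by `getDistMatrix`, and `asw.tol` is not modelled.

```r
library(cluster)  # provides pam()

# Hypothetical helper for illustration only; not the package's implementation.
best_split_sketch <- function(x, maxsplit = 5, local.const = NULL) {
  stopifnot(nrow(x) >= 3)
  # Candidate numbers of child clusters, bounded above by maxsplit.
  ks <- 2:min(maxsplit, nrow(x) - 1)
  # Average silhouette width (asw) of the PAM partition for each candidate k.
  asw <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
  best <- which.max(asw)
  # If local.const is given, accept the split only when asw exceeds it.
  accepted <- is.null(local.const) || asw[best] > local.const
  list(k = ks[best], asw = asw[best], accepted = accepted)
}

# Simulated data (an assumption of this sketch): two separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))
res <- best_split_sketch(x, maxsplit = 4, local.const = 0.5)
res$k; res$asw; res$accepted
```

With `local.const = NULL` the acceptance check is skipped, which mirrors the documented default of ignoring this argument.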

Value

The new resulting classification tree.

Note

This function belongs to the $HIPAM_IMO$ algorithm and is not intended to be used on its own. That is why this help page has no examples section. See hipamAnthropom.

Author(s)

This function was originally created by E. Wit et al., and it is freely available at https://www.math.rug.nl/~ernst/book/smida.html. We have adapted it to incorporate the second stopping criterion related to INCA.

References

Vinue, G., Leon, T., Alemany, S., and Ayala, G., (2014). Looking for representative fit models for apparel sizing, Decision Support Systems 57, 22–33.

Wit, E., and McClure, J., (2004). Statistics for Microarrays: Design, Analysis and Inference. John Wiley & Sons, Ltd.

Wit, E., and McClure, J., (2006). Statistics for Microarrays: Inference, Design and Analysis. R package version 0.1. https://www.math.rug.nl/~ernst/book/smida.html.

Pollard, K. S., and van der Laan, M. J., (2002). A method to identify significant clusters in gene expression data. Vol. II of SCI2002 Proceedings, 318–325.

Irigoien, I., and Arenas, C., (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units, Statistics in Medicine 27, 2948–2973.

Irigoien, I., Sierra, B., and Arenas, C., (2012). ICGE: an R package for detecting relevant clusters and atypical units in gene expression, BMC Bioinformatics 13, 1–29.