
hipamAnthropom {Anthropometry}

R Documentation

HIPAM algorithm for anthropometric data

Description

The HIerarchical Partitioning Around Medoids clustering method (HIPAM) was originally developed for gene clustering (Wit et al. (2004)). HIPAM is a divisive hierarchical clustering method based on the PAM (Partitioning Around Medoids) algorithm.

This function adapts the HIPAM algorithm to anthropometric data. To that end, it incorporates a different dissimilarity function: the one described in McCulloch et al. (1998), which is implemented in getDistMatrix and which we call $d_{MO}$. In addition, it incorporates a different method for obtaining the classification tree.

Two HIPAM algorithms are proposed. The first one, called $HIPAM_{MO}$, is a HIPAM algorithm that uses $d_{MO}$. The second one, $HIPAM_{IMO}$, uses $d_{MO}$ together with the INCA (Index Number Clusters Atypical) statistic (Irigoien et al. (2008)), both to decide the number of child clusters and as a stopping rule.

See Vinue et al. (2014) for more details.

Usage

hipamAnthropom(data,asw.tol=0,maxsplit=5,local.const=NULL,
               orness=0.7,type,ah=c(23,28,20,25,25),verbose,...)

Arguments

`data`	Data frame. In our approach, this is each of the subframes obtained after segmenting the whole Spanish anthropometric survey into twelve bust segments, according to the European standard on sizing systems (Size designation of clothes. Part 3: Measurements and intervals). Each row corresponds to an observation and each column to a variable. All variables are numeric.
`asw.tol`	If this value is given, a tolerance (asw.tol > 0) or a penalty (asw.tol < 0) is introduced in the branch splitting procedure. In our approach the default value (0) is used. See page 154 of Wit et al. (2004) for more details.
`maxsplit`	The maximum number of clusters into which any cluster can be divided when searching for the best clustering.
`local.const`	If this value is given (meaningful values lie between -1 and 1), a proposed partition is accepted only if its associated asw is greater than this constant. In our approach the default (NULL) is kept, so this constraint is not applied. See page 154 of Wit et al. (2004) for more details.
`orness`	Quantity that measures the degree to which the aggregation behaves like a min or a max operation. See `weightsMixtureUB` and `getDistMatrix`.
`type`	Type of HIPAM algorithm to be used. The possible options are 'MO' (for $HIPAM_{MO}$) and 'IMO' (for $HIPAM_{IMO}$).
`ah`	Constants that define the `ah` slopes of the distance function in `getDistMatrix`, one per variable. For the five variables considered in our approach, this vector is c(23,28,20,25,25); it must be adapted to the variables actually used.
`verbose`	Logical value (TRUE or FALSE) indicating whether to report information on progress.
`...`	Other arguments that may be supplied to the internal functions of the HIPAM algorithms.
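Orness is the standard measure attached to ordered weighted averaging (OWA) operators: an OWA operator sorts its inputs in decreasing order and takes a weighted sum, and Yager's orness of the weight vector equals 1 for the max, 0 for the min and 0.5 for the plain mean. The R sketch below, assuming that definition, illustrates what `orness = 0.7` means when aggregating five per-variable discrepancies. All function names are hypothetical, and `owa_weights_hypothetical` is a simple stand-in weighting scheme, not the one implemented by `weightsMixtureUB`.

```r
# Orness of an OWA weight vector (Yager's definition). Weights apply to the
# values sorted in decreasing order, so w[1] multiplies the largest value.
owa_orness <- function(w) {
  n <- length(w)
  sum((n - seq_len(n)) * w) / (n - 1)
}

# OWA aggregation: sort the values in decreasing order, then weighted sum.
owa_aggregate <- function(x, w) {
  stopifnot(length(x) == length(w), abs(sum(w) - 1) < 1e-12)
  sum(w * sort(x, decreasing = TRUE))
}

# Hypothetical weight generator (NOT weightsMixtureUB): mixes uniform weights
# (orness 0.5) with all the weight on the maximum (orness 1). Orness is linear
# in w, so the mixture has orness exactly 0.5 + 0.5 * t.
owa_weights_hypothetical <- function(orness, n) {
  stopifnot(orness >= 0.5, orness <= 1, n >= 2)
  t <- 2 * orness - 1
  w <- rep(1 / n, n) * (1 - t)
  w[1] <- w[1] + t
  w
}

w <- owa_weights_hypothetical(0.7, 5)  # 0.52 0.12 0.12 0.12 0.12
owa_orness(w)                          # 0.7
d <- c(2, 0, 5, 1, 3)                  # made-up per-variable discrepancies
owa_aggregate(d, w)                    # 3.32
```

With orness 0.7 the aggregate (3.32) lies between the mean of the discrepancies (2.2) and their maximum (5), so the aggregation leans towards the max.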

Details

The $HIPAM_{MO}$ algorithm uses the getBestPamsamMO and checkBranchLocalMO functions, while the $HIPAM_{IMO}$ algorithm uses the getBestPamsamIMO and checkBranchLocalIMO functions.

For more details on HIPAM, see van der Laan et al. (2003), Wit et al. (2004) and the manual of the smida R package.
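The core step of a HIPAM-style divisive method, choosing how many children a cluster should be split into, can be sketched with `pam()` from the recommended cluster package: try k = 2, ..., maxsplit and keep the partition with the largest average silhouette width (asw), optionally rejecting it when its asw does not exceed `local.const`. This is a simplified, hypothetical illustration: it uses Euclidean distances instead of $d_{MO}$, ignores `asw.tol`, INCA and the branch checking done by checkBranchLocalMO/checkBranchLocalIMO, and `best_split` is not part of the package.

```r
library(cluster)  # recommended package shipped with R; provides pam()

# Hypothetical, simplified split step (not getBestPamsamMO/IMO): for one
# cluster with dissimilarities d, fit PAM for k = 2..maxsplit and keep the
# k with the largest average silhouette width.
best_split <- function(d, maxsplit = 5, local.const = NULL) {
  stopifnot(inherits(d, "dist"))
  n <- attr(d, "Size")
  stopifnot(n > 2, maxsplit >= 2)
  best <- NULL
  for (k in 2:min(maxsplit, n - 1)) {
    fit <- pam(d, k, diss = TRUE)
    asw <- fit$silinfo$avg.width
    if (is.null(best) || asw > best$asw) {
      best <- list(k = k, asw = asw, clustering = fit$clustering,
                   medoids = fit$id.med)
    }
  }
  # Counterpart of local.const: accept the split only if its asw exceeds it
  if (!is.null(local.const) && best$asw <= local.const) return(NULL)
  best
}

# Synthetic data: three well-separated groups of 10 points each
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))
res <- best_split(dist(x), maxsplit = 5, local.const = 0.25)
res$k
res$asw
```

With these data the three-group split is chosen; raising `local.const` above the achieved asw makes `best_split` return NULL, which in a divisive scheme would mark the cluster as a leaf of the tree.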

Value

A list with the following elements:

clustering: Final clustering that corresponds to the last level of the tree.

asw: The asw of the final clustering.

n.levels: Number of levels in the tree.

cases: Anthropometric cases (medoids of all of the clusters in the tree).

active: Activity status of each cluster (FALSE for every cluster of the final partition).

development: Matrix that indicates the ancestors of the final clusters.

num.of.clusters: Number of clusters in the final clustering.

metric: Name of the dissimilarity used ('McCulloch', because the dissimilarity function is the one described in McCulloch et al. (1998)).
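The `asw` element, like the `asw.tol` and `local.const` arguments, refers to the average silhouette width of a partition. For each observation i, with a(i) its mean dissimilarity to the other members of its cluster and b(i) its smallest mean dissimilarity to any other cluster, the silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)), and asw is the mean of the s(i). The base R sketch below (the function name `avg_silhouette` is hypothetical) computes it from any dissimilarity matrix; in hipamAnthropom the dissimilarity would be $d_{MO}$ rather than the Euclidean distance of the toy example.

```r
# Average silhouette width (asw) from a dissimilarity matrix and a clustering.
avg_silhouette <- function(D, clustering) {
  D <- as.matrix(D)                       # accepts a "dist" object or a matrix
  labels <- unique(clustering)
  stopifnot(nrow(D) == length(clustering), length(labels) >= 2)
  s <- numeric(length(clustering))        # singletons keep s(i) = 0
  for (i in seq_along(clustering)) {
    own <- clustering == clustering[i]
    if (sum(own) == 1) next               # singleton cluster: s(i) = 0 by convention
    a <- sum(D[i, own]) / (sum(own) - 1)  # D[i, i] = 0, so this averages over the others
    b <- min(sapply(setdiff(labels, clustering[i]),
                    function(k) mean(D[i, clustering == k])))
    s[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  mean(s)
}

# Toy example: two well-separated pairs of points on a line
avg_silhouette(dist(c(0, 1, 10, 11)), c(1, 1, 2, 2))   # 359/399, about 0.8997
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate that observations sit closer to another cluster than to their own.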

Note

All the functions related to the HIPAM algorithm were originally created by E. Wit et al. and are freely available at https://www.math.rug.nl/~ernst/book/smida.html. We have used and adapted them to develop the $HIPAM_{MO}$ and $HIPAM_{IMO}$ algorithms.

Author(s)

Guillermo Vinue

References

Vinue, G., Leon, T., Alemany, S., and Ayala, G., (2014). Looking for representative fit models for apparel sizing, Decision Support Systems 57, 22–33.

Wit, E., and McClure, J., (2004). Statistics for Microarrays: Design, Analysis and Inference. John Wiley & Sons, Ltd.

Wit, E., and McClure, J., (2006). Statistics for Microarrays: Inference, Design and Analysis. R package version 0.1. https://www.math.rug.nl/~ernst/book/smida.html.

van der Laan, M. J., and Pollard, K. S., (2003). A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap, Journal of Statistical Planning and Inference 117, 275–303.

Pollard, K. S., and van der Laan, M. J., (2002). A method to identify significant clusters in gene expression data. Vol. II of SCI2002 Proceedings, 318–325.

Irigoien, I., and Arenas, C., (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units, Statistics in Medicine 27, 2948–2973.

Irigoien, I., Sierra, B., and Arenas, C., (2012). ICGE: an R package for detecting relevant clusters and atypical units in gene expression, BMC Bioinformatics 13, 1–29.

McCulloch, C., Paal, B., and Ashdown, S., (1998). An optimization approach to apparel sizing, Journal of the Operational Research Society 49, 492–499.

European Committee for Standardization, (2005). Size designation of clothes. Part 3: Measurements and intervals.

Alemany, S., Gonzalez, J. C., Nacher, B., Soriano, C., Arnaiz, C., and Heras, H., (2010). Anthropometric survey of the Spanish female population aimed at the apparel industry. Proceedings of the 2010 Intl. Conference on 3D Body Scanning Technologies, 307–315.

Examples

library(Anthropometry)

#FOR THE SIZES DEFINED BY THE EUROPEAN STANDARD:
dataHipam <- sampleSpanishSurvey
bust <- dataHipam$bust
#Bust size limits: from 74 to 102 in steps of 4, then from 107 to 131 in steps of 6:
bustSizes <- bustSizesStandard(seq(74, 102, 4), seq(107, 131, 6))

type <- "IMO"
maxsplit <- 5 ; orness <- 0.7
ah <- c(23, 28, 20, 25, 25)

#For reproducing results, seed for randomness:
#suppressWarnings(RNGversion("3.5.0"))
#set.seed(2013)
#Number of bust sizes to process:
numSizes <- 1
res_hipam <- computSizesHipamAnthropom(dataHipam, bust, bustSizes$bustCirc, numSizes,
                                       maxsplit, orness, type, ah, FALSE)

#Fit models (medoids) and outliers:
fitmodels <- anthrCases(res_hipam, numSizes)
outliers <- trimmOutl(res_hipam, numSizes)

#FOR ANY OTHER DEFINED SIZE:
#For reproducing results, seed for randomness:
#suppressWarnings(RNGversion("3.5.0"))
#set.seed(1900)
#Random subsample of 20 individuals and three variables (columns 2, 3 and 5):
rand <- sample(1:600, 20)
dataComp <- sampleSpanishSurvey[rand, c(2, 3, 5)]
numVar <- dim(dataComp)[2]

type <- "IMO"
maxsplit <- 5 ; orness <- 0.7
#One ah constant per variable:
ah <- c(28, 25, 25)

dataMat <- as.matrix(dataComp)
#For reproducing results, seed for randomness:
#suppressWarnings(RNGversion("3.5.0"))
#set.seed(2013)
#Wrap the single result in a list of class "hipamAnthropom":
res_hipam_One <- list() ; class(res_hipam_One) <- "hipamAnthropom"
res_hipam_One[[1]] <- hipamAnthropom(dataMat, maxsplit = maxsplit, orness = orness,
                                     type = type, ah = ah, verbose = FALSE)

#plotTreeHipamAnthropom(res_hipam_One, main="Proposed Hierarchical PAM Clustering \n")

fitmodels_One <- anthrCases(res_hipam_One, 1)
outliers_One <- trimmOutl(res_hipam_One, 1)

[Package Anthropometry version 1.19]