WH_2d_Adaptive_Kohonen_maps {HistDAWass}    R Documentation

Batch Kohonen self-organizing 2d maps using adaptive distances for histogram-valued data

Description

The function implements a batch Kohonen self-organizing 2D map algorithm with adaptive distances for histogram-valued data.

Usage

WH_2d_Adaptive_Kohonen_maps(
  x,
  net = list(xdim = 4, ydim = 3, topo = c("rectangular")),
  kern.param = 2,
  TMAX = -9999,
  Tmin = -9999,
  niter = 30,
  repetitions,
  simplify = FALSE,
  qua = 10,
  standardize = FALSE,
  schema = 6,
  init.weights = "EQUAL",
  weight.sys = "PROD",
  theta = 2,
  Wfix = FALSE,
  verbose = FALSE,
  atleast = 2
)

Arguments

x

A MatH object (a matrix of distributionH).

net

a list describing the topology of the net: list(xdim = number of rows, ydim = number of columns, topo = c('rectangular') or c('hexagonal')); see the somgrid syntax in package class. Default: net = list(xdim = 4, ydim = 3, topo = c('rectangular'))

kern.param

(default =2) the kernel parameter for the RBF kernel used in the algorithm

TMAX

a parameter useful for the iterations (documented default=2; the Usage signature shows -9999)

Tmin

a parameter useful for the iterations (documented default=0.2; the Usage signature shows -9999)

niter

maximum number of iterations (default=30)

repetitions

number of repetitions of the algorithm (default=5), because each launch may converge to a local optimum

simplify

a logical parameter for speeding up computations (default=FALSE). If TRUE, data are recoded to allow faster computations

qua

if simplify=TRUE, the number of equally spaced quantiles used to recode the histograms (default=10)

standardize

A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use it if you want each variable to have a standard deviation equal to one.

schema

a number from 1 to 4
1=A weight for each variable (default)
2=A weight for the average and the dispersion component of each variable
3=Same as 1 but a different set of weights for each cluster
4=Same as 2 but a different set of weights for each cluster

init.weights

a string specifying how to initialize the weights: 'EQUAL' (default), all weights are the same,

weight.sys

a string. Weights may either sum to one ('SUM') or have a product equal to one ('PROD', default).

theta

a number. A parameter used when weight.sys='SUM' (default is 2).

Wfix

a logical parameter (default=FALSE). If TRUE the algorithm does not use adaptive distances.

verbose

a logical parameter (default=FALSE). If TRUE, details of the computation are shown during execution.

atleast

integer (default 2). A check against degeneration of the map into a very low number of Voronoi sets. A value of 2 means that the map will have at least 2 neurons attracting data instances into their Voronoi sets.
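
The two constraints selected by weight.sys can be illustrated with a small base R sketch. It is not the package's internal code: the helper names prod_weights() and sum_weights() and the dispersion vector disp are hypothetical, and the closed forms are those commonly used for adaptive distances in the clustering literature, assumed here to apply.

```r
# Hypothetical per-variable within-cluster dispersions (smaller = more compact).
disp <- c(v1 = 4, v2 = 1, v3 = 0.25)

# 'PROD' constraint: weights whose product equals 1.
# Assumed closed form: geometric mean of the dispersions over each dispersion.
prod_weights <- function(disp) {
  exp(mean(log(disp))) / disp
}

# 'SUM' constraint: weights that add up to 1, tuned by theta > 1.
# Assumed closed form: w_j = 1 / sum_h (disp_j / disp_h)^(1 / (theta - 1)).
sum_weights <- function(disp, theta = 2) {
  stopifnot(theta > 1)
  sapply(disp, function(d) 1 / sum((d / disp)^(1 / (theta - 1))))
}

w_prod <- prod_weights(disp)
w_sum <- sum_weights(disp, theta = 2)

print(w_prod)
print(prod(w_prod))  # product constraint
print(w_sum)
print(sum(w_sum))    # sum constraint
```

Under both constraints the most compact variable (v3 in this made-up input) receives the largest weight; the constraints differ only in how the weights are normalised.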

Details

An extension of the Batch Self-Organised Map (BSOM) is proposed here for histogram data. This kind of data has been defined in the context of symbolic data analysis. The BSOM cost function is based on a distance function: the L2 Wasserstein distance. This distance has been widely used in several analysis techniques (clustering, regression) when input data are expressed as distributions (empirical, as histograms, or theoretical, as probability distributions). The peculiarity of this distance is that it is a Euclidean distance between quantile functions, so that all the properties proved for L2 distances still hold. An adaptive version of BSOM is also introduced, which uses an automatic system of weights in the cost function to account for the different effects of the variables on the Self-Organised Map grid.
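
The description of the L2 Wasserstein distance as a Euclidean distance between quantile functions can be sketched in base R. The helper wass2_parts() and the grid size are hypothetical, and the integral over quantile levels is approximated with a midpoint rule on empirical quantiles of raw samples, so this only approximates what the package computes on histograms. The split into an average part and a dispersion part follows from a standard variance identity; its link to the components named under schema is an assumption.

```r
# Squared L2 Wasserstein distance between two samples, approximated as the
# mean squared difference of their quantile functions on a grid of levels,
# and split into an average part and a dispersion part.
wass2_parts <- function(a, b, n_levels = 1000) {
  p <- (seq_len(n_levels) - 0.5) / n_levels      # midpoints of (0, 1)
  qa <- quantile(a, probs = p, names = FALSE)
  qb <- quantile(b, probs = p, names = FALSE)
  total <- mean((qa - qb)^2)
  avg_part <- (mean(qa) - mean(qb))^2
  disp_part <- mean(((qa - mean(qa)) - (qb - mean(qb)))^2)
  c(total = total, average = avg_part, dispersion = disp_part)
}

set.seed(1)
a <- rnorm(5000, mean = 0, sd = 1)
b <- rnorm(5000, mean = 2, sd = 1)

d_ab <- wass2_parts(a, b)       # two samples differing mainly in location
d_aa <- wass2_parts(a, a)       # a distribution compared with itself
d_shift <- wass2_parts(a, a + 3) # a pure shift moves every quantile by 3

print(d_ab)
print(d_aa)
print(d_shift)
```

For a pure shift the dispersion part vanishes and the whole distance sits in the average part, which is the behaviour one would expect from a distance defined on quantile functions.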

Value

a list with the results of the Batch Kohonen map

Slots

solution

A list. Returns the best solution among the repetitions, i.e. the one having the minimum sum of squares criterion.

solution$MAP

The map topology.

solution$IDX

A vector. The clusters to which the objects are assigned.

solution$cardinality

A vector. The cardinality of each final cluster.

solution$proto

A MatH object with the description of centers.

solution$Crit

A number. The value of the criterion (sum of squared deviations from the centers) at the end of the run.

solution$Weights.comp

the final weights assigned to each component of the histogram variables

solution$Weight.sys

a string. The type of weighting system ('SUM' or 'PRODUCT')

quality

A number. The percentage of the sum of squared deviations explained by the model (the higher, the better).
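
The slots listed above can be read with ordinary list indexing. The sketch below uses a mock list that only imitates the documented slot name solution$IDX; it is not output from the function, the values are invented for illustration, and it assumes that IDX holds integer neuron indices from 1 to xdim * ydim.

```r
# Mock object imitating the documented slot names; the values are invented
# and do not come from WH_2d_Adaptive_Kohonen_maps().
mock <- list(
  solution = list(
    IDX = c(1, 1, 2, 4, 4, 4, 6, 6)   # assumed: neuron index of each object
  )
)

n_neurons <- 2 * 3                     # xdim * ydim of the assumed 2 x 3 map

# Count how many objects fall in each neuron, keeping empty neurons as 0.
card <- tabulate(mock$solution$IDX, nbins = n_neurons)
names(card) <- paste0("neuron_", seq_len(n_neurons))
print(card)

# Number of neurons attracting at least one object (compare with 'atleast').
n_active <- sum(card > 0)
print(n_active)
```

Counting non-empty neurons in this way gives a quick check of whether a map has degenerated into very few Voronoi sets.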

References

Irpino A, Verde R, De Carvalho FAT (2012). Batch self organizing maps for interval and histogram data. In: Proceedings of COMPSTAT 2012. p. 143-154, ISI/IASC, ISBN: 978-90-73592-32-2

Examples

## Not run: 
results <- WH_2d_Adaptive_Kohonen_maps(
  x = BLOOD,
  net = list(xdim = 2, ydim = 3, topo = c("rectangular")),
  repetitions = 2, simplify = TRUE,
  qua = 10, standardize = TRUE
)

## End(Not run)

[Package HistDAWass version 1.0.8 Index]