R: Multi-step hexagonal som training

som.nn.multitrain {som.nn}

R Documentation

Multi-step hexagonal som training

Description

A self-organising map with hexagonal topology is trained in several steps, and a model of type SOMnn is created for the prediction of unknown samples. In contrast to a "normal" SOM, class labels for all samples of the training set are required to build the topological model after SOM training.

Usage

som.nn.multitrain(
  x,
  class.col = 1,
  kernel = "internal",
  xdim = 7,
  ydim = 5,
  toroidal = FALSE,
  len = c(0),
  alpha = c(0.2),
  radius = c(0),
  focus = 1,
  norm = TRUE,
  dist.fun = dist.fun.inverse,
  max.dist = 1.1,
  name = "som.nn job"
)

Arguments

`x`	data.frame with training data. Samples are expected as rows and are drawn randomly for the training steps. All columns except the class labels are treated as attributes and form the training vector. One column is required as class labels; it is selected by the argument `class.col`.
`class.col`	single string or number. If `class.col` is a string, it is taken as the name of the column with class labels. If `class.col` is a number, the respective column is used as class labels (after being coerced to character). Default is 1.
`kernel`	kernel for SOM training. One of the predefined kernels: `"bubble"`: train with the R implementation; `"gaussian"`: train with the R implementation of the Gaussian kernel; `"SOM"`: train with `SOM` (`class::SOM`); `"kohonen"`: train with `som` (`kohonen::som`); `"som"`: train with `som` (`som::som`). If a function is specified (as a closure, not as a character string), that custom function is used for training.
`xdim`	dimension in x-direction.
`ydim`	dimension in y-direction.
`toroidal`	`logical`; if TRUE, an endless SOM is trained as on the surface of a torus. Default: FALSE.
`len`	`vector` of numbers of steps to be trained (steps, not epochs!). The length of `len` defines the number of training rounds to be performed.
`alpha`	initial training rate; the learning rate is decreased linearly to 0.0 for the last training step. Default: 0.2. If length(`alpha`) > 1, the length must be the same as for `len`, and the values define a separate alpha for each training round.
`radius`	initial radius for SOM training. If a Gaussian distance function is used, radius corresponds to sigma. The radius is decreased linearly to 1.0 for the last training step. If `radius = 0` (default), the diameter of the SOM is used as initial radius. If length(`radius`) > 1, the length must be the same as for `len`, and the values define a separate radius for each training round.
`focus`	enhancement factor for focusing the training on "dirty" samples.
`norm`	logical; if TRUE, input data is normalised by `scale(x, TRUE, TRUE)`.
`dist.fun`	parameter for k-NN prediction: Function used to calculate distance-dependent weights. Any distance function must accept the two parameters `x` (distance) and `sigma` (maximum distance to give a weight > 0.0). Default is `dist.fun.inverse`.
`max.dist`	parameter for k-NN prediction: parameter `sigma` for `dist.fun`. Default is 1.1. To avoid rounding issues, it is recommended not to use exact integers as the limit, but values like 1.1, so that all neurons within distance 1 are included.
`name`	optional name for the model. Name will be stored as slot `model@name` in the trained model.
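
As an illustration of the `dist.fun` contract described above (a function of the distance `x` and the cutoff `sigma`), the following is a minimal sketch of a custom weight function. The name `dist.fun.linear.sketch` is hypothetical and not part of the package; the linear shape of the weight is an assumption chosen for simplicity.

```r
# Hypothetical custom weight function (not part of som.nn):
# weight falls linearly from 1 at distance 0 to 0 at distance sigma,
# and stays 0 beyond sigma.
dist.fun.linear.sketch <- function(x, sigma) {
  pmax(1 - x / sigma, 0)
}

# Distance between two neighbouring units on a hexagonal grid,
# computed in floating point; it is approximately 1.
d.neighbour <- sqrt(0.5^2 + (sqrt(3) / 2)^2)

# With sigma = 1.1 the direct neighbour still receives a positive weight,
# even if d.neighbour is not exactly 1 after rounding.
w <- dist.fun.linear.sketch(c(0, d.neighbour, 2), sigma = 1.1)
print(w)
```

Such a function would presumably be passed as `dist.fun = dist.fun.linear.sketch` together with a suitable `max.dist`.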

Details

In addition to the predefined kernels "bubble", "gaussian", "SOM", "kohonen" or "som", any custom kernel function can be used for SOM training. The function must match the signature kernel(data, grid, rlen, alpha, radius, init, toroidal), with arguments:

data: numeric matrix of training data; one sample per row
classes: optional character vector of classes for training data
grid: somgrid, generated with somgrid
rlen: number of training steps
alpha: training rate
radius: training radius
init: numeric matrix of initial codebook vectors; one code per row
toroidal: logical; TRUE, if the topology of grid is toroidal

The returned value must be a list with at least one element:

codes: numeric matrix of result codebook vectors; one code per row
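
The kernel contract above can be sketched as a small online SOM with a bubble neighbourhood. This is only an illustration under assumptions, not the package's implementation: the name `kernel.bubble.sketch` is hypothetical, `grid$pts` is assumed to hold one row of unit coordinates per neuron (as in grids created by `somgrid`), the `toroidal` flag is ignored, and `...` is added only as a precaution to absorb extra arguments such as `classes`.

```r
# Hypothetical custom kernel following the signature stated above.
kernel.bubble.sketch <- function(data, grid, rlen, alpha, radius, init,
                                 toroidal = FALSE, ...) {
  codes <- init
  pts <- grid$pts                 # assumption: unit coordinates, one row per unit
  for (step in seq_len(rlen)) {
    frac <- (step - 1) / max(rlen - 1, 1)
    a <- alpha * (1 - frac)                 # learning rate decays linearly to 0
    r <- radius - (radius - 1) * frac       # radius decays linearly to 1
    s <- data[sample.int(nrow(data), 1), ]  # one random training sample
    # best matching unit: code with the smallest squared distance to s
    bmu <- which.min(rowSums(sweep(codes, 2, s)^2))
    # Euclidean distance on the grid (toroidal wrap-around is ignored here)
    gd <- sqrt(rowSums(sweep(pts, 2, pts[bmu, ])^2))
    hit <- gd <= r
    target <- matrix(s, nrow = sum(hit), ncol = length(s), byrow = TRUE)
    codes[hit, ] <- (1 - a) * codes[hit, , drop = FALSE] + a * target
  }
  list(codes = codes)             # the required element of the result
}

# Stand-alone check with a rectangular stand-in grid (not a real somgrid):
set.seed(1)
data <- as.matrix(scale(iris[, 1:4]))
grid <- list(pts = as.matrix(expand.grid(x = 1:3, y = 1:2)))
init <- data[sample(nrow(data), nrow(grid$pts)), ]
res <- kernel.bubble.sketch(data, grid, rlen = 200, alpha = 0.2,
                            radius = 2, init = init, toroidal = FALSE)
dim(res$codes)
```

A function like this would be passed as a closure, for example `kernel = kernel.bubble.sketch`, rather than as a character string.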

If focus > 1, enhancement of dirty samples is activated: training samples that are mapped to a neuron with more than one class are preferred in the next training step.
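
To make the notion of "dirty" samples concrete, the sketch below flags samples whose best matching neuron is shared by more than one class. The toy vectors `bmu` and `cls` are made up for illustration, and turning `focus` into sampling probabilities in this particular way is an assumption; how the package weights dirty samples internally is not stated here.

```r
# Toy data (hypothetical): neuron index and class label for six samples.
bmu <- c(1, 1, 2, 2, 3, 3)
cls <- c("a", "b", "a", "a", "b", "b")

# Number of distinct classes mapped to each neuron.
n.classes <- vapply(split(cls, bmu), function(v) length(unique(v)), integer(1))

# A sample is "dirty" if its neuron holds more than one class.
dirty <- unname(n.classes[as.character(bmu)] > 1)

# Assumed weighting: dirty samples are drawn `focus` times as often.
focus <- 3
w <- ifelse(dirty, focus, 1)
prob <- w / sum(w)
print(data.frame(bmu, cls, dirty, prob))
```

Here only the two samples on neuron 1 are dirty, because that neuron holds both class "a" and class "b".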

Value

S4 object of type `SOMnn` with the trained model.

Examples

## get example data and add class labels:
data(iris)
species <- iris$Species

## train with default radius = diagonal / 2:
rlen <- 500
som <- som.nn.train(iris, class.col = "Species", kernel = "internal",
                    xdim = 15, ydim = 9, alpha = 0.2, len = rlen, 
                    norm = TRUE, toroidal = FALSE)


## continue training with different alpha and radius:
som <- som.nn.continue(som, iris, alpha = 0.02, len=500, radius = 5)
som <- som.nn.continue(som, iris, alpha = 0.02, len=500, radius = 2)

## predict some samples:
unk <- iris[,!(names(iris) %in% "Species")]

setosa <- unk[species=="setosa",]
setosa <- setosa[sample(nrow(setosa), 20),]

versicolor <- unk[species=="versicolor",]
versicolor <- versicolor[sample(nrow(versicolor), 20),]

virginica <- unk[species=="virginica",]
virginica <- virginica[sample(nrow(virginica), 20),]

p <- predict(som, unk)
head(p)

## plot:
plot(som)
dev.off()
plot(som, predict = predict(som, setosa))
plot(som, predict = predict(som, versicolor), add = TRUE, pch.col = "magenta", pch = 17)
plot(som, predict = predict(som, virginica), add = TRUE, pch.col = "white", pch = 8)
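
The example above uses `som.nn.train` followed by `som.nn.continue`. Going by the Usage signature, a comparable multi-round schedule could presumably be passed to `som.nn.multitrain` in one call, with one entry per round in `len`, `alpha` and `radius`. The following is a sketch under that assumption: the first radius value (8) is illustrative, and the call is guarded so the snippet also runs when the package is not installed.

```r
## Hypothetical three-round schedule; alpha and the later radii mirror
## the train/continue calls above, the first radius is an assumption.
len    <- c(500, 500, 500)
alpha  <- c(0.2, 0.02, 0.02)
radius <- c(8, 5, 2)

## alpha and radius must have length 1 or the same length as len:
stopifnot(length(alpha)  %in% c(1, length(len)),
          length(radius) %in% c(1, length(len)))

## Train only if som.nn is available:
if (requireNamespace("som.nn", quietly = TRUE)) {
  som.multi <- som.nn::som.nn.multitrain(iris, class.col = "Species",
                                         xdim = 15, ydim = 9, toroidal = FALSE,
                                         len = len, alpha = alpha,
                                         radius = radius, norm = TRUE)
}
```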

[Package som.nn version 1.4.4 Index]