R: Total Differential Value optimization using Hill-climbing...

optim_tdv_hill_climb {diffval}

R Documentation

Total Differential Value optimization using Hill-climbing algorithms

Description

This function searches for partitions of the columns of a given matrix, optimizing the Total Differential Value (TDV).

Usage

optim_tdv_hill_climb(
  m_bin,
  k,
  p_initial = "random",
  n_runs = 1,
  n_sol = 1,
  maxit = 10,
  min_g_size = 1,
  stoch_first = FALSE,
  stoch_neigh_size = 1,
  stoch_maxit = 100,
  full_output = FALSE,
  verbose = FALSE
)

Arguments

`m_bin`	A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés.
`k`	A numeric giving the number of desired groups.
`p_initial`	A vector or a character. A vector of integer numbers with the initial partition of the relevés (i.e., a vector with values from 1 to `k`, with length equal to the number of columns of `m_bin`, ascribing each relevé to one of the `k` groups). By default, `p_initial = "random"`, generates a random initial partition.
`n_runs`	A numeric giving the number of runs to perform.
`n_sol`	A numeric giving the number of best solutions to keep in the final output. Defaults to 1.
`maxit`	A numeric giving the number of iterations of the Hill-climbing optimization.
`min_g_size`	A numeric. The minimum number of relevés that a group can contain (must be 1 or higher).
`stoch_first`	A logical. `FALSE` (the default), performs only Hill-climbing on the 1-neighbours; `TRUE` first, performs a Stochastic Hill-climbing on n-neighbours (n is defined by the parameter `stoch_neigh_size`), and only after runs the Hill-climbing search on the 1-neighbours; see description above.
`stoch_neigh_size`	A numeric giving the size (n) of the n-neighbours for the Stochastic Hill-climbing; only used if `stoch_first = TRUE`. Defaults to 1.
`stoch_maxit`	A numeric giving the number of iterations of the Stochastic Hill-climbing optimization; only used if `stoch_first = TRUE`. Defaults to 100.
`full_output`	A logical. If `FALSE` (the default) the best `n_sol` partitions and respective indices are returned. If `TRUE` (only available for `n_sol = 1`) the output will also contain information on the optimization steps (see below).
`verbose`	A logical. If `FALSE` nothing is printed during the runs. If `TRUE`, after each run, the run number is printed as well as and indication if the found partition is a 1-neighbour local maximum.

Details

Given a phytosociological table (m_bin, rows corresponding to taxa and columns corresponding to relevés) this function searches for a k-partition (k defined by the user) optimizing TDV, i.e., searches, using a Hill-climbing algorithm, for patterns of differential taxa by rearranging the relevés into k groups.

Optimization can start from a random partition (p_ini = "random"), or from a given partition (p_ini, defined by the user or produced by any clustering method, or even a manual classification of the relevés).

Each iteration searches for a TDV improvement screening all 1-neighbours, until the given number of maximum iterations (maxit) is reached. A 1-neighbour of a given partition is another partition obtained by changing 1 relevé (of the original partition) to a different group. A n-neighbour is obtained, equivalently, ascribing n relevés to different groups.

Optionally, a faster search (Stochastic Hill-climbing) can be performed in a first step (stoch_first = TRUE), consisting on searching for TDV improvements, by randomly selecting, in each iteration, one n-neighbour (n defined by the user in the parameter stoch_neigh_size), accepting that n-neighbour partition as a better solution if it improves TDV. This is repeated until a given number of maximum iterations (stoch_maxit) is reached. Stochastic Hill-climbing might be helpful for big tables (where the screening of all 1-neighbours might be too time consuming).

Several runs of this function (i.e., multiple starts) should be tried out, as several local maxima are usually present and the Hill-climbing algorithm converges easily to local maxima.

Trimming your table by a 'constancy' range or using the result of other cluster methodologies as input, might help finding interesting partitions. Specially after trimming the table by a 'constancy' range, getting a random initial partition with TDV greater than zero might be unlikely; on such cases using a initial partition from partition_tdv_grasp() or partition_tdv_grdtp() (or even the result of other clustering strategies) as an input partition might be useful.

Value

If full_output = FALSE, a list with (at most) n_sol best solutions (equivalent solutions are removed). Each best solution is also a list with the following components:

local_maximum: A logical indicating if par is a 1-neighbour local maximum.
par: A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
tdv: A numeric with the TDV of par.

If full_output = TRUE, a list with just one component (one run only), containing also a list with the following components:

res.stoch: A matrix with the iteration number (of the Stochastic Hill-climbing phase), the maximum TDV found until that iteration, and the TDV of the randomly selected n-neighbour in that iteration.
par.stoch: A vector with the best partition found in the Stochastic Hill-climbing phase.
tdv.stoch: A numeric showing the maximum TDV found in the Stochastic Hill-climbing phase (if selected).
res: A matrix with the iteration number (of the Hill-climbing), the maximum TDV found until that iteration, and the highest TDV among all 1-neighbours.
local_maximum: A logical indicating if par is a 1-neighbour local maximum.
par: A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
tdv: A numeric with the TDV of par.

Author(s)

Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.

Examples

# Getting the Taxus baccata forests data set
data(taxus_bin)

# Removing taxa occurring in only one relevé in order to
# reproduce the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]

# Obtaining a partition that maximizes TDV using the Stochastic Hill-climbing
# and the Hill-climbing algorithms

result <- optim_tdv_hill_climb(
  m_bin = taxus_bin_wmt,
  k = 3,
  n_runs = 7,
  n_sol = 2,
  min_g_size = 3,
  stoch_first = TRUE,
  stoch_maxit = 500,
  verbose = TRUE
)

# Inspect the result. The highest TDV found in the runs.
result[[1]]$tdv
# If result[[1]]$tdv is 0.1958471 you are probably reproducing the three
# groups (Estrela, Gerês and Galicia) from the original article. If not
# try again the optim_tdv_hill_climb function (maybe increasing n_runs).

# Plot the sorted (or tabulated) phytosociological table
tabul1 <- tabulation(
  m_bin = taxus_bin_wmt,
  p = result[[1]]$par,
  taxa_names = rownames(taxus_bin_wmt),
  plot_im = "normal"
)

# Plot the sorted (or tabulated) phytosociological table, also including
# taxa occurring just once in the matrix
tabul2 <- tabulation(
  m_bin = taxus_bin,
  p = result[[1]]$par,
  taxa_names = rownames(taxus_bin),
  plot_im = "normal"
)

[Package diffval version 1.1.0 Index]