optim_tdv_hill_climb {diffval} | R Documentation |
Total Differential Value optimization using Hill-climbing algorithms
Description
This function searches for partitions of the columns of a given matrix, optimizing the Total Differential Value (TDV).
Usage
optim_tdv_hill_climb(
m_bin,
k,
p_initial = "random",
n_runs = 1,
n_sol = 1,
maxit = 10,
min_g_size = 1,
stoch_first = FALSE,
stoch_neigh_size = 1,
stoch_maxit = 100,
full_output = FALSE,
verbose = FALSE
)
Arguments
m_bin |
A matrix. A phytosociological table of 0s (absences) and 1s (presences), where rows correspond to taxa and columns correspond to relevés. |
k |
A numeric giving the number of desired groups. |
p_initial |
A vector or a character. A vector of integer numbers
with the initial partition of the relevés (i.e., a vector with values from
1 to |
n_runs |
A numeric giving the number of runs to perform. |
n_sol |
A numeric giving the number of best solutions to keep in the final output. Defaults to 1. |
maxit |
A numeric giving the number of iterations of the Hill-climbing optimization. |
min_g_size |
A numeric. The minimum number of relevés that a group can contain (must be 1 or higher). |
stoch_first |
A logical. |
stoch_neigh_size |
A numeric giving the size (n) of the
n-neighbours for the Stochastic Hill-climbing; only used if
|
stoch_maxit |
A numeric giving the number of iterations of the
Stochastic Hill-climbing optimization; only used if |
full_output |
A logical. If |
verbose |
A logical. If |
Details
Given a phytosociological table (m_bin
, rows corresponding to
taxa and columns corresponding to relevés) this function searches for
a k
-partition (k
defined by the user) optimizing TDV, i.e., searches,
using a Hill-climbing algorithm, for patterns of differential taxa by
rearranging the relevés into k
groups.
Optimization can start from a random partition (p_ini = "random"
), or
from a given partition (p_ini
, defined by the user or produced by any
clustering method, or even a manual classification of the relevés).
Each iteration searches for a TDV improvement screening all 1-neighbours,
until the given number of maximum iterations (maxit
) is reached. A
1-neighbour of a given partition is another partition obtained by changing
1 relevé (of the original partition) to a different group. A n-neighbour
is obtained, equivalently, ascribing n relevés to different groups.
Optionally, a faster search (Stochastic Hill-climbing) can be performed in
a first step (stoch_first = TRUE
), consisting on searching for TDV
improvements, by randomly selecting, in each iteration, one n-neighbour (n
defined by the user in the parameter stoch_neigh_size
), accepting that
n-neighbour partition as a better solution if it improves TDV. This is
repeated until a given number of maximum iterations (stoch_maxit
) is
reached. Stochastic Hill-climbing might be helpful for big tables (where
the screening of all 1-neighbours might be too time consuming).
Several runs of this function (i.e., multiple starts) should be tried out, as several local maxima are usually present and the Hill-climbing algorithm converges easily to local maxima.
Trimming your table by a 'constancy' range or using the result of other
cluster methodologies as input, might help finding interesting partitions.
Specially after trimming the table by a 'constancy' range, getting a random
initial partition with TDV greater than zero might be unlikely; on such
cases using a initial partition from partition_tdv_grasp()
or
partition_tdv_grdtp()
(or even the result of other clustering
strategies) as an input partition might be useful.
Value
If full_output = FALSE
, a list with (at most) n_sol
best
solutions (equivalent solutions are removed). Each best solution is also
a list with the following components:
- local_maximum
A logical indicating if
par
is a 1-neighbour local maximum.- par
A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
- tdv
A numeric with the TDV of
par
.
If full_output = TRUE
, a list with just one component (one run only),
containing also a list with the following components:
- res.stoch
A matrix with the iteration number (of the Stochastic Hill-climbing phase), the maximum TDV found until that iteration, and the TDV of the randomly selected n-neighbour in that iteration.
- par.stoch
A vector with the best partition found in the Stochastic Hill-climbing phase.
- tdv.stoch
A numeric showing the maximum TDV found in the Stochastic Hill-climbing phase (if selected).
- res
A matrix with the iteration number (of the Hill-climbing), the maximum TDV found until that iteration, and the highest TDV among all 1-neighbours.
- local_maximum
A logical indicating if
par
is a 1-neighbour local maximum.- par
A vector with the partition of highest TDV obtained by the Hill-climbing algorithm(s).
- tdv
A numeric with the TDV of
par
.
Author(s)
Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.
Examples
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Removing taxa occurring in only one relevé in order to
# reproduce the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]
# Obtaining a partition that maximizes TDV using the Stochastic Hill-climbing
# and the Hill-climbing algorithms
result <- optim_tdv_hill_climb(
m_bin = taxus_bin_wmt,
k = 3,
n_runs = 7,
n_sol = 2,
min_g_size = 3,
stoch_first = TRUE,
stoch_maxit = 500,
verbose = TRUE
)
# Inspect the result. The highest TDV found in the runs.
result[[1]]$tdv
# If result[[1]]$tdv is 0.1958471 you are probably reproducing the three
# groups (Estrela, Gerês and Galicia) from the original article. If not
# try again the optim_tdv_hill_climb function (maybe increasing n_runs).
# Plot the sorted (or tabulated) phytosociological table
tabul1 <- tabulation(
m_bin = taxus_bin_wmt,
p = result[[1]]$par,
taxa_names = rownames(taxus_bin_wmt),
plot_im = "normal"
)
# Plot the sorted (or tabulated) phytosociological table, also including
# taxa occurring just once in the matrix
tabul2 <- tabulation(
m_bin = taxus_bin,
p = result[[1]]$par,
taxa_names = rownames(taxus_bin),
plot_im = "normal"
)