calculateFeatureSet {flacco} | R Documentation |
Calculate Landscape Features
Description
Performs an Exploratory Landscape Analysis of a continuous function and computes various features, which quantify the function's landscape. Currently, the following feature sets are provided:
CM: cell mapping features (
"cm_angle"
,"cm_conv"
,"cm_grad"
)ELA: classical ELA features (
"ela_conv"
,"ela_curv"
,"ela_distr"
,"ela_level"
,"ela_local"
,"ela_meta"
)GCM: general cell mapping features (
"gcm"
)BT: barrier tree features (
"bt"
)IC: information content features (
"ic"
)Basic: basic features (
"basic"
)Disp: dispersion features (
"disp"
)LiMo: linear model features (
"limo"
)NBC: nearest better clustering features (
"nbc"
)PC: principal component features (
"pca"
)
Usage
calculateFeatureSet(feat.object, set, control, ...)
calculateFeatures(feat.object, control, ...)
Arguments
feat.object |
[ |
set |
[ |
control |
[ |
... |
[any] |
Details
Note that if you want to speed up the runtime of the features, you might
consider running your feature computation parallelized. For more
information, please refer to the parallelMap
package or to
https://mlr.mlr-org.com/articles/tutorial/parallelization.html.
Furthermore, please consider adapting the feature computation to your
needs. Possible control
arguments are:
general:
show_progress
: Show progress bar when computing the features? The default isTRUE
.subset
: Specify a subset of features that should be computed. Per default, all features will be computed.allow_cellmapping
: Should cell mapping features be computed? The default isTRUE
.allow_costs
: Should expensive features, i.e. features, which require additional function evaluations, be computed? The default isTRUE
if the feature object provides a function, otherwiseFALSE
.blacklist
: Which features should NOT be computed? The default isNULL
, i.e. none of the features will be excluded.
cell mapping angle features:
cm_angle.show_warnings
: Should possible warnings aboutNAs
in the feature computation be shown? The default isFALSE
.
cell mapping convexity features:
cm_conv.diag
: Should cells, which are located on the diagonal compared to the current cell, be considered as neighbouring cells? The default isFALSE
, i.e. only cells along the axes are considered as neighbours.cm_conv.dist_method
: Which distance method should be used for computing the distance between two observations? All methods ofdist
are possible options with"euclidean"
being the default.cm_conv.minkowski_p
: Value ofp
in casedist_meth
is"minkowski"
. The default is2
, i.e. the euclidean distance.cm_conv.fast_k
: Percentage of elements that should be considered within the nearest neighbour computation. The default is0.05
.
cell mapping gradient homogeneity features:
cm_grad.dist_tie_breaker
: How will ties be broken when different observations have the same distance to an observation? Possible values are"sample"
,"first"
and"last"
. The default is"sample"
.cm_grad.dist_method
: Which distance method should be used for computing the distance between two observations? All methods ofdist
are possible options with"euclidean"
being the default.cm_grad.minkowski_p
: Value ofp
in casedist_meth
is"minkowski"
. The default is2
, i.e. the euclidean distance.cm_grad.show_warnings
: Should possible warnings about (almost) empty cells be shown? The default isFALSE
.
ELA convexity features:
ela_conv.nsample
: Number of samples that are drawn for calculating the convexity features. The default is1000
.ela_conv.threshold
: Threshold of the linearity, i.e. the tolerance to / deviation from perfect linearity, in order to still be considered linear. The default is1e-10
.
ELA curvature features:
ela_curv.sample_size
: Number of samples used for calculating the curvature features. The default is100*d
.ela_curv.{delta, eps, zero_tol, r, v}
: Parameters used bygrad
andhessian
within the approximation of the gradient and hessian. The default values are identical to the ones from the corresponding functions. Note that we slightly modifiedhessian
in order to assure that we do not exceed the boundaries during the estimation of the Hessian.
ELA distribution features:
ela_distr.smoothing_bandwidth
: The smoothing bandwidth, which should be used within thedensity
estimation. The default is"SJ"
.ela_distr.modemass_threshold
: Threshold that is used in order to classify whether a minimum can be considered as a peak. The default is0.01
.ela_distr.skewness_type
: Algorithm type for computing theskewness
. The default is3
.ela_distr.kurtosis_type
: Algorithm type for computing thekurtosis
. The default is3
.
ELA levelset features:
ela_level.quantiles
: Cutpoints (quantiles of the objective values) for splitting the objective space. The default isc(0.10, 0.25, 0.50)
.ela_level.classif_methods
: Methods for classifying the artificially splitted objective space. The default isc("lda", "qda", "mda")
.ela_level.resample_method
: Resample technique for training the model, cf.ResampleDesc
. The default is"CV"
.ela_level.resample_iterations
: Number of iterations of the resampling method. The default is10
.ela_level.resample_info
: Should information regarding the resampling be printed? The default isFALSE
.ela_level.parallelize
: Should the levelset features be computed in parallel? The default isFALSE
.ela_level.parallel.mode
: Which mode should be used for the parallelized computation? Possible options are"local"
,"multicore"
,"socket"
(default),"mpi"
and"BatchJobs"
. Note that in case you are using a windows computer you can only use the"socket"
mode.ela_level.parallel.cpus
: On how many cpus do you want to compute the features in parallel? Per default, all available cpus are used.ela_level.parallel.level
: On which level should the parallel computation be performed? The default is"mlr.resample"
, i.e. the internal resampling (performed usingmlr
) will be done in parallel.ela_level.parallel.logging
: Should slave output be logged? The default isFALSE
.ela_level.parallel.show_info
: Should verbose output of function calls be printed on the console? The default isFALSE
.
ELA local search features:
ela_local.local_searches
: Number of local searches. The default is50 * d
withd
being the number of features (i.e. the dimension).ela_local.optim_method
: Local search algorithm. The default is"L-BFGS-B"
.ela_local.optim.{lower, upper}
: Lower and upper bounds to be considered by the local search algorithm. Per default, the boundaries are the same as defined within the feature object (in case of"L-BFGS-B"
) or infinity (for all others).ela_local.optim_method_control
: Control settings of the local search algorithm. The default is an empty list.ela_local.sample_seed
: Seed, which will be set before the selection of the initial start points of the local search. The default issample(1:1e6, 1)
.ela_local.clust_method
: Once the local searches converge, basins have to be assigned. This is done using hierarchical clustering methods fromhclust
. The default is"single"
, i.e. single linkage clustering.ela_local.clust_cut_function
: A function of a hierarchical clusteringcl
, which defines at which height the dendrogramm should be splitted into clusters (cf.cutree
). The default isfunction(cl) as.numeric(quantile(cl$height, 0.1))
, i.e. the10%
-quantile of all the distances between clusters.
GCM features:
gcm.approaches
: Which approach(es) should be used when computing the representatives of a cell. The default are all three approaches, i.e.c("min", "mean", "near")
.gcm.cf_power
: Theoretically, we need to compute the canonical form to the power of infinity. However, we use this value as approximation of infinity. The default is256
.
barrier tree features:
gcm.approaches
: Which approach(es) should be used when computing the representatives of a cell. The default are all three approaches, i.e.c("min", "mean", "near")
.gcm.cf_power
: Theoretically, we need to compute the canonical form to the power of infinity. However, we use this value as approximation of infinity. The default is256
.bt.base
: Maximum number of basins, which are joined at a single breakpoint. The default is4L
.bt.max_depth
: Maximum number of levels of the barrier tree. The default is16L
.
information content features:
ic.epsilon
: Epsilon values as described in section V.A of Munoz et al. (2015). The default isc(0, 10^(seq(-5, 15, length.out = 1000))
.ic.sorting
: Sorting strategy, which is used to define the tour through the landscape. Possible values are"nn"
(= default) and"random"
.ic.sample.generate
: Should the initial design be created using a LHS? The default isFALSE
, i.e. the initial design from the feature object will be used.ic.sample.dimensions
: Dimensions of the initial design, if created using a LHS. The default isfeat.object$dimension
.ic.sample.size
: Size of the initial design, if created using a LHS. The default is100 * feat.object$dimension
.ic.sample.lower
: Lower bounds of the initial design, if created with a LHS. The default is100 * feat.object$lower
.ic.sample.upper
: Upper bounds of the initial design, if created with a LHS. The default is100 * feat.object$upper
.ic.aggregate_duplicated
: How should observations, which have duplicates in the decision space, be aggregated? The default ismean
.ic.show_warnings
: Should warnings be shown, when possible duplicates are removed? The default isFALSE
.ic.seed
: Possible seed, which can be used for making your experiments reproducable. Per default, a random number will be drawn as seed.ic.nn.start
: Which observation should be used as starting value, when exploring the landscape with the nearest neighbour approach. The default is a randomly chosen integer value.ic.nn.neighborhood
: In order to provide a fast computation of the features, we useRANN::nn2
for computing the nearest neighbors of an observation. Per default, we consider the20L
closest neighbors for finding the nearest not-yet-visited observation. If all of those neighbors have been visited already, we compute the distances to the remaining points separately.ic.settling_sensitivity
: Threshold, which should be used for computing the “settling sensitivity”. The default is0.05
(as used in the corresponding paper).ic.info_sensitivity
: Portion of partial information sensitivity. The default is0.5
(as used in the paper).
dispersion features:
disp.quantiles
: Quantiles, which should be used for defining the "best" elements of the entire initial design. The default isc(0.02, 0.05, 0.1, 0.25)
.disp.dist_method
: Which distance method should be used for computing the distance between two observations? All methods ofdist
are possible options with"euclidean"
being the default.disp.minkowski_p
: Value ofp
in casedist_meth
is"minkowski"
. The default is2
, i.e. the euclidean distance.
nearest better clustering features:
nbc.dist_method
: Which distance method should be used for computing the distance between two observations? All methods ofdist
are possible options with"euclidean"
being the default.nbc.minkowski_p
: Value ofp
in casedist_meth
is"minkowski"
. The default is2
, i.e. the euclidean distance.nbc.dist_tie_breaker
: How will ties be broken when different observations have the same distance to an observation? Possible values are"sample"
,"first"
and"last"
. The default is"sample"
.nbc.cor_na
: How should NA's be handled when computing correlations? Any method from the argumentuse
of the functioncor
is possible. The default is"pairwise.complete.obs"
.nbc.fast_k
: In case of euclidean distances, the method can find neighbours faster. This parameter controls the percentage of observations that should be considered when looking for the nearest better neighbour, i.e. the nearest neighbour with a better objective value. The default is0.05
, i.e. the 5
principal component features:
pca.{cov, cor}_{x, init}
: Which proportion of the variance should be explained by the principal components given a principal component analysis based on the covariance / correlation matrix of the decision space (x
) or the entire initial design (init
)? The defaults are0.9
.
Value
list
of (numeric
) features:
cm_angle
– angle features (10):
These features are based on the location of the worst and best element within each cell. To be precise, their distance to the cell center and the angle between these three elements (at the center) are the foundation:dist_ctr2{best, worst}.{mean, sd}
: arithmetic mean and standard deviation of distances from the cell center to the best / worst observation within the cell (over all cells)angle.{mean, sd}
: arithmetic mean and standard deviation of angles (in degree) between worst, center and best element of a cell (over all cells)y_ratio_best2worst.{mean, sd}
: arithmetic mean and standard deviation of the ratios between the distance of the worst and best element within a cell and the worst and best element in the entire initial design (over all cells);
note that the distances are only measured in the objective spacecosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
cm_conv
– cell mapping convexity features (6):
Each cell will be represented by an observation (of the initial design), which is located closest to the cell center. Then, the objectives of three neighbouring cells are compared:
{convex, concave}.hard
: if the objective of the inner cell is above / below the two outer cells, there is strong evidence for convexity / concavity{convex, concave}.soft
: if the objective of the inner cell is above / below the arithmetic mean of the two outer cells, there is weak evidence for convexity / concavitycosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
cm_grad
– gradient homogeneity features (4):
Within a cell of the initial grid, the gradients between each observation and its nearest neighbour observation are computed. Those gradients are then directed towards the smaller of the two objective values and afterwards normalized. Then, the length of the sum of all the directed and normalized gradients within a cell is computed. Based on those measurements (one per cell) the following features are computed:
{mean, sd}
: arithmetic mean and standard deviation of the aforementioned lengthscosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_conv
– ELA convexity features (6):
Two observations are chosen randomly from the initial design. Then, a linear (convex) combination of those observations is calculated – based on a random weight from [0, 1]. The corresponding objective value will be compared to the linear combination of the objectives from the two original observations. This process is replicatedconvex.nsample
(per default1000
) times and will then be aggregated:
{convex_p, linear_p}
: percentage of convexity / linearitylinear_dev.{orig, abs}
: average (original / absolute) deviation between the linear combination of the objectives and the objective of the linear combination of the observationscosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_curv
– ELA curvature features (26):
Given a feature object,curv.sample_size
samples (per default100 * d
withd
being the number of features) are randomly chosen. Then, the gradient and hessian of the function are estimated based on those points and the following features are computed:
grad_norm.{min, lq, mean, median, uq, max, sd, nas}
: aggregations (minimum, lower quartile, arithmetic mean, median, upper quartile, maximum, standard deviation and percentage of NAs) of the gradients' lengthsgrad_scale.{min, lq, mean, median, uq, max, sd, nas}
: aggregations of the ratios between biggest and smallest (absolute) gradient directionshessian_cond.{min, lq, mean, median, uq, max, sd, nas}
: aggregations of the ratios of biggest and smallest eigenvalue of the hessian matricescosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_distr
– ELA y-distribution features (5):
skewness
: skewness of the objective valueskurtosis
: kurtosis of the objective valuesnumber_of_peaks
: number of peaks based on an estimation of the density of the objective valuescosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_level
– ELA levelset features (20):
mmce_{methods}_{quantiles}
: mean misclassification error of each pair of classification method and quantile{method1}_{method2}_{quantiles}
: ratio of all pairs of classification methods for all quantilescosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_local
– ELA local search features (16):
Based on some randomly chosen points from the initial design, a pre-defined number of local searches (ela_local.local_searches
) are executed. Their optima are then clustered (using hierarchical clustering), assuming that local optima that are located close to each other, likely belong to the same basin. Given those basins, the following features are computed:
n_loc_opt.{abs, rel}
: the absolute / relative amount of local optimabest2mean_contr.orig
: each cluster is represented by its center; this feature is the ratio of the objective values of the best and average clusterbest2mean_contr.ratio
: each cluster is represented by its center; this feature is the ratio of the differences in the objective values of average to best and worst to best clusterbasin_sizes.avg_{best, non_best, worst}
: average basin size of the best / non-best / worst cluster(s)fun_evals.{min, lq, mean, median, uq, max, sd}
: aggregations of the performed local searchescosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ela_meta
– ELA meta model features (11):
Given an initial design, linear and quadratic models of the formobjective ~ features
are created. Both versions are created with and without simple interactions (e.g.,x1:x2
). Based on those models, the following features are computed:
lin_simple.{adj_r2, intercept}
: adjusted R^2 (i.e. model fit) and intercept of a simple linear modellin_simple.coef.{min, max, max_by_min}
: smallest and biggest (non-intercept) absolute coefficients of the simple linear model, and their ratio{lin_w_interact, quad_simple, quad_w_interact}.adj_r2
: adjusted R^2 (i.e. the model fit) of a linear model with interactions, and a quadratic model with and without interactionsquad_simple.cond
: condition of a simple quadratic model (without interactions), i.e. the ratio of its (absolute) biggest and smallest coefficientscosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
gcm
– general cell mapping (GCM) features (75):
Computes general cell mapping features based on the Generalized Cell Mapping (GCM) approach, which interpretes the cells as absorbing Markov chains. Computations are performed based on three different approaches: taking the best (min
) or average (mean
) objective value of a cell or the closest observation (near
) to a cell as representative. For each of these approaches the following 25 features are computed:
attractors, pcells, tcells, uncertain
: relative amount of attractor, periodic, transient and uncertain cellsbasin_prob.{min, mean, median, max, sd}
: aggregations of the probabilities of each basin of attractionbasin_certain.{min, mean, median, max, sd}
: aggregations of the (relative) size of each basin of attraction, in case only certain cells are considered (i.e. cells, which only point towards one attractor)basin_uncertain.{min, mean, median, max, sd}
: aggregations of the (relative) size of each basin of attraction, in case uncertain cells are considered (i.e. a cell, which points to multiple attractors contributes to each of its basins)best_attr.{prob, no}
: probability of finding the attractor with the best objective value and the (relative) amount of those attractors (i.e. the ratio of the number of attractors with the best objective value and the total amount of cells)costs_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
bt
– barrier tree features (90):
Computes barrier tree features, based on a Generalized Cell Mapping (GCM) approach. Computations are performed based on three different approaches: taking the best (min
) or average (mean
) objective value of a cell or the closest observation (near
) to a cell as representative. For each of these approaches the following 31 features are computed:
levels
: absolute number of levels of the barrier treeleaves
: absolute number of leaves (i.e. local optima) of the barrier treedepth
: range between highest and lowest node of the treedepth_levels_ratio
: ratio of depth and levelslevels_nodes_ratio
: ratio of number of levels and number of (non-root) nodes of the treediffs.{min, mean, median, max, sd}
: aggregations of the height differences between a node and its predecessorlevel_diffs.{min, mean, median, max, sd}
: aggregations of the average height differences per levelattractor_dists.{min, mean, median, max, sd}
: aggregations of the (euclidean) distances between the local and global best cells (attractors)basin_ratio.{uncertain, certain, most_likely}
: ratios of maximum and minimum size of the basins of attractions; here, a cell might belong to different attractors (uncertain), exactly one attractor (certain) or the attractor with the highest probabilitybasin_intersection.{min, mean, median, max, sd}
: aggregations of the intersection between the basin of the global best value and the basins of all local best valuesbasin_range
: range of a basin (euclidean distance of widest range per dimension)costs_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
ic
– information content features (7):
Computes features based on the Information Content of Fitness Sequences (ICoFiS) approach (cf. Munoz et al., 2015). In this approach, the information content of a continuous landscape, i.e. smoothness, ruggedness, or neutrality, are quantified. While common analysis methods were able to calculate the information content of discrete landscapes, the ICoFiS approach provides an adaptation to continuous landscapes that accounts e.g. for variable step sizes in random walk sampling:
h.max
: “maximum information content” (entropy) of the fitness sequence, cf. equation (5)eps.s
: “settling sensitivity”, indicating the epsilon for which the sequence nearly consists of zeros only, cf. equation (6)eps.max
: similar toeps.s
, but in contrast to the formereps.max
guarantees non-missing values; this simply is the epsilon-value for which H(eps.max
) ==h.max
eps.ratio
: “ratio of partial information sensitivity”, cf. equation (8), where the ratio is0.5
m0
: “initial partial information”, cf. equation (7)costs_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
basic
– basic features (15):
Very simple features, which can be read from the feature object (without any computational efforts):
{dim, observations}
: number of features / dimensions and observations within the initial sample{lower, upper, objective, blocks}_{min, max}
: minimum and maximum value of all lower and upper bounds, the objective values and the number of blocks / cells (per dimension)cells_{filled, total}
: number of filled (i.e. non-empty) cells and total number of cells{minimize_fun}
: logical value, indicating whether the optimization function should be minimizedcosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
disp
– dispersion features (18):
Computes features based on the comparison of the dispersion of pairwise distances among the 'best' elements and the entire initial design:
{ratio, diff}_{mean, median}_{02, 05, 10, 25}
: ratio and difference of the mean / median distances of the distances of the 'best' objectives vs. 'all' objectivescosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
limo
– linear model features (14):
Linear models are computed per cell, provided the decision space is divided into a grid of cells. Each one of the models has the formobjective ~ features
.
avg_length.{reg, norm}
: length of the average coefficient vector (based on regular and normalized vectors)length_{mean, sd}
: arithmetic mean and standard deviation of the lengths of all coefficient vectorscor.{reg, norm}
: correlation of all coefficient vectors (based on regular and normalized vectors)ratio_{mean, sd}
: arithmetic mean and standard deviation of the ratios of (absolute) maximum and minimum (non-intercept) coefficients per cellsd_{ratio, mean}.{reg, norm}
: max-by-min-ratio and arithmetic mean of the standard deviations of the (non-intercept) coefficients (based on regular and normalized vectors)costs_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
nbc
– nearest better (clustering) features (7):
Computes features based on the comparison of nearest neighbour and nearest better neighbour, i.e., the nearest neighbor with a better performance / objective value value.
nn_nb.{sd, mean}_ratio
: ratio of standard deviations and arithmetic mean based on the distances among the nearest neighbours and the nearest better neighboursnn_nb.cor
: correlation between distances of the nearest neighbours and the distances of the nearest better neighboursdist_ratio.coeff_var
: coefficient of variation of the distance ratiosnb_fitness.cor
: correlation between fitness value and count of observations to whom the current observation is the nearest better neighbour (the so-called “indegree”).costs_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
pca
– principal component (analysis) features (10):
expl_var.{cov, cor}_{x, init}
: proportion of the explained variance when applying PCA to the covariance / correlation matrix of the decision space (x
) or the entire initial design (init
)expl_var_PC1.{cov, cor}_{x, init}
: proportion of variance, which is explained by the first principal component – when applying PCA to the covariance / correlation matrix of the decision space (x
) or the entire initial designcosts_{fun_evals, runtime}
: number of (additional) function evaluations and runtime (in seconds), which were needed for the computation of these features
References
Kerschke, P., and Trautmann, H. (2019): “Comprehensive Feature-Based Landscape Analysis of Continuous and Constrained Optimization Problems Using the R-package flacco”, in: Applications in Statistical Computing – From Music Data Analysis to Industrial Quality Improvement, pp. 93-123, Springer. (https://link.springer.com/chapter/10.1007/978-3-030-25147-5_7).
Kerschke, P., Preuss, M., Hernandez, C., Schuetze, O., Sun, J.-Q., Grimme, C., Rudolph, G., Bischl, B., and Trautmann, H. (2014): “Cell Mapping Techniques for Exploratory Landscape Analysis”, in: EVOLVE – A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V, pp. 115-131 (http://dx.doi.org/10.1007/978-3-319-07494-8_9).
Kerschke, P., Preuss, M., Wessing, S., and Trautmann, H. (2015): “Detecting Funnel Structures by Means of Exploratory Landscape Analysis”, in: Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation (GECCO '15), pp. 265-272 (http://dx.doi.org/10.1145/2739480.2754642).
Lunacek, M., and Whitley, D. (2006): “The dispersion metric and the CMA evolution strategy”, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06), pp. 477-484 (http://dx.doi.org/10.1145/1143997.1144085).
Mersmann, O., Bischl, B., Trautmann, H., Preuss, M., Weihs, C., and Rudolph, G. (2011): “Exploratory Landscape Analysis”, in: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11), pp. 829-836 (http://dx.doi.org/10.1145/2001576.2001690).
Munoz, M. A., Kirley, M., and Halgamuge, S. K. (2015): “Exploratory Landscape Analysis of Continuous Space Optimization Problems Using Information Content”, in: IEEE Transactions on Evolutionary Computation (19:1), pp. 74-87 (http://dx.doi.org/10.1109/TEVC.2014.2302006).
Examples
# (1) create a feature object:
X = t(replicate(n = 2000, expr = runif(n = 5, min = -10, max = 10)))
## Not run: feat.object = createFeatureObject(X = X, fun = function(x) sum(x^2))
# (2) compute all non-cellmapping features
ctrl = list(allow_cellmapping = FALSE)
## Not run: features = calculateFeatures(feat.object, control = ctrl)
# (3) in order to allow the computation of the cell mapping features, one
# has to provide a feature object that has knowledge about the number of
# cells per dimension:
f = function(x) sum(x^2)
feat.object = createFeatureObject(X = X, fun = f, blocks = 3)
## Not run: features = calculateFeatures(feat.object)
# (4) if you want to compute a specific feature set, you can use
# calculateFeatureSet:
features.angle = calculateFeatureSet(feat.object, "cm_angle")
# (5) as noted in the details, it might be useful to compute the levelset
# features parallelized:
## Not run:
library(parallelMap)
library(parallel)
n.cores = detectCores()
parallelStart(mode = "socket", cpus = n.cores,
logging = FALSE, show.info = FALSE)
system.time((levelset.par = calculateFeatureSet(feat.object, "ela_level")))
parallelStop()
system.time((levelset.seq = calculateFeatureSet(feat.object, "ela_level")))
## End(Not run)