stops {stops}    R Documentation
stops: structure optimized proximity scaling
Description
A package for "structure optimized proximity scaling" (STOPS), a collection of methods that fit nonlinear distance transformations in multidimensional scaling (MDS) and trade off the fit against structure considerations to find optimal parameters or optimal configurations. The package contains various functions, wrappers, methods and classes for fitting, plotting and displaying different MDS models in a STOPS framework, such as Torgerson scaling, SMACOF, Sammon mapping, elastic scaling, symmetric SMACOF, spherical SMACOF, sstress, rstress, powermds, power elastic scaling, power Sammon mapping, power stress, Isomap, approximate power stress and restricted power stress. All of these models can also be fit as plain MDS variants (i.e., without structuredness). The package further contains functions for optimization (Adaptive Luus-Jaakola and Bayesian optimization with treed Gaussian processes with jump to linear models) and functions for various structuredness indices.
The stops() function fits STOPS models as described in Rusch, Mair and Hornik (2023).
Usage
stops(
dis,
loss = c("strain", "stress", "smacofSym", "powerstress", "powermds", "powerelastic",
"powerstrain", "elastic", "sammon", "sammon2", "smacofSphere", "powersammon",
"rstress", "sstress", "isomap", "isomapeps", "bcstress", "lmds", "apstress",
"rpowerstress"),
theta = 1,
structures = c("cclusteredness", "clinearity", "cdependence", "cmanifoldness",
"cassociation", "cnonmonotonicity", "cfunctionality", "ccomplexity", "cfaithfulness",
"cregularity", "chierarchy", "cconvexity", "cstriatedness", "coutlying",
"cskinniness", "csparsity", "cstringiness", "cclumpiness", "cinequality"),
ndim = 2,
weightmat = NULL,
init = NULL,
stressweight = 1,
strucweight,
strucpars,
optimmethod = c("SANN", "ALJ", "pso", "Kriging", "tgp", "DIRECT", "stogo", "cobyla",
"crs2lm", "isres", "mlsl", "neldermead", "sbplx", "hjk", "cmaes"),
lower,
upper,
verbose = 0,
type = c("additive", "multiplicative"),
initpoints = 10,
itmax = 50,
itmaxps = 10000,
model,
control,
...
)
Arguments
dis
numeric matrix or dist object of a matrix of proximities
loss
which loss function to use for fitting; defaults to stress.
theta
starting values of the hyperparameter vector for the transformation functions. If theta is shorter than the number of hyperparameters of the chosen MDS model, it is recycled (see the corresponding stop_XXX function or the vignette for the exact form theta must take for each loss). If it is longer, an error is thrown. If theta is missing entirely, it is set to 1 and recycled.
structures
character vector of the c-structuredness indices to be considered; if missing, no structure is considered.
ndim
number of dimensions of the target space
weightmat
(optional) a matrix of nonnegative weights; defaults to 1 for all off-diagonal elements
init
(optional) initial configuration
stressweight
weight to be used for the fit measure; defaults to 1
strucweight
vector of weights to be used for the c-structuredness indices (in the same order as in structures); defaults to -1/length(structures) for each index
strucpars
metaparameters for the structuredness indices (gamma in the article), possibly named by structure. It is safest to supply a list of lists holding the named arguments for each structuredness index, with the lists in the same order as structures (see the strucpars objects in the Examples).
optimmethod
which solver to use. Currently supported are Bayesian optimization with Gaussian process priors and Kriging ("Kriging"), Bayesian optimization with treed Gaussian processes with jump to linear models ("tgp"), Adaptive LJ Search ("ALJ"), Particle Swarm Optimization ("pso"), simulated annealing ("SANN"), "DIRECT", Stochastic Global Optimization ("stogo"), COBYLA ("cobyla"), Controlled Random Search 2 with local mutation ("crs2lm"), Improved Stochastic Ranking Evolution Strategy ("isres"), Multi-Level Single-Linkage ("mlsl"), Nelder-Mead ("neldermead"), Subplex ("sbplx"), Hooke-Jeeves Pattern Search ("hjk") and CMA-ES ("cmaes"). Defaults to "ALJ". tgp, ALJ, Kriging and pso usually work well for relatively low values of itmax.
lower
the lower constraints of the search region. Needs to be a numeric vector of the same length as the parameter vector theta.
upper
the upper constraints of the search region. Needs to be a numeric vector of the same length as the parameter vector theta.
verbose
numeric value that controls how much information on the fitting process is printed; values >2 are very verbose.
type
which aggregation to use for the multi-objective target function: either 'additive' (default) or 'multiplicative'.
initpoints
number of initial points used to fit the surrogate model for Bayesian optimization; default is 10.
itmax
maximum number of iterations of the outer optimization (for theta) or number of steps of Bayesian optimization; default is 50. We recommend a higher number for ALJ (around 150). Note that, due to the inner workings of some solvers, this may or may not correspond to the actual number of function evaluations performed (or PS models fitted). For example, with tgp the actual number of function evaluations of the PS method is between itmax and 6*itmax, as tgp samples 1-6 candidates from the posterior and uses the best candidate. For pso it is the number of particles s times itmax. For cmaes it is usually a bit higher than itmax. This may currently be overruled by a control argument, if one is used (itmax is then set either to what is supplied by control or to the default of the method).
itmaxps
maximum number of iterations of the inner optimization (to obtain the PS configuration)
model
a character specifying the surrogate model to use. For Kriging it specifies the covariance kernel for the GP prior; see
control
a control argument passed to the outer optimization procedure. It will override any other control arguments passed, especially verbose and itmax. For the effect of control, see the functions pomp::sannbox for SANN, pso::psoptim for pso, cmaes::cma_es for cmaes, dfoptim::hjkb for hjk, and the nloptr documentation for the algorithms DIRECT, stogo, cobyla, crs2lm, isres, mlsl, neldermead and sbplx.
...
additional arguments passed to the outer optimization procedures (not fully tested).
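The recycling rule for theta and the default strucweight described above can be mimicked in base R as follows. This is a minimal sketch that assumes nothing about how stops() implements these checks internally; expand_theta() and default_strucweight() are hypothetical helpers, not functions exported by the package.

```r
# Hypothetical helper (not part of stops): mimics the documented rule for theta.
# Shorter vectors are recycled, longer ones are rejected.
expand_theta <- function(theta = 1, npars) {
  if (length(theta) > npars)
    stop("theta has more elements than the model has hyperparameters")
  rep_len(theta, npars)  # recycle to the required length
}

# Hypothetical helper: the documented default of -1/length(structures) per index
default_strucweight <- function(structures) {
  rep(-1 / length(structures), length(structures))
}

expand_theta(2, npars = 3)        # 2 2 2
expand_theta(c(1, 2), npars = 3)  # 1 2 1
default_strucweight(c("cclusteredness", "cdependence"))  # -0.5 -0.5
```

Because the default structure weights are negative, higher c-structuredness values lower the stoploss that is minimized.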
Details
The stops package provides five categories of important functions:
Models & Algorithms:
stops() ... which fits STOPS models as described in Rusch et al. (2023). By setting strucweight to zero it can also be used to fit metric MDS for many different models, see below.
powerStressMin() ... a workhorse for fitting many stresses, including s-stress, r-stress (De Leeuw, 2014), Sammon mapping with power transformations (powersammon), elastic scaling with power transformations (powerelastic) and power stress. These are most conveniently accessed via stops() with stressweight=1 and strucweight=0, or via the dedicated functions stop_foo, where foo is the method, again with stressweight=1 and strucweight=0. It uses the nested majorization algorithm for r-stress of De Leeuw (2014).
bcStressMin()... a workhorse for fitting Box-Cox stress (Chen & Buja, 2013).
lmds()... a workhorse for the local MDS of Chen & Buja (2008).
Structuredness Indices: various c-structuredness indices, available as c_foo(), where foo is the name of the structure. See Rusch et al. (2023).
Optimization functions:
ljoptim() ... an (adaptive) version of the Luus-Jaakola random search
Wrappers and convenience functions:
conf_adjust(): procrustes adjustment of configurations
cmdscale(), sammon(): wrappers that return S3 objects
stop_smacofSym(), stop_sammon(), stop_cmdscale(), stop_rstress(), stop_powerstress(), stop_smacofSphere(), stop_sammon2(), stop_elastic(), stop_sstress(), stop_powerelastic(), stop_powersammon(), stop_powermds(), stop_isomap(), stop_isomapeps(), stop_bcstress(), stop_lmds(), stop_apstress(), stop_rpowerstress(): stop versions of these MDS models.
stoploss() ... a function to calculate stoploss (Rusch et al., 2023)
Methods: S3 classes and methods for standard generics, including print, summary, plot, plot3d and plot3dstatic, are implemented for most of the objects returned by the high-level functions.
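To illustrate what powerStressMin() works with, the following base R sketch evaluates a raw power stress for a given configuration, assuming the usual definition as the sum over pairs i<j of w_ij^nu (d_ij(X)^kappa - delta_ij^lambda)^2, with kappa acting on the fitted distances, lambda on the proximities and nu on the weights (for loss="powerstress", theta presumably collects these three exponents; see stop_powerstress() for the exact order). raw_powerstress() is a hypothetical helper: it performs no normalization and no majorization, so its values are not comparable to the stress values reported by the package.

```r
# Raw (unnormalized) power stress of a configuration X:
#   sum_{i<j} w_ij^nu * (d_ij(X)^kappa - delta_ij^lambda)^2
# Illustration only; it evaluates the loss, it does not minimize it over X.
raw_powerstress <- function(X, delta, kappa = 1, lambda = 1, nu = 1,
                            weightmat = NULL) {
  delta <- as.matrix(delta)
  if (is.null(weightmat)) weightmat <- matrix(1, nrow(delta), ncol(delta))
  D <- as.matrix(dist(X))   # fitted Euclidean distances d_ij(X)
  low <- lower.tri(delta)   # count each pair i<j once
  sum(weightmat[low]^nu * (D[low]^kappa - delta[low]^lambda)^2)
}

# Toy 3-point configuration with pairwise distances 3, 4 and 5
X <- matrix(c(0, 0,
              3, 0,
              0, 4), ncol = 2, byrow = TRUE)
delta <- as.matrix(dist(X))  # proximities equal to the distances
raw_powerstress(X, delta)                      # 0: perfect fit
raw_powerstress(X, delta, kappa = 2, lambda = 1)  # 580
```

Changing kappa alone breaks the perfect fit, which is why the transformation parameters and the configuration have to be fitted jointly.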
References:
Rusch, T., Mair, P., & Hornik, K. (2023). Structure-based hyperparameter selection with Bayesian optimization in multidimensional scaling. Statistics and Computing, 33, 28. https://doi.org/10.1007/s11222-022-10197-w
Authors: Thomas Rusch, Lisha Chen, Jan de Leeuw, Patrick Mair, Kurt Hornik
Maintainer: Thomas Rusch
The combination of c-structuredness indices and stress uses the stress.m values, which are the explicitly normalized stresses. The reported value, however, is stress-1, which is sqrt(stress.m).
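A minimal sketch of how the two scalarizations (type = "additive" or "multiplicative") might combine stress.m with the c-structuredness values, and of how the reported stress-1 relates to stress.m. stoploss_sketch() is hypothetical and is not the package's stoploss(); treating the additive form as a weighted sum and the multiplicative form as a weighted product of powers are assumptions based on Rusch et al. (2023). The numeric inputs are made up.

```r
# Hypothetical scalarization of stress.m and c-structuredness values
# (assumption: weighted sum for "additive", weighted product of powers
# for "multiplicative"); not the package's stoploss().
stoploss_sketch <- function(stress_m, cvals, stressweight = 1,
                            strucweight = rep(-1 / length(cvals), length(cvals)),
                            type = c("additive", "multiplicative")) {
  type <- match.arg(type)
  if (type == "additive") {
    stressweight * stress_m + sum(strucweight * cvals)
  } else {
    stress_m^stressweight * prod(cvals^strucweight)
  }
}

stress_m <- 0.04       # made-up explicitly normalized stress
cvals <- c(0.8, 0.6)   # made-up c-structuredness values
stoploss_sketch(stress_m, cvals)                          # -0.66
stoploss_sketch(stress_m, cvals, type = "multiplicative")
sqrt(stress_m)         # reported stress-1: 0.2
```

With the default negative structure weights, larger c-structuredness values lower the additive stoploss, so minimizing it favours more structured configurations.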
Value
A list with the components
stoploss: the stoploss value
optim: the object returned from the optimization procedure
stressweight: the stressweight
strucweight: the vector of structure weights
call: the call
optimmethod: The solver selected
losstype: The PS badness-of-fit function
nobj: the number of objects in the configuration
type: the type of stoploss scalarization (additive or multiplicative)
fit: The fitted PS object (most importantly $fit$conf the fitted configuration)
Examples
data(kinshipdelta,package="smacof")
strucpars<-list(list(epsilon=10,minpts=2,scale=3),list(NULL))
dissm<-as.matrix(kinshipdelta)
#STOPS with strain
resstrain<-stops(dissm,loss="strain",theta=1,structures=c("cclusteredness","cdependence"),
strucpars=strucpars,optimmethod="ALJ",lower=0,upper=10,itmax=10)
resstrain
summary(resstrain)
plot(resstrain)
#STOPS with stress
strucpars<-list(list(epsilon=10,minpts=2,scale=3),NULL)
resstress<-stops(dissm,loss="stress",
structures=c("cclusteredness","cdependence"),
strucpars=strucpars,optimmethod="ALJ",lower=0,upper=10)
resstress
summary(resstress)
plot(resstress)
plot(resstress,"Shepard")
#STOPS with powerstress
respstress<-stops(dissm,loss="powerstress",
structures=c("cclusteredness","cdependence"),
strucpars=strucpars,weightmat=dissm,
itmaxps=1000,optimmethod="ALJ",lower=c(0,0,1),upper=c(10,10,10))
respstress
summary(respstress)
plot(respstress)
#STOPS with bcstress
resbcstress<-stops(dissm,loss="bcstress",
structures=c("cclusteredness","cdependence"),
strucpars=strucpars,optimmethod="ALJ",lower=c(0,1,0),upper=c(10,10,10))
resbcstress
summary(resbcstress)
plot(resbcstress)
#STOPS with lmds
reslmds<-stops(dissm,loss="lmds",
structures=c("cclusteredness","clinearity"),
strucpars=strucpars,optimmethod="ALJ",lower=c(2,0),upper=c(10,2))
reslmds
summary(reslmds)
plot(reslmds)
#STOPS with Isomap (the epsilon version)
resiso<-stops(dissm,loss="isomapeps",
structures=c("cclusteredness","clinearity"),
strucpars=strucpars,optimmethod="ALJ",lower=70,upper=120)
resiso
summary(resiso)
plot(resiso)
data(kinshipdelta,package="smacof")
strucpar<-list(NULL,NULL) #parameters for indices
res1<-stops(kinshipdelta,loss="stress",
structures=c("cclumpiness","cassociation"),strucpars=strucpar,
lower=0,upper=10,itmax=10)
res1
data(BankingCrisesDistances)
strucpar<-list(c(epsilon=10,minpts=2),NULL) #parameters for indices
res1<-stops(BankingCrisesDistances[,1:69],loss="stress",verbose=0,
structures=c("cclusteredness","clinearity"),strucpars=strucpar,
lower=0,upper=10)
res1
strucpar<-list(list(alpha=0.6,C=15,var.thr=1e-5,zeta=NULL),
list(alpha=0.6,C=15,var.thr=1e-5,zeta=NULL))
res1<-stops(BankingCrisesDistances[,1:69],loss="stress",verbose=0,
structures=c("cfunctionality","ccomplexity"),strucpars=strucpar,
lower=0,upper=10)
res1
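The "ALJ" solver used in these examples is an adaptive version of the Luus-Jaakola random search (ljoptim()). The sketch below shows the basic, non-adaptive Luus-Jaakola idea in base R for a box given by lower and upper: sample uniformly around the current best point, move on improvement, and shrink the sampling range otherwise. lj_search() and its shrink factor are hypothetical and do not reproduce the adaptive rules of ljoptim(); the toy objective merely stands in for the stoploss as a function of theta.

```r
# Basic Luus-Jaakola random search on the box [lower, upper].
# Illustration only: not the package's ljoptim().
lj_search <- function(fn, x0, lower, upper, itmax = 200, shrink = 0.95) {
  x <- x0
  fx <- fn(x)
  r <- upper - lower                          # sampling radius per coordinate
  for (i in seq_len(itmax)) {
    cand <- x + runif(length(x), -r, r)       # uniform draw around x
    cand <- pmin(pmax(cand, lower), upper)    # clip to the search box
    fc <- fn(cand)
    if (fc < fx) {                            # accept improvements
      x <- cand
      fx <- fc
    } else {
      r <- r * shrink                         # otherwise shrink the range
    }
  }
  list(par = x, value = fx)
}

set.seed(1)
# Toy objective with its minimum at theta = 3
res <- lj_search(function(th) (th - 3)^2, x0 = 8, lower = 0, upper = 10)
res$par
```

Each evaluation of fn corresponds to one PS model fit in stops(), which is why the number of outer iterations (itmax) drives the run time.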