stops {stops}    R Documentation

stops: structure optimized proximity scaling

Description

A package for "structure optimized proximity scaling" (STOPS), a collection of methods that fit nonlinear distance transformations in multidimensional scaling (MDS) and trade off the fit with structure considerations to find optimal parameters or optimal configurations. The package contains various functions, wrappers, methods and classes for fitting, plotting and displaying different MDS models in a STOPS framework, such as Torgerson scaling, SMACOF, Sammon mapping, elastic scaling, symmetric SMACOF, spherical SMACOF, sstress, rstress, powermds, power elastic scaling, power Sammon mapping, power stress, Isomap, approximate power stress and restricted power stress. All of these models can also be fit as MDS variants (i.e., without structuredness). The package further contains functions for optimization (Adaptive Luus-Jaakola and Bayesian optimization with treed Gaussian processes with jump to linear models) and functions for various structuredness indices.

This allows fitting STOPS models as described in Rusch, Mair and Hornik (2023).

Usage

stops(
  dis,
  loss = c("strain", "stress", "smacofSym", "powerstress", "powermds", "powerelastic",
    "powerstrain", "elastic", "sammon", "sammon2", "smacofSphere", "powersammon",
    "rstress", "sstress", "isomap", "isomapeps", "bcstress", "lmds", "apstress",
    "rpowerstress"),
  theta = 1,
  structures = c("cclusteredness", "clinearity", "cdependence", "cmanifoldness",
    "cassociation", "cnonmonotonicity", "cfunctionality", "ccomplexity", "cfaithfulness",
    "cregularity", "chierarchy", "cconvexity", "cstriatedness", "coutlying",
    "cskinniness", "csparsity", "cstringiness", "cclumpiness", "cinequality"),
  ndim = 2,
  weightmat = NULL,
  init = NULL,
  stressweight = 1,
  strucweight,
  strucpars,
  optimmethod = c("SANN", "ALJ", "pso", "Kriging", "tgp", "DIRECT", "stogo", "cobyla",
    "crs2lm", "isres", "mlsl", "neldermead", "sbplx", "hjk", "cmaes"),
  lower,
  upper,
  verbose = 0,
  type = c("additive", "multiplicative"),
  initpoints = 10,
  itmax = 50,
  itmaxps = 10000,
  model,
  control,
  ...
)

Arguments

dis

numeric matrix or dist object of proximities

loss

which loss function to use for fitting; defaults to "stress".

theta

vector of starting values for the hyperparameters of the transformation functions. If it is shorter than the number of hyperparameters of the MDS version, the vector is recycled (see the corresponding stop_XXX function or the vignette for exactly what theta must look like for each loss). If it is longer than the number of hyperparameters of the MDS method, an error is thrown. If missing, theta is set to 1 and recycled.
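The recycling rule described above can be sketched in base R. The helper name recycle_theta and its argument npars (the number of hyperparameters the chosen loss expects) are hypothetical and not part of the stops API; this only illustrates the documented behaviour and is not the package's implementation.

```r
# Hypothetical helper (not part of stops): mimic the documented handling of theta.
recycle_theta <- function(theta = 1, npars) {
  if (length(theta) > npars) {
    stop("theta has more elements than the loss has hyperparameters")
  }
  # Shorter vectors are recycled to the required length.
  rep_len(theta, npars)
}

recycle_theta(1, npars = 3)        # a scalar is recycled to length 3
recycle_theta(c(1, 2), npars = 3)  # a length-2 vector is recycled as well
```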

structures

character vector naming the c-structuredness indices to be considered; if missing, no structure is considered.

ndim

number of dimensions of the target space

weightmat

(optional) a matrix of nonnegative weights; defaults to 1 for all off-diagonal elements

init

(optional) initial configuration

stressweight

weight to be used for the fit measure; defaults to 1

strucweight

vector of weights to be used for the c-structuredness indices (in the same order as in structures); defaults to -1/length(structures) for each index

strucpars

Metaparameters for the structuredness indices (gamma in the article), possibly named after the structures. It is safest to supply a list of lists containing the named arguments for the structuredness indices, with the lists in the same order as structures, for example list(list(par1Struc1=par1Struc1,par2Struc1=par2Struc1),list(par1Struc2=par1Struc2,par2Struc2=par2Struc2),...), where parYStrucX is the named argument for metaparameter Y of structure X, the structure the list element corresponds to. For a structure without parameters, set NULL. Parameters in different list elements can have the same name. For example, to use cclusteredness with metaparameters epsilon=10 and k=4 (and the defaults for the other parameters), cdependence with no metaparameters and cfaithfulness with metaparameter k=7, one would use list(list(epsilon=10,k=4),list(NULL),list(dis=obdiss,k=7)) for the structures vector c("cclusteredness","cdependence","cfaithfulness"). If missing, it is set to NULL and defaults are used. It is also possible to supply a structure's metaparameters as a list of vectors with named elements if the metaparameters are scalars, like list(c(par1Struc1=par1Struc1,par2Struc1=par2Struc1,...),c(par1Struc2=par1Struc2,par2Struc2=par2Struc2,...)). That can have unintended consequences if a metaparameter is a vector or matrix.
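A minimal base R sketch of the layout described above. The metaparameter values are illustrative, the dis argument for cfaithfulness is left out because it needs the observed dissimilarities, and w is a hypothetical vector-valued metaparameter used only to show the pitfall of the vector form.

```r
structures <- c("cclusteredness", "cdependence", "cfaithfulness")

# List-of-lists form: one element per structure, in the same order as `structures`.
strucpars <- list(
  list(epsilon = 10, k = 4),  # cclusteredness
  list(NULL),                 # cdependence: no metaparameters
  list(k = 7)                 # cfaithfulness
)
stopifnot(length(strucpars) == length(structures))

# Pitfall of the vector form: c() flattens a vector-valued metaparameter
# into several renamed scalars (`w` is hypothetical).
names(c(k = 4, w = c(1, 2)))     # "k" "w1" "w2"
names(list(k = 4, w = c(1, 2)))  # "k" "w"
```

With list(), a vector or matrix metaparameter stays a single named element, which is why the list-of-lists form is the safer choice.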

optimmethod

Which solver to use. Currently supported are Bayesian optimization with Gaussian process priors and Kriging ("Kriging"), Bayesian optimization with treed Gaussian processes with jump to linear models ("tgp"), Adaptive LJ Search ("ALJ"), Particle Swarm Optimization ("pso"), simulated annealing ("SANN"), "DIRECT", Stochastic Global Optimization ("stogo"), COBYLA ("cobyla"), Controlled Random Search 2 with local mutation ("crs2lm"), Improved Stochastic Ranking Evolution Strategy ("isres"), Multi-Level Single-Linkage ("mlsl"), Nelder-Mead ("neldermead"), Subplex ("sbplx"), Hooke-Jeeves Pattern Search ("hjk") and CMA-ES ("cmaes"). Defaults to "ALJ". tgp, ALJ, Kriging and pso usually work well for relatively low values of itmax.

lower

The lower constraints of the search region. Needs to be a numeric vector of the same length as the parameter vector theta.

upper

The upper constraints of the search region. Needs to be a numeric vector of the same length as the parameter vector theta.
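As a small sketch of the length requirement, the following base R check uses the bounds from the powerstress example on this page. The three-element theta is an assumed starting vector, and the check that lower does not exceed upper is our own sanity test, not something the source states.

```r
theta <- c(1, 1, 1)      # assumed starting values, one per hyperparameter
lower <- c(0, 0, 1)      # bounds as in the powerstress example
upper <- c(10, 10, 10)

# Both bound vectors must have one entry per element of theta.
stopifnot(length(lower) == length(theta),
          length(upper) == length(theta))

# Extra sanity check (our assumption): each lower bound is below its upper bound.
stopifnot(all(lower <= upper))
```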

verbose

numeric value that controls how much information on the fitting process is printed; >2 is very verbose.

type

which aggregation to use for the multi-objective target function: either 'additive' (default) or 'multiplicative'.
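A sketch of how the two aggregation types might combine fit and structure, assuming that the additive form is a weighted sum and the multiplicative form a weighted product of the stress and the index values, which is how we read Rusch et al. (2023). The function aggregate_stoploss and all numbers are hypothetical and are not the package's code.

```r
# Hypothetical illustration (not the stops implementation).
aggregate_stoploss <- function(stress, indices, stressweight = 1,
                               strucweight = rep(-1 / length(indices), length(indices)),
                               type = c("additive", "multiplicative")) {
  type <- match.arg(type)
  if (type == "additive") {
    # Weighted sum of the fit measure and the structuredness indices.
    stressweight * stress + sum(strucweight * indices)
  } else {
    # Weighted product: the weights act as exponents.
    stress^stressweight * prod(indices^strucweight)
  }
}

aggregate_stoploss(0.25, c(0.5, 0.5))                           # additive
aggregate_stoploss(0.25, c(0.5, 0.5), type = "multiplicative")  # multiplicative
```

With the default negative structure weights, larger index values lower the objective in both forms, so minimizing it rewards structure.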

initpoints

number of initial points to fit the surrogate model for Bayesian optimization; default is 10.

itmax

maximum number of iterations of the outer optimization (for theta) or number of steps of Bayesian optimization; default is 50. We recommend a higher number for ALJ (around 150). Note that due to the inner workings of some solvers, this may or may not correspond to the actual number of function evaluations performed (or PS models fitted). E.g., with tgp the actual number of function evaluations of the PS method is between itmax and 6*itmax, as tgp samples 1-6 candidates from the posterior and uses the best candidate. For pso it is the number of particles s times itmax. For cmaes it is usually a bit higher than itmax. This may currently get overruled by the control argument if it is used (itmax is then set either to what is supplied by control or to the default of the method).
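The evaluation counts mentioned above amount to simple arithmetic, sketched here for the default itmax. The number of particles s = 10 is an assumed value chosen for illustration only.

```r
itmax <- 50

# tgp: between itmax and 6 * itmax evaluations (1 to 6 candidates per step).
tgp_range <- c(min = itmax, max = 6 * itmax)

# pso: number of particles s times itmax; s = 10 is an assumed value.
s <- 10
pso_evals <- s * itmax

tgp_range
pso_evals
```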

itmaxps

maximum number of iterations of the inner optimization (to obtain the PS configuration)

model

a character string specifying the surrogate model to use. For Kriging it specifies the covariance kernel for the GP prior (see covTensorProduct-class); defaults to "powerexp". For tgp it specifies the nonstationary process used (see bgp); defaults to "btgpllm".

control

a control argument passed to the outer optimization procedure. Will override any other control arguments passed, especially verbose and itmax. For the effect of control, see pomp::sannbox for SANN, pso::psoptim for pso, cmaes::cma_es for cmaes, dfoptim::hjkb for hjk, and the nloptr documentation for the algorithms DIRECT, stogo, cobyla, crs2lm, isres, mlsl, neldermead and sbplx.

...

additional arguments passed to the outer optimization procedures (not fully tested).

Details

The stops package provides five categories of important functions:

Models & Algorithms:

Structuredness Indices: Various c-structuredness indices, available as c_foo(), where foo is the name of the structuredness. See Rusch et al. (2023).

Optimization functions:

Wrappers and convenience functions:

Methods: For most of the objects returned by the high-level functions S3 classes and methods for standard generics were implemented, including print, summary, plot, plot3d, plot3dstatic.

References:

Authors: Thomas Rusch, Lisha Chen, Jan de Leeuw, Patrick Mair, Kurt Hornik

Maintainer: Thomas Rusch

The combination of c-structuredness indices and stress uses the stress.m values, which are the explicitly normalized stresses. Reported, however, is the stress-1 value, which is sqrt(stress.m).
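The relation between the two stress values is a plain square root. The number below is made up for illustration and is not taken from a fitted model.

```r
stress_m <- 0.04            # hypothetical stress.m value
stress_1 <- sqrt(stress_m)  # the reported stress-1 value
stress_1
```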

Value

A list with the components

Examples

library(stops)
data(kinshipdelta, package = "smacof")

# Metaparameters for cclusteredness; cdependence has none
strucpars <- list(list(epsilon = 10, minpts = 2, scale = 3), list(NULL))
dissm <- as.matrix(kinshipdelta)

# STOPS with strain
resstrain <- stops(dissm, loss = "strain", theta = 1,
                   structures = c("cclusteredness", "cdependence"),
                   strucpars = strucpars, optimmethod = "ALJ",
                   lower = 0, upper = 10, itmax = 10)
resstrain
summary(resstrain)
plot(resstrain)


# STOPS with stress
strucpars <- list(list(epsilon = 10, minpts = 2, scale = 3), NULL)
resstress <- stops(dissm, loss = "stress",
                   structures = c("cclusteredness", "cdependence"),
                   strucpars = strucpars, optimmethod = "ALJ",
                   lower = 0, upper = 10)
resstress
summary(resstress)
plot(resstress)
plot(resstress, "Shepard")

# STOPS with powerstress
respstress <- stops(dissm, loss = "powerstress",
                    structures = c("cclusteredness", "cdependence"),
                    strucpars = strucpars, weightmat = dissm,
                    itmaxps = 1000, optimmethod = "ALJ",
                    lower = c(0, 0, 1), upper = c(10, 10, 10))
respstress
summary(respstress)
plot(respstress)

# STOPS with bcstress
resbcstress <- stops(dissm, loss = "bcstress",
                     structures = c("cclusteredness", "cdependence"),
                     strucpars = strucpars, optimmethod = "ALJ",
                     lower = c(0, 1, 0), upper = c(10, 10, 10))
resbcstress
summary(resbcstress)
plot(resbcstress)

# STOPS with lmds
reslmds <- stops(dissm, loss = "lmds",
                 structures = c("cclusteredness", "clinearity"),
                 strucpars = strucpars, optimmethod = "ALJ",
                 lower = c(2, 0), upper = c(10, 2))
reslmds
summary(reslmds)
plot(reslmds)

# STOPS with Isomap (the epsilon version)
resiso <- stops(dissm, loss = "isomapeps",
                structures = c("cclusteredness", "clinearity"),
                strucpars = strucpars, optimmethod = "ALJ",
                lower = 70, upper = 120)
resiso
summary(resiso)
plot(resiso)


data(kinshipdelta, package = "smacof")
strucpar <- list(NULL, NULL)  # parameters for indices
res1 <- stops(kinshipdelta, loss = "stress",
              structures = c("cclumpiness", "cassociation"),
              strucpars = strucpar,
              lower = 0, upper = 10, itmax = 10)
res1


data(BankingCrisesDistances)
strucpar <- list(c(epsilon = 10, minpts = 2), NULL)  # parameters for indices
res1 <- stops(BankingCrisesDistances[, 1:69], loss = "stress", verbose = 0,
              structures = c("cclusteredness", "clinearity"),
              strucpars = strucpar,
              lower = 0, upper = 10)
res1

strucpar <- list(list(alpha = 0.6, C = 15, var.thr = 1e-5, zeta = NULL),
                 list(alpha = 0.6, C = 15, var.thr = 1e-5, zeta = NULL))
res1 <- stops(BankingCrisesDistances[, 1:69], loss = "stress", verbose = 0,
              structures = c("cfunctionality", "ccomplexity"),
              strucpars = strucpar,
              lower = 0, upper = 10)
res1



[Package stops version 1.0-1 Index]