R: Fitting a COPS-C Model (COPS Variant 1).

copstressMin {cops}

R Documentation

Fitting a COPS-C Model (COPS Variant 1).

Description

Minimizing Copstress to obtain a clustered ratio, interval or ordinal PS configuration with given explicit power transformations theta. The function allows mix-and-match of explicit (via theta) and implicit (via type) transformations by setting the kappa, lambda, nu (or theta) and type arguments.
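
The explicit transformations can be pictured with a small base R sketch. It only illustrates what the three powers in theta act on and is not the package's internal code. The toy data and variable names (`delta_toy`, `conf_toy`, `d_fit`, `w`) are made up, and plain recycling via `rep()` is an assumption about how a short theta is expanded.

```r
# Toy symmetric dissimilarities for three objects (hypothetical data).
delta_toy <- matrix(c(0, 2, 4,
                      2, 0, 3,
                      4, 3, 0), nrow = 3, byrow = TRUE)

# A made-up 2D configuration and its Euclidean fitted distances.
conf_toy <- matrix(c(0, 0,
                     1, 0,
                     0, 2), ncol = 2, byrow = TRUE)
d_fit <- as.matrix(dist(conf_toy))

# Unit weights off the diagonal, as in the default weightmat.
w <- 1 - diag(nrow(delta_toy))

# theta = c(kappa, lambda, nu); a shorter vector is recycled to length 3
# (assumption: plain recycling, as rep() does).
theta <- rep(c(2, 1), length.out = 3)   # kappa = 2, lambda = 1, nu = 2
kappa <- theta[1]; lambda <- theta[2]; nu <- theta[3]

d_trans     <- d_fit^kappa       # power on the fitted distances
delta_trans <- delta_toy^lambda  # power on the observed proximities
w_trans     <- w^nu              # power on the weights
```

With type="ordinal" the lambda power would not be used, since the proximities are then transformed implicitly.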

Usage

copstressMin(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

copsc(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

copStressMin(
  delta,
  kappa = 1,
  lambda = 1,
  nu = 1,
  theta = c(kappa, lambda, nu),
  type = c("ratio", "interval", "ordinal"),
  ties = "primary",
  weightmat = 1 - diag(nrow(delta)),
  ndim = 2,
  init = NULL,
  stressweight = 0.975,
  cordweight = 0.025,
  q = 1,
  minpts = ndim + 1,
  epsilon = max(10, max(delta)),
  dmax = NULL,
  rang,
  optimmethod = c("NelderMead", "Newuoa", "BFGS", "SANN", "hjk", "solnl", "solnp",
    "subplex", "snomadr", "hjk-Newuoa", "hjk-BFGS", "BFGS-hjk", "Newuoa-hjk", "cmaes",
    "direct", "direct-Newuoa", "direct-BFGS", "genoud", "gensa"),
  verbose = 0,
  scale = c("sd", "rmsq", "proc", "none"),
  normed = TRUE,
  accuracy = 1e-07,
  itmax = 10000,
  stresstype = c("stress-1", "stress"),
  principal = FALSE,
  ...
)

Arguments

`delta`	numeric matrix or dist object of a matrix of proximities
`kappa`	power transformation for fitted distances
`lambda`	power transformation for proximities (only used if type="ratio" or "interval")
`nu`	power transformation for weights
`theta`	the theta vector of powers; the first is kappa (for the fitted distances if it exists), the second lambda (for the observed proximities if it exists and type="ratio" or "interval"), the third is nu (for the weights if it exists). If fewer than three elements are given, the vector is recycled. Defaults to 1 1 1. Overrides kappa, lambda and nu if they are given and do not match.
`type`	what type of MDS to fit. Currently one of "ratio", "interval" or "ordinal". Default is "ratio".
`ties`	the handling of ties for ordinal (nonmetric) MDS. Possible are "primary" (default), "secondary" or "tertiary".
`weightmat`	(optional) a matrix of nonnegative weights; defaults to 1 for all off diagonals
`ndim`	number of dimensions of the target space
`init`	(optional) initial configuration
`stressweight`	weight to be used for the fit measure; defaults to 0.975
`cordweight`	weight to be used for the cordillera; defaults to 0.025
`q`	the norm of the cordillera; defaults to 1
`minpts`	the minimum points to make up a cluster in OPTICS, see `optics` where it is called `minPts`; defaults to ndim+1.
`epsilon`	the epsilon parameter of OPTICS, the neighbourhood that is checked, see `optics`; defaults to the larger of 10 and max(delta) (10 is plenty for the explicit normalization we use). Note this means we do not expect any noise objects by default. This number will rarely be exceeded if we standardize the configuration, as is the default in cops. However, if no standardization is applied, or there is a Procrustes adjustment to a configuration with a variance of 10 or more on any of the axes, epsilon can be too small. In that case, set a much higher epsilon.
`dmax`	The winsorization limit of reachability distances in the OPTICS Cordillera. If supplied, it should be either a numeric value that matches 'max(rang)' or 'NULL'; if 'NULL' it is found as 1.5 times (for kappa >1) or 1 times (for kappa <=1) the maximum reachability value of the power Torgerson model with the same lambda. If 'dmax' and 'rang' are supplied and 'dmax' is not 'max(rang)', a warning is given and 'rang' takes precedence.
`rang`	range of the reachabilities to be considered. If missing it is found from the initial configuration by taking 0 as the lower boundary and dmax (see above) as upper boundary. See also `cordillera`
`optimmethod`	Which optimizer to use. Choose one string of 'Newuoa' (`newuoa`), 'NelderMead' (see `optim`), 'hjk' (Hooke-Jeeves algorithm from `hjk`), 'solnl' (from `solnl`), 'solnp' (from `solnp`), 'subplex' (from `subplex`), 'SANN' (simulated annealing, `optim`), 'BFGS' (see `optim`), 'snomadr' (from `snomadr`), 'genoud' (from `genoud`), 'gensa' (from `GenSA`), 'cmaes' (from `cma_es`) and 'direct' (from `direct`). See the linked functions for details on these solvers. There are also combinations that proved to work well, like 'hjk-Newuoa', 'hjk-BFGS', 'BFGS-hjk', 'Newuoa-hjk', 'direct-Newuoa' and 'direct-BFGS'. Usually everything with 'hjk', 'BFGS', 'Newuoa', 'subplex' and 'solnl' in it works rather well in an acceptable time frame (depending on the smoothness of copstress). Default is 'hjk-Newuoa'.
`verbose`	numeric value that controls how much information on the fitting process is printed; >2 is very verbose
`scale`	Scale the configuration (in MDS stress is invariant up to a scaling factor). One of "none" (so no extra scaling of the configuration but normalized to sum delta^2=1), "sd" (configuration divided by the highest standard deviation of any of the columns), "proc" (Procrustes adjustment to the initial fit) and "rmsq" (configuration divided by the maximum root mean square of the columns). Default is "sd", which often gives a nicer spread on the axes. Note that the scaled configuration is returned as $conf and the unscaled as $usconf, so manual calculation of the OC should be done with $conf.
`normed`	should the Cordillera be normed; defaults to TRUE.
`accuracy`	numerical accuracy, defaults to 1e-7.
`itmax`	maximum number of iterations. Defaults to 10000. For the two-step algorithms if itmax is exceeded by the first solver, the second algorithm is run for at least 0.1*itmax (so overall itmax may be exceeded by a factor of 1.1).
`stresstype`	which stress to use in the copstress. Defaults to stress-1. If anything else is set, explicitly normed stress which is (stress-1)^2 is used. Using stress-1 puts more weight on MDS fit.
`principal`	If ‘TRUE’, principal axis transformation is applied to the final configuration.
`...`	additional arguments to be passed to the optimization procedure
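
The "sd" and "rmsq" options of `scale` described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the configuration `usconf` is made up, and the root mean square is taken as `sqrt(mean(x^2))` without centering.

```r
# Hypothetical unscaled configuration with 4 objects in 2 dimensions.
usconf <- matrix(c(-3, -1,
                   -1,  1,
                    1, -1,
                    3,  1), ncol = 2, byrow = TRUE)

# scale = "sd": divide by the largest column standard deviation.
conf_sd <- usconf / max(apply(usconf, 2, sd))

# scale = "rmsq": divide by the largest column root mean square
# (assumed here to be sqrt(mean(x^2)), without centering).
rmsq <- function(x) sqrt(mean(x^2))
conf_rmsq <- usconf / max(apply(usconf, 2, rmsq))

# After "sd" scaling the most spread-out axis has standard deviation 1.
max(apply(conf_sd, 2, sd))
```

Both options only rescale the configuration by a single positive constant, so the relative positions of the objects are unchanged.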

Details

This is an extremely flexible approach to least squares proximity scaling: it supports ratio power stress; ratio, interval and ordinal r stress; and ratio, interval and ordinal MDS, each with or without a COPS penalty. Famous special cases of these models that can be fitted are multiscale MDS if kappa->0 and delta=log(delta), ALSCAL MDS (sstress) with lambda=kappa=2, Sammon type mapping with weightmat=delta and nu=-1, and elastic scaling with weightmat=delta and nu=-2. Due to mix-and-match, this function also allows fitting models that have not yet been published, such as an "elastic scaling ordinal s-stress with cops penalty".
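
As a rough illustration of the Sammon and elastic scaling special cases, the transformed weights weightmat^nu with weightmat=delta can be computed by hand. The helper `power_weights()` and the toy matrix are hypothetical, and setting the diagonal to 0 is an assumption made here to avoid infinite values; the package may treat the diagonal differently.

```r
# Toy dissimilarity matrix (hypothetical data).
delta_toy <- matrix(c(0, 2, 4,
                      2, 0, 5,
                      4, 5, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: raise the weights to the power nu and keep the
# diagonal at 0 (a zero raised to a negative power would give Inf).
power_weights <- function(weightmat, nu) {
  tw <- weightmat^nu
  diag(tw) <- 0
  tw
}

w_sammon  <- power_weights(delta_toy, nu = -1)  # weights 1 / delta
w_elastic <- power_weights(delta_toy, nu = -2)  # weights 1 / delta^2
```

Small dissimilarities receive large weights in both cases, and more strongly so for nu=-2.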

If one wants to fit these models without the cops penalty, we recommend using powerStressMin (for ratio and interval MDS with any power transformation for weights, dissimilarities and distances) or rStressMin (for ratio, interval and ordinal MDS with power transformations for distances and weights), as these use majorization.

Some optimizers (including the default hjk-Newuoa) will print a warning if itmax is (too) small or if there was no convergence. Consider increasing itmax then.

For some solvers there may sometimes be an error [NA/NaN/Inf in foreign function call (arg 3)] stemming from smacof::transform(). This happens when the algorithm places two objects at exactly the same place so their fitted distance is 0. This is good from an OPTICS Cordillera point of view (as it is more clustered), which is why some solvers like to pick that up, but it can lead to an issue in the optimal scaling in smacof. This can usually be mitigated when specifying the model by using less cordweight, less itmax or less accuracy, or by combining the two offending objects into one (so include them as a combined row in the distance matrix).
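
One way to combine two offending objects into a single row is sketched below in base R. The helper `merge_objects()` is hypothetical and not part of cops, and averaging the two rows is only one possible choice for the combined dissimilarities.

```r
# Hypothetical helper: replace objects i and j by one combined object whose
# dissimilarities to all others are the mean of the two original rows.
merge_objects <- function(delta, i, j) {
  delta <- as.matrix(delta)
  keep <- setdiff(seq_len(nrow(delta)), c(i, j))
  combined <- (delta[i, keep] + delta[j, keep]) / 2
  out <- rbind(cbind(delta[keep, keep, drop = FALSE], combined),
               c(combined, 0))
  labels <- rownames(delta)
  if (is.null(labels)) labels <- as.character(seq_len(nrow(delta)))
  new_labels <- c(labels[keep], paste(labels[c(i, j)], collapse = "+"))
  dimnames(out) <- list(new_labels, new_labels)
  out
}

# Toy dissimilarities for four objects (hypothetical data).
delta_toy <- matrix(c(0, 1, 4, 6,
                      1, 0, 2, 8,
                      4, 2, 0, 3,
                      6, 8, 3, 0), nrow = 4, byrow = TRUE,
                    dimnames = list(letters[1:4], letters[1:4]))

# Objects "a" and "b" become the single object "a+b".
merged <- merge_objects(delta_toy, 1, 2)
```

The reduced matrix can then be passed as delta in place of the original one.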

We might eventually switch to newuoa in nloptr.

Value

A copsc object (inheriting from smacofP). A list with the components

delta: the original untransformed dissimilarities
tdelta: the explicitly transformed dissimilarities
dhat: the explicitly transformed dissimilarities (dhats), optimally scaled and normalized (which are approximated by the fit)
confdist: Configuration distances, the fitted distances
conf: the configuration, normed and scaled as specified in scale.
usconf: the unscaled configuration (normed to sum delta^2=1). Scaling applied to usconf gives conf.
parameters, par, pars: the theta vector of power transformations (kappa, lambda, nu)
niter: number of iterations of the optimizer.
stress: the square root of explicitly normalized stress (calculated for conf).
spp: stress per point
ndim: number of dimensions
model: Fitted model name
call: the call
nobj: the number of objects
type, loss, losstype: stresstype
stress.m: The stress used for copstress. If stresstype="stress-1" this is like $stress else it is stress^2
copstress: the copstress loss value
resmat: the matrix of residuals
weightmat: the matrix of untransformed weights
tweightmat: the transformed weighting matrix (here weightmat^nu)
OC: the (normed) OPTICS Cordillera object (calculated for scaled conf)
OCv: the (normed) OPTICS Cordillera value alone (calculated for scaled conf)
optim: the object returned from the optimization procedure
stressweight, cordweight: the weights of the stress and OC respectively (v_1 and v_2)
optimmethod: The solver used
type: the type of MDS fitted
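
The relation between the stress, stress.m, OCv and copstress components can be sketched as follows. The numbers are made up, both helper functions are hypothetical, and the form v_1 * stress.m - v_2 * OCv is an assumption about how the weighted fit and the weighted Cordillera are combined rather than something stated on this page.

```r
# Hypothetical values standing in for components of a fitted object.
stress <- 0.30   # square root of explicitly normalized stress
OCv    <- 0.40   # normed OPTICS Cordillera value
v1     <- 0.975  # stressweight
v2     <- 0.025  # cordweight

# Hypothetical helper: stress.m as described for the two stresstype settings.
stress_m <- function(stress, stresstype = c("stress-1", "stress")) {
  stresstype <- match.arg(stresstype)
  if (stresstype == "stress-1") stress else stress^2
}

# Assumed form of the penalized loss: weighted fit minus weighted clusteredness.
copstress_value <- function(stress.m, OCv, v1, v2) v1 * stress.m - v2 * OCv

cs_stress1 <- copstress_value(stress_m(stress, "stress-1"), OCv, v1, v2)
cs_stress  <- copstress_value(stress_m(stress, "stress"), OCv, v1, v2)
```

Because stress lies between 0 and 1, squaring it shrinks the fit term, which is consistent with the note that stress-1 puts more weight on MDS fit.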

Examples

dis<-as.matrix(smacof::kinshipdelta)

set.seed(1)
## Copstress with equal weight to stress and cordillera 
res1<-copstressMin(dis,stressweight=0.5,cordweight=0.5,
                  itmax=100) #itmax is kept low here; use a higher value (about 10000) in practice
res1
summary(res1)
plot(res1)  #super clustered 

##Alias name 
res1<-copsc(dis,stressweight=0.5,
                  cordweight=0.5,itmax=100) 


## Elastic scaling ordinal s-stress with cops penalty
res1<-copsc(dis,type="ordinal",kappa=2,nu=-2,weightmat=dis,
            stressweight=0.5, cordweight=0.5,itmax=100)

[Package cops version 1.11-3 Index]