kmeansbaryeps {ttbary}R Documentation

Compute Pseudo-Barycenter of a List of Point Patterns (with epsilon-relaxation)

Description

Starting from an initial candidate point pattern zeta, use a k-means-like algorithm to compute a local minimum in the barycenter problem based on the TT-2 metric for a list pplist of planar point patterns.

Usage

kmeansbaryeps(
  epsvec,
  zeta,
  pplist,
  penalty,
  add_del = Inf,
  surplus = 0,
  relaxVec = c(20, 1, 1, 1),
  N = 200L,
  eps = 0.005,
  exact = FALSE,
  verbose = 0
)

Arguments

epsvec

a vector containing the values for the relaxed assignment. Last entry should be < 1/n, where n is the largest cardinality among the point patterns. Otherwise the algorithm has no guarantee of terminating in a local minimum! If epsvec[1] is too small, the computational load may be large. If in doubt, choose c(10^8,10^7,10^6,...,10/(n+1),1/(n+1)).

zeta

a point pattern. Object of class ppp or a list with components x and y.

pplist

a list of point patterns. Object of class ppplist or any list where each elements has components x and y.

penalty

the penalty for adding/deleting points when computing TT-2 distances.

add_del

for how many iterations shall the algorithm add points to / delete points from zeta if this is favorable? Defaults to Inf.

surplus

By how many points is the barycenter point pattern allowed to be larger than the largest input point pattern (among pplist and zeta) if add_del > 0. A larger number increases the computational load.

relaxVec

a vector of four integers controlling the epsilon-relaxation of the assignments. See details below.

N

the maximum number of iterations.

eps

the algorithm stops if the relative improvement of the objective function between two iterations is less than eps.

exact

logical. Shall the barycenter of a cluster be calculated exactly by Algorithm 1 of Drezner, Mehrez and Wesolowsky (1991)? In our experience setting exact=TRUE yields no systematic improvement of the overall objective function value, while the computation times are substantially larger.

verbose

the verbosity level. One of 0, 1, 2, 3, where 0 means silent and 3 means full details.

Details

Given k planar point patterns \xi_1, \ldots, \xi_k (stored in pplist), this function finds a local minimizer \zeta^* of

\sum_{j=1}^k \tau_2(\xi_j, \zeta)^2,

where \tau_2 denotes the TT-2 metric based on the Euclidean metric between points.

Starting from an initial candidate point pattern zeta, the algorithm alternates between assigning a point from each pattern \xi_j to each point of the candidate and computing new candidate patterns by shifting, adding and deleting points. A detailed description of the algorithm is given in Müller, Schuhmacher and Mateu (2020).

For first-time users it is recommended to keep the default values and set penalty to a noticeable fraction of the diameter of the observation window, e.g. between 0.1 and 0.25 times this diameter.

The argument relaxVec must be a vector of four integers c(a,b,c,d) > c(0,0,0,0). For the first a iterations step by step one entry of epsvec is additionally considered in the assignment, starting with only the first entry in the first iteration. In this a iterations the algorithm can stop if it has improved by less than eps between iterations. After a iterations all entries of epsvec before epsvec[b] are ignored and everytime the algorithm does not improve, the next d entries of epsvec are additionally considered in the following iterations. When the last entry of epsvec is considered in the assignments, the entries of epsvec before epsvec[c] are ignored. relaxVec defaults to c(20,1,1,1) meaning that in every one of the first 20 iterations one additional entry of epsvec is considered until the algorithm converges. This allows the algorithm to converge before the full epsvec was considered! For further details see example.

Warning: The argument relaxVec offers many different options for controlling the epsilon-relaxation of the assignments in order to save computation time. But choosing bad parameters may heavily increase the computational load! If in doubt, go with c(length(epsvec),1,1,1) (see examples).

Value

A list with components:

cost

the sum of squared TT-2 distances between the computed pseudo-barycenter and the point patterns.

barycenter

the pseudo-barycenter as a ppp-object.

iterations

the number of iterations required until convergence.

Author(s)

Raoul Müller raoul.mueller@uni-goettingen.de
Dominic Schuhmacher schuhmacher@math.uni-goettingen.de

References

Raoul Müller, Dominic Schuhmacher and Jorge Mateu (2020).
Metrics and Barycenters for Point Pattern Data.
Statistics and Computing 30, 953-972.
doi:10.1007/s11222-020-09932-y

See Also

kmeansbary for a similar function that works without epsilon relaxation

Examples

data(pplist_samecard)
plot(superimpose(pplist_samecard), cex=0.7, legend=FALSE,
     xlim=c(-0.2,1.2), ylim=c(-0.1,1.1), main="", use.marks=FALSE) #plotting the data

set.seed(12345)
zeta <- ppp(runif(100),runif(100))
plot(zeta, add=TRUE, col="beige", lwd=2, pch=16) #plotting the start-zeta over the data

epsvec <- c(1e8,1e7,1e6,1e5,1e4,1e3,1e2,10,1,10/101,1/101)

relaxVec1 <- c(length(epsvec),1,1,1) 
#One additional entry of epsvec is considered in each iteration;
#algorithm can stop before full epsvec was used.
#Runs fast with little to no drawback in the quality of the computed solution.
#Time advantage more visible for large datasets.

relaxVec2 <- c(1,1,1,length(epsvec))
#In the first iteration only epsvec[1] is used, after that every assignment is exact.
#Not as fast as the previous version but usually no drawbacks at all in the computed solution.
#Time advantage more visible for large datasets.

relaxVec3 <- c(3,2,3,2)
#in the first 3 iterations epsvec[1],epsvec[1:2],epsvec[1:3] are used in the assignments,
#after that epsvec[2:x] is used, where x starts at 3 (=maximum(a,b)) and increases
#by 2 everytime the algorithm does not improve. When x >= length(epsvec) all assignments
#are done with epsvec[3:length(epsvec)].

res1 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec1)
res2 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec2)
res3 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec3)
plot(res1$barycenter, add=TRUE, col="blue", pch=16) #adding the computed barycenter in blue


[Package ttbary version 0.3-1 Index]