kmeansbaryeps {ttbary} | R Documentation |
Compute Pseudo-Barycenter of a List of Point Patterns (with epsilon-relaxation)
Description
Starting from an initial candidate point pattern zeta
, use a k-means-like
algorithm to compute a local minimum in the barycenter problem based on the TT-2 metric
for a list pplist
of planar point patterns.
Usage
kmeansbaryeps(
epsvec,
zeta,
pplist,
penalty,
add_del = Inf,
surplus = 0,
relaxVec = c(20, 1, 1, 1),
N = 200L,
eps = 0.005,
exact = FALSE,
verbose = 0
)
Arguments
epsvec |
a vector containing the values for the relaxed assignment. Last entry should be < 1/n, where n is the largest cardinality among the point patterns. Otherwise the algorithm has no guarantee of terminating in a local minimum! If epsvec[1] is too small, the computational load may be large. If in doubt, choose c(10^8,10^7,10^6,...,10/(n+1),1/(n+1)). |
zeta |
a point pattern. Object of class |
pplist |
a list of point patterns. Object of class |
penalty |
the penalty for adding/deleting points when computing TT-2 distances. |
add_del |
for how many iterations shall the algorithm add points to / delete points from zeta if this is favorable? Defaults to Inf. |
surplus |
By how many points is the barycenter point pattern allowed to be larger than the largest input point pattern (among pplist and zeta) if add_del > 0. A larger number increases the computational load. |
relaxVec |
a vector of four integers controlling the epsilon-relaxation of the assignments. See details below. |
N |
the maximum number of iterations. |
eps |
the algorithm stops if the relative improvement of the objective function between two iterations is less than eps. |
exact |
logical. Shall the barycenter of a cluster be calculated exactly by Algorithm 1
of Drezner, Mehrez and Wesolowsky (1991)? In our experience setting |
verbose |
the verbosity level. One of 0, 1, 2, 3, where 0 means silent and 3 means full details. |
Details
Given k
planar point patterns \xi_1, \ldots, \xi_k
(stored in
pplist
), this function finds a local minimizer \zeta^*
of
\sum_{j=1}^k \tau_2(\xi_j, \zeta)^2,
where \tau_2
denotes the TT-2 metric based on the Euclidean metric between points.
Starting from an initial candidate point pattern zeta
, the algorithm alternates
between assigning a point from each pattern \xi_j
to each point of the candidate
and computing new candidate patterns by shifting, adding and deleting points.
A detailed description of the algorithm is given in Müller, Schuhmacher and Mateu (2020).
For first-time users it is recommended to keep the default values and set penalty
to a noticeable fraction of the diameter of the observation window, e.g. between
0.1 and 0.25 times this diameter.
The argument relaxVec
must be a vector of four integers c(a,b,c,d) > c(0,0,0,0).
For the first a
iterations step by step one entry of epsvec
is additionally considered in the assignment, starting with
only the first entry in the first iteration. In this a
iterations the algorithm can stop if it has improved by less than eps
between iterations.
After a
iterations all entries of epsvec
before epsvec[b]
are ignored and everytime
the algorithm does not improve, the next d
entries of epsvec are additionally considered in the following iterations. When the last
entry of epsvec
is considered in the assignments, the entries of epsvec before epsvec[c]
are ignored.
relaxVec
defaults to c(20,1,1,1) meaning that in every one of the first 20 iterations one additional entry of epsvec
is considered until the algorithm converges. This allows the algorithm to converge before the full epsvec was considered! For further
details see example.
Warning: The argument relaxVec
offers many different options for controlling the epsilon-relaxation of the assignments
in order to save computation time. But choosing bad parameters may heavily increase the computational load!
If in doubt, go with c(length(epsvec),1,1,1) (see examples).
Value
A list with components:
cost |
the sum of squared TT-2 distances between the computed pseudo-barycenter and the point patterns. |
barycenter |
the pseudo-barycenter as a |
iterations |
the number of iterations required until convergence. |
Author(s)
Raoul Müller raoul.mueller@uni-goettingen.de
Dominic Schuhmacher schuhmacher@math.uni-goettingen.de
References
Raoul Müller, Dominic Schuhmacher and Jorge Mateu (2020).
Metrics and Barycenters for Point Pattern Data.
Statistics and Computing 30, 953-972.
doi:10.1007/s11222-020-09932-y
See Also
kmeansbary
for a similar function that works without epsilon relaxation
Examples
data(pplist_samecard)
plot(superimpose(pplist_samecard), cex=0.7, legend=FALSE,
xlim=c(-0.2,1.2), ylim=c(-0.1,1.1), main="", use.marks=FALSE) #plotting the data
set.seed(12345)
zeta <- ppp(runif(100),runif(100))
plot(zeta, add=TRUE, col="beige", lwd=2, pch=16) #plotting the start-zeta over the data
epsvec <- c(1e8,1e7,1e6,1e5,1e4,1e3,1e2,10,1,10/101,1/101)
relaxVec1 <- c(length(epsvec),1,1,1)
#One additional entry of epsvec is considered in each iteration;
#algorithm can stop before full epsvec was used.
#Runs fast with little to no drawback in the quality of the computed solution.
#Time advantage more visible for large datasets.
relaxVec2 <- c(1,1,1,length(epsvec))
#In the first iteration only epsvec[1] is used, after that every assignment is exact.
#Not as fast as the previous version but usually no drawbacks at all in the computed solution.
#Time advantage more visible for large datasets.
relaxVec3 <- c(3,2,3,2)
#in the first 3 iterations epsvec[1],epsvec[1:2],epsvec[1:3] are used in the assignments,
#after that epsvec[2:x] is used, where x starts at 3 (=maximum(a,b)) and increases
#by 2 everytime the algorithm does not improve. When x >= length(epsvec) all assignments
#are done with epsvec[3:length(epsvec)].
res1 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec1)
res2 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec2)
res3 <- kmeansbaryeps(epsvec, zeta, pplist_samecard, penalty=0.1, add_del=5, relaxVec = relaxVec3)
plot(res1$barycenter, add=TRUE, col="blue", pch=16) #adding the computed barycenter in blue