match.template {matchFeat} | R Documentation |
Template Matching
Description
This function solves the multidimensional assignment problem with decomposable costs (MDADC) by matching the data to a pre-specified set of vectors (the template). The dissimilarity function is the squared Euclidean distance.
Usage
match.template(x, template = 1L, unit = NULL, w = NULL,
method = c("hungarian","bruteforce"), equal.variance = FALSE)
Arguments
x |
data: matrix of dimensions |
template |
integer (= which sample unit to take as template) or |
unit |
integer (=number of units/datasets) or vector mapping rows of |
w |
weights for the loss function. Can be specified as a |
method |
method for the linear assignment problem: |
equal.variance |
logical; if TRUE, resp. FALSE, return common, resp. label-specific, covariance of matched features |
Details
Given n
datasets or statistical units, each containing m
feature vectors, the one-to-one matching problem is to find a set of n
label permutations that produce the best match of feature vectors across units. The objective function to minimize is the sum of squared (Euclidean) distances between all feature vectors having the same (new) label. This amounts to minimizing the sum of the within-label variances.
The template-based method consists in relabeling successively each sample unit to best match a template matrix of feature vectors. This method is very fast but its optimization performance is only as good as the template. For best results, the template should be representative of the collected data.
If x
is a matrix, the rows should be sorted by increasing unit label and unit
should be a nondecreasing sequence of integers, for example (1,...,1,2,...,2,...,n,...,n)
with each integer 1,...,n
replicated m
times.
The argument w
can be specified as a vector of positive numbers (will be recycled to length p
if needed) or as a positive definite matrix of size (p,p)
.
Value
A list of class matchFeat
with fields
sigma
best assignement as set of permutations (
m\times n
matrix)cluster
best assignement as cluster indicators (
m \times n
matrix)objective
minimum objective value
mu
mean vector for each class/label (
p \times m
matrix)V
covariance matrix for each class/label (
p \times p \times m
array ifequal.variance
is FALSE,p \times p
matrix otherwisecall
function call
References
Degras (2022) "Scalable feature matching across large data collections."
doi:10.1080/10618600.2022.2074429
https://en.wikipedia.org/wiki/Assignment_problem
https://en.wikipedia.org/wiki/Hungarian_algorithm
See Also
match.2x
, match.bca
,
match.bca.gen
, match.gaussmix
, match.kmeans
, match.rec
Examples
## Generate data
n <- 10
k <- 3
d <- 5
mu <- matrix(1:k, nrow=d, ncol=k, byrow=TRUE)
sigma <- 0.3
x <- array(mu, c(d,k,n)) + rnorm(d*k*n,sigma)
## Match all feature vectors with first case as template
result <- match.template(x,1)
## Display results
result$cost # cost function
xmatched <- array(dim=dim(x))
# re-arranged (matched) feature vectors
for (i in 1:n)
xmatched[,,i] <- x[,result$sigma[,i],i]