matchFeat-package {matchFeat}    R Documentation

One-to-One Feature Matching

Description

Statistical methods to match feature vectors between multiple datasets in a one-to-one fashion. Given a fixed number of classes/distributions, for each unit, exactly one vector of each class is observed without label. The goal is to label the feature vectors of each unit, using each label exactly once per unit, so as to produce the best match across datasets, e.g. by minimizing the variability within classes. Statistical solutions based on empirical loss functions and probabilistic modeling are provided. The 'Gurobi' software and its 'R' interface package are required for one of the package functions (match.2x()) and can be obtained at <https://www.gurobi.com/> (free academic license). For more details, refer to Degras (2022) <doi:10.1080/10618600.2022.2074429> "Scalable feature matching across large data collections" and Bandelt, Maas, and Spieksma (2004) <doi:10.1057/palgrave.jors.2601723> "Local search heuristics for multi-index assignment problems with decomposable costs".
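
The labeling task can be illustrated on a toy problem with two units. The sketch below uses base R only and does not call any matchFeat function; the names (centers, unit1, unit2, perms, match_cost) and the squared Euclidean cost are assumptions made for illustration. It enumerates every way of assigning the labels to the vectors of the second unit and keeps the assignment with the smallest total squared distance to the first unit.

```r
# Toy illustration in base R (not the matchFeat API).
set.seed(1)
m <- 3   # number of classes
p <- 2   # feature dimension
centers <- matrix(c(0, 0, 5, 0, 0, 5), nrow = m, byrow = TRUE)

# Unit 1: vectors stored in class order.
# Unit 2: the same classes, stored in shuffled order, with small noise.
true_order <- c(2, 3, 1)   # row j of unit2 comes from class true_order[j]
unit1 <- centers + matrix(rnorm(m * p, sd = 0.1), m, p)
unit2 <- centers[true_order, ] + matrix(rnorm(m * p, sd = 0.1), m, p)

# Enumerate all permutations of a vector (practical for tiny m only).
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

# Cost of a labeling sigma: row sigma[k] of unit2 receives label k and is
# compared with row k of unit1 (sum of squared Euclidean distances).
match_cost <- function(sigma) sum((unit1 - unit2[sigma, , drop = FALSE])^2)

all_perms <- perms(seq_len(m))
costs <- apply(all_perms, 1, match_cost)
best <- all_perms[which.min(costs), ]   # labeling with the smallest cost
```

Exhaustive enumeration grows as m! and is shown here only to make the objective concrete; it is not a practical method for realistic numbers of classes.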

Details

This package serves to match feature vectors across a collection of datasets in a one-to-one fashion. This task is formulated as a multidimensional assignment problem with decomposable costs (MDADC). The package provides fast algorithms whose time complexity is roughly linear in the number n of datasets and whose space complexity is a small fraction of the data size.
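
To make the MDADC formulation concrete, here is a minimal base R sketch of a cyclical coordinate descent sweep under stated assumptions: the cost is the sum, over all pairs of units, of squared Euclidean distances between vectors that share a label; the per-unit assignment step is solved by brute-force enumeration as a stand-in for a linear assignment solver; and all names (x, sigma, objective, perms, total_sum) are hypothetical. It is not the package's implementation. Keeping a running sum of the currently matched vectors means each unit is updated without revisiting the other units, which is why one sweep costs time linear in n.

```r
# Sketch of cyclical coordinate descent for an MDADC-type objective (base R).
set.seed(42)
m <- 3; p <- 2; n <- 20
centers <- matrix(c(0, 0, 4, 0, 0, 4), nrow = m, byrow = TRUE)

# x[, , i] holds the m unlabeled feature vectors (rows) of unit i.
x <- array(0, dim = c(m, p, n))
for (i in seq_len(n)) {
  x[, , i] <- centers[sample(m), ] + matrix(rnorm(m * p, sd = 0.2), m, p)
}

# All permutations of 1..m (brute force; only sensible for small m).
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}
all_perms <- perms(seq_len(m))

# Decomposable cost: sum over pairs of units of the squared distances
# between vectors carrying the same label. sigma[k, i] is the row of unit i
# that receives label k.
objective <- function(sigma) {
  total <- 0
  for (i in seq_len(n - 1)) for (j in (i + 1):n) {
    total <- total + sum((x[sigma[, i], , i] - x[sigma[, j], , j])^2)
  }
  total
}

sigma <- matrix(seq_len(m), m, n)   # initial labeling: identity for every unit
obj_start <- objective(sigma)

# Running sum of matched vectors per label, updated in place.
total_sum <- matrix(0, m, p)
for (i in seq_len(n)) total_sum <- total_sum + x[sigma[, i], , i]

for (sweep in 1:10) {
  changed <- FALSE
  for (i in seq_len(n)) {
    others <- total_sum - x[sigma[, i], , i]   # matched sums without unit i
    target <- others / (n - 1)                 # per-label mean of the others
    costs <- apply(all_perms, 1, function(s) sum((x[s, , i] - target)^2))
    new_s <- all_perms[which.min(costs), ]
    if (any(new_s != sigma[, i])) changed <- TRUE
    sigma[, i] <- new_s
    total_sum <- others + x[new_s, , i]
  }
  if (!changed) break   # no unit changed its labeling: local optimum
}
obj_end <- objective(sigma)
```

Each update minimizes the objective over the labeling of one unit with the others held fixed, so the objective never increases; the result is a local optimum that can depend on the starting labeling.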

Author(s)

Author: David Degras
Maintainer: David Degras <david.degras@umb.edu>

References

Degras (2022). Scalable feature matching across large data collections. doi:10.1080/10618600.2022.2074429
Wright (2015). Coordinate descent algorithms. https://arxiv.org/abs/1502.04759
McLachlan and Krishnan (2008). The EM Algorithm and Extensions.


[Package matchFeat version 1.0 Index]