distSparse {qlcMatrix} | R Documentation |
Sparse distance matrix calculations
Description
Sparse alternative to the base dist function. WARNING: the result is not a distance metric, see details! Also: distances are calculated between columns (not between rows, as in the base dist function).
Usage
distSparse(M, method = "euclidean", diag = FALSE)
Arguments
M |
a sparse matrix in a format of the |
method |
method to calculate distances. Currently only |
diag |
should the diagonal be included in the results? |
Details
A sparse distance matrix is a slightly awkward concept, because distances of zero are rare in most data. Further, it is mostly the small distances that are of interest, and not the large distances (which are mostly also less trustworthy). Note that for random data, this assumption is not necessarily true.
To obtain sparse results, the current implementation takes a special approach. First, only those distances are calculated for which there is at least some non-zero data for both columns. The assumption is that the remaining, non-calculated distances will be uninteresting (and relatively large anyway).
Second, to differentiate the non-calculated distances from real zero distances, the distances are converted into similarities by subtracting them from the maximum. In this way, all non-calculated distances are zero, and the real zeros have value max(M).
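The two steps above can be illustrated with a small base R sketch. This is not the package's implementation: it works on a dense matrix, it assumes that "non-zero data for both columns" means the two columns share at least one row in which both are non-zero, and it assumes the maximum is taken over the calculated distances only. All variable names are hypothetical.

```r
# Columns are the items being compared; columns 1 and 2 are identical.
M <- matrix(c(1, 0, 2,
              1, 0, 2,
              0, 3, 0,
              4, 0, 0), nrow = 3)

# Assumption: a pair is "calculated" when both columns are non-zero
# in at least one shared row.
overlap <- crossprod((M != 0) * 1) > 0

# Ordinary Euclidean distances between columns, for illustration only.
D <- as.matrix(dist(t(M)))

# Convert the calculated distances to similarities; everything else stays zero.
mx <- max(D[overlap])
S <- matrix(0, ncol(M), ncol(M))
S[overlap] <- mx - D[overlap]
```

In this sketch S[1, 2] equals mx because columns 1 and 2 are at distance zero, while S[1, 3] is zero because that pair was never calculated.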
Euclidean distances are calculated using the following trick:
colSums(M^2) + rowSums(M^2) - 2 * M'M
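The identity behind this trick can be checked in base R. The sketch below is an illustration, not the package's code: it uses a small dense matrix, takes the column-wise squared norms for both sum terms (an assumption about what the formula above intends, since distances are between columns), and applies a square root at the end so the result can be compared with base dist.

```r
# Columns are the items being compared.
M <- matrix(c(1, 0, 2,
              0, 3, 0,
              4, 0, 0,
              0, 0, 5), nrow = 3)

sq <- colSums(M^2)                  # squared norm of each column
G  <- crossprod(M)                  # t(M) %*% M, all pairwise inner products
D2 <- outer(sq, sq, "+") - 2 * G    # squared Euclidean distances
D  <- sqrt(pmax(D2, 0))             # clamp tiny negatives caused by rounding

# Reference: base dist works on rows, so transpose first.
ref <- as.matrix(dist(t(M)))
all.equal(D, ref, check.attributes = FALSE)
```

The appeal of this formulation for sparse input is that the only pairwise operation is the cross product, which stays sparse when few columns overlap.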
Value
A symmetric matrix of type dsCMatrix, consisting of similarity(!) values instead of distances (viz. max(dist)-dist).
Note
Please note:
The values in the result are not distances, but similarities computed as max(dist)-dist.
Non-calculated values are zero.
Author(s)
Michael Cysouw <cysouw@mac.com>
See Also
See also dist.
Examples
# to be done