Distance between vectors and a matrix - Sum of all pairwise distances in a distance matrix. {Rfast} | R Documentation |
Description
Computes the distances between the vectors in xnew and the rows of the matrix x (dista), or the sum of all these pairwise distances (total.dista).
Usage
dista(xnew, x, type = "euclidean", k = 0, index = FALSE,
trans = TRUE, square = FALSE, p = 0, parallel = FALSE)
total.dista(xnew, x, type = "euclidean", k = 0,
square = FALSE, p = 0, parallel = FALSE)
Arguments
xnew | A matrix with some data, or a vector.
x | A matrix with the data, where the rows denote observations (vectors) and the columns contain the variables.
type | The type of distance, for example "euclidean" or "manhattan". See Details for the metrics and their formulas.
k | The number of smallest distances (or their indices) to return. If k > 0, the k smallest distances or their indices are returned.
index | If k is greater than 0, set this to TRUE to get the indices of the k smallest distances.
trans | Should the returned matrix be transposed? TRUE or FALSE.
square | If type is "euclidean" or "hellinger", set this to TRUE to return the squared distances.
p | The power of the Minkowski metric.
parallel | For the methods "kullback_leibler", "jensen_shannon" and "itakura_saito", set this to TRUE to run the algorithm in parallel.
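To make the meaning of k and index concrete, here is a minimal base R sketch of the idea. It does not call Rfast; the objects d, k_smallest and k_index are hypothetical names, and the orientation of the result is an assumption of this sketch, not a statement about what dista returns.

```r
# Illustration only (base R, not Rfast code).
xnew <- as.matrix(iris[1:3, 1:4])
x <- as.matrix(iris[4:150, 1:4])

# Distances of each row of xnew (rows of d) from each row of x (columns of d)
d <- as.matrix(dist(rbind(xnew, x)))[1:3, -(1:3)]

k <- 2
# The k smallest distances for each row of xnew (one column per row of xnew)
k_smallest <- apply(d, 1, function(r) sort(r)[1:k])
# The positions in x of those k smallest distances
k_index <- apply(d, 1, function(r) order(r)[1:k])
```

Here k_smallest corresponds to the idea of k > 0 with index = FALSE, and k_index to k > 0 with index = TRUE.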
Details
The purpose of these functions is to calculate the distances between the rows of xnew and the rows of x without calculating the whole distance matrix of xnew and x stacked together. The whole matrix also contains the distances within xnew and within x, which are extra calculations that can be avoided.
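The saving can be illustrated in base R. This is a sketch under stated assumptions: cross_euclid is a hypothetical helper written for this illustration, not the Rfast implementation, and it covers only the Euclidean case.

```r
# Hypothetical helper (not part of Rfast): Euclidean distances between
# the rows of xnew and the rows of x, computed directly.
cross_euclid <- function(xnew, x) {
  xnew <- as.matrix(xnew)
  x <- as.matrix(x)
  # For each row v of xnew, subtract v from every row of x and take row norms
  d <- apply(xnew, 1, function(v) sqrt(rowSums(sweep(x, 2, v)^2)))
  # apply() gives one column per row of xnew; transpose to nrow(xnew) x nrow(x)
  t(d)
}

xnew <- as.matrix(iris[1:10, 1:4])
x <- as.matrix(iris[-(1:10), 1:4])

direct <- cross_euclid(xnew, x)            # 10 x 140, only what is needed
full <- as.matrix(dist(rbind(xnew, x)))    # 150 x 150, mostly unused here
max(abs(direct - full[1:10, -(1:10)]))     # difference between the two routes
```

The direct route computes 10 x 140 distances, whereas the full matrix holds 150 x 150 entries, of which only the 10 x 140 cross block is used.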
euclidean :
\sum \sqrt( \sum | P_i - Q_i |^2)
manhattan :
\sum \sum | P_i - Q_i |
minimum :
\sum \min | P_i - Q_i |
maximum :
\sum \max | P_i - Q_i |
minkowski :
\sum ( \sum | P_i - Q_i |^p)^(1/p)
bhattacharyya :
\sum - ln \sum \sqrt(P_i * Q_i)
hellinger :
\sum 2 * \sqrt( 1 - \sum \sqrt(P_i * Q_i))
kullback_leibler :
\sum \sum P_i * log(P_i / Q_i)
jensen_shannon :
\sum 0.5 * ( \sum P_i * log(2 * P_i / (P_i + Q_i)) + \sum Q_i * log(2 * Q_i / (P_i + Q_i)))
canberra :
\sum \sum | P_i - Q_i | / (P_i + Q_i)
chi_square (\chi^2) :
\sum \sum ( (P_i - Q_i)^2 / (P_i + Q_i) )
soergel :
\sum \sum | P_i - Q_i | / \sum \max(P_i , Q_i)
sorensen :
\sum \sum | P_i - Q_i | / \sum (P_i + Q_i)
cosine :
\sum (P_i * Q_i) / ( \sqrt(\sum P_i^2) * \sqrt(\sum Q_i^2) )
wave_hedges :
\sum \sum | P_i - Q_i | / \max(P_i , Q_i)
motyka :
\sum \sum \min(P_i , Q_i) / (P_i + Q_i)
harmonic_mean :
2 * \sum (P_i * Q_i) / (P_i + Q_i)
jeffries_matusita :
\sum \sqrt( 2 - 2 * \sum \sqrt(P_i * Q_i))
gower :
\sum 1/d * \sum | P_i - Q_i |
kulczynski :
\sum 1 / \sum | P_i - Q_i | / \sum \min(P_i , Q_i)
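A few of the formulas above can be written out in base R for a single pair of vectors. This is only a sketch: P and Q are hypothetical example vectors, the code follows the formulas as printed here (reading the outer \sum as the sum over all pairs, and the inner one as the sum over the elements i), and it is not taken from the Rfast source.

```r
# Per-pair terms of some of the formulas above (base R, illustration only).
P <- c(0.2, 0.3, 0.5)
Q <- c(0.1, 0.4, 0.5)

euclidean <- sqrt(sum(abs(P - Q)^2))
manhattan <- sum(abs(P - Q))
maximum   <- max(abs(P - Q))
canberra  <- sum(abs(P - Q) / (P + Q))
# p is the power of the metric
minkowski <- function(P, Q, p) sum(abs(P - Q)^p)^(1 / p)

# Minkowski reduces to Manhattan for p = 1 and to Euclidean for p = 2
all.equal(minkowski(P, Q, 1), manhattan)
all.equal(minkowski(P, Q, 2), euclidean)
```

Summing such per-pair terms over every pair of vectors gives the total that the outer \sum denotes.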
Value
A matrix with the distances of each xnew from each vector of x. The number of rows of xnew and the number of rows of x are the dimensions of this matrix.
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris <mtsagris@uoc.gr> and Manos Papadakis <papadakm95@gmail.com>.
See Also
mahala, Dist, total.dist, total.dista
Examples
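library(Rfast)
## distances of the first 10 iris observations from the remaining 140
xnew <- as.matrix( iris[1:10, 1:4] )
x <- as.matrix( iris[-c(1:10), 1:4] )
a <- dista(xnew, x)
## the same distances taken from the full 150 x 150 distance matrix
b <- as.matrix( dist( rbind(xnew, x) ) )
b <- b[ 1:10, -c(1:10) ]
## total absolute difference between the two results
sum( abs(a - b) )
## see the time
x <- matrix( rnorm(1000 * 4), ncol = 4 )
system.time( dista(xnew, x) )
system.time( as.matrix( dist( rbind(xnew, x) ) ) )
## clean up
x <- b <- a <- xnew <- NULL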