| gower_dist {gower} | R Documentation |
Gower's distance
Description
Compute Gower's distance pairwise between records in two data sets x
and y. Records from the smaller data set are recycled.
Usage
gower_dist(
x,
y,
pair_x = NULL,
pair_y = NULL,
eps = 1e-08,
weights = NULL,
ignore_case = FALSE,
nthread = getOption("gd_num_thread")
)
Arguments
x |
|
y |
|
pair_x |
|
pair_y |
|
eps |
|
weights |
|
ignore_case |
|
nthread |
Number of threads to use for parallelization. By default,
2 threads are used on a dual-core machine. On any other machine,
n-1 cores are used so that the machine does not freeze during a big computation.
The maximum number of threads is determined using |
Value
A numeric vector of length max(nrow(x),nrow(y)).
When there are no columns to compare, a message is printed and
numeric(0) is returned invisibly.
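
The length rule can be illustrated with a small sketch. It assumes the gower package is installed and uses rows of the built-in iris data, chosen here only so that every numeric column varies; the row selection is illustrative and not part of the original documentation.

```r
library(gower)

# One record in x, five records in y (rows picked so each numeric column varies).
x <- iris[1, ]
y <- iris[c(1, 51, 101, 2, 52), ]

# The single record of x is recycled over the five records of y.
d <- gower_dist(x, y)

# One distance per record of the larger data set.
length(d) == max(nrow(x), nrow(y))
```

Each element of d is the distance between the recycled record of x and the corresponding record of y.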
Details
There are three ways to specify which columns of x should be compared
with which columns of y. The first option is to give no specification.
In that case columns with matching names will be used. The second option
is to use only the pair_y argument, specifying for each column in x,
in order, which column in y it must be paired with (use 0
to skip a column in x). The third option is to explicitly specify the
columns to be matched using pair_x and pair_y.
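
The three options might look as follows in practice. This is a sketch that assumes the gower package is installed and that pair_x and pair_y accept numeric column indices; the data frames dat1 and dat2 are illustrative subsets of iris, not part of the original documentation.

```r
library(gower)

# Two small data sets with identical column names.
dat1 <- iris[c(1, 51, 101), ]
dat2 <- iris[c(2, 52, 102), ]

# Option 1: no specification, columns are paired by matching names.
d1 <- gower_dist(dat1, dat2)

# Option 2: pair_y only. For each column of dat1, in order, give the column
# of dat2 to compare with; 0 skips that column of dat1 (here: Species).
d2 <- gower_dist(dat1, dat2, pair_y = c(1, 2, 3, 4, 0))

# Option 3: pair_x and pair_y together, listing only the columns to compare.
d3 <- gower_dist(dat1, dat2, pair_x = 1:4, pair_y = 1:4)
```

Under these assumptions, options 2 and 3 describe the same pairing (the four numeric columns), so d2 and d3 should agree, while d1 also includes Species.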
Note
Gower (1971) originally defined a similarity measure (s, say)
with values ranging from 0 (completely dissimilar) to 1 (completely similar).
The distance returned here equals 1-s.
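
To make the relation between s and the distance concrete, the following base R sketch computes Gower's similarity for a single pair of records by hand. It is a simplified illustration with equal weights and no missing values, not the package's implementation; the helper gower_similarity, the toy records and the assumed range of 60 for height are all hypothetical.

```r
# Per-variable similarity between two records:
#   numeric variable:     1 - |a - b| / range of the variable
#   categorical variable: 1 if the values are equal, 0 otherwise
# The overall similarity is the unweighted mean over the variables.
gower_similarity <- function(a, b, ranges) {
  s_k <- vapply(names(a), function(nm) {
    if (is.numeric(a[[nm]])) {
      1 - abs(a[[nm]] - b[[nm]]) / ranges[[nm]]
    } else {
      as.numeric(a[[nm]] == b[[nm]])
    }
  }, numeric(1))
  mean(s_k)
}

# Two toy records with one numeric and one categorical variable.
a <- list(height = 150, colour = "red")
b <- list(height = 180, colour = "red")
ranges <- list(height = 60)  # assumed range of height in the data

s <- gower_similarity(a, b, ranges)  # (0.5 + 1) / 2 = 0.75
d <- 1 - s                           # distance: 0.25
```

Identical records give s = 1 and therefore a distance of 0; records that differ maximally on every variable give s = 0 and a distance of 1.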
References
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.