gower_dist {gower} | R Documentation |
Gower's distance
Description
Compute Gower's distance pairwise between the records of two data sets, x and y.
Records from the smaller data set are recycled.
Usage
gower_dist(
  x,
  y,
  pair_x = NULL,
  pair_y = NULL,
  eps = 1e-08,
  weights = NULL,
  ignore_case = FALSE,
  nthread = getOption("gd_num_thread")
)
Arguments
x |
|
y |
|
pair_x |
|
pair_y |
|
eps |
|
weights |
|
ignore_case |
|
nthread |
Number of threads to use for parallelization. By default,
two threads are used on a dual-core machine. On any other machine,
n-1 cores are used so that the machine stays responsive during a large computation.
The maximum number of threads is determined using |
Value
A numeric vector of length max(nrow(x), nrow(y)).
When there are no columns to compare, a message is printed and
numeric(0) is returned invisibly.
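The return value and the recycling rule can be illustrated with a minimal sketch. It assumes the gower package is installed and uses the built-in iris data set. The variable names (d_one, d_pair, d_none) and the two one-column data frames are purely illustrative.

```r
library(gower)

# One record against many: the single row of x is recycled over y.
d_one <- gower_dist(iris[1, ], iris)
length(d_one)   # equals max(nrow(x), nrow(y))

# Two data sets of equal size are compared record by record.
dat1 <- iris[1:10, ]
dat2 <- iris[6:15, ]
d_pair <- gower_dist(dat1, dat2)
length(d_pair)

# No columns with matching names: per the Value section, a message is
# printed and numeric(0) is returned invisibly.
d_none <- gower_dist(data.frame(a = 1:3), data.frame(b = 1:3))
length(d_none)
```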
Details
There are three ways to specify which columns of x should be compared
with which columns of y. The first option is to give no specification;
in that case, columns with matching names are used. The second option
is to use only the pair_y argument, specifying for each column in x,
in order, which column in y it must be paired with (use 0 to skip a
column in x). The third option is to specify the columns to be matched
explicitly using pair_x and pair_y.
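The three pairing options can be sketched as follows. This is an illustration only: it assumes the gower package is installed, and the reordered copy of iris and the variable names are made up for the example. Numeric column indices are used for pair_x and pair_y.

```r
library(gower)

x <- iris[1:5, ]
# Same variables as x, but with Species moved to the first column.
y <- iris[51:55, c(5, 1, 2, 3, 4)]

# Option 1: no specification, so columns are paired by matching names.
d_names <- gower_dist(x, y)

# Option 2: only pair_y. For each column of x, in order, give the index
# of the column of y to pair it with.
d_pair_y <- gower_dist(x, y, pair_y = c(2, 3, 4, 5, 1))

# Option 2 with a skip: 0 leaves the fifth column of x (Species) out.
d_skip <- gower_dist(x, y, pair_y = c(2, 3, 4, 5, 0))

# Option 3: explicit pairs. Columns 1 to 4 of x are matched with
# columns 2 to 5 of y, so Species is again left out.
d_explicit <- gower_dist(x, y, pair_x = 1:4, pair_y = 2:5)
```

In this setup, the first two calls describe the same pairing, as do the last two, so each pair of results is expected to agree.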
Note
Gower (1971) originally defined a similarity measure (s, say)
with values ranging from 0 (completely dissimilar) to 1 (completely similar).
The distance returned here equals 1-s.
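The relation between similarity and distance can be made concrete with a small base-R sketch. It is not the package's implementation: gower_similarity is a hypothetical helper, the records and the range value are invented, and all variables are weighted equally. Numeric variables contribute 1 - |difference| / range and categorical variables contribute 1 when equal and 0 otherwise.

```r
# Hypothetical helper: unweighted Gower similarity between two records.
# 'ranges' holds the range of each numeric variable (unused for others).
gower_similarity <- function(a, b, ranges) {
  s_k <- mapply(function(u, v, r) {
    if (is.numeric(u)) 1 - abs(u - v) / r else as.numeric(u == v)
  }, a, b, ranges)
  mean(s_k)
}

a <- list(height = 1.60, colour = "red")
b <- list(height = 1.80, colour = "blue")
ranges <- list(height = 0.40, colour = NA)  # assumed range for height

s <- gower_similarity(a, b, ranges)  # (0.5 + 0) / 2, approximately 0.25
d <- 1 - s                           # the corresponding distance, approximately 0.75
```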
References
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.