dist_binary {representr} | R Documentation |
The distance between two records
Description
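Computes the distance between two records, either as a count of discrepancies (dist_binary) or as a weighted, column-type-specific distance (dist_col_type_slow).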
Usage
dist_binary(a, b)
dist_col_type_slow(
  a,
  b,
  col_type,
  string_dist = utils::adist,
  weights = rep(1/length(a), length(a)),
  orders = NULL,
  ...
)
Arguments
a | The first record to compare. |
b | The second record to compare. |
col_type | A vector encoding the column type for each column in the dataset. Each element must be one of "categorical", "ordinal", "string", or "numeric". |
string_dist | String distance function. Defaults to edit distance (utils::adist). The function must take at least two arguments (the strings to compare). |
weights | A vector of weights, one per column, used to make some column distances count more than others. Must sum to 1. Defaults to equal weights. |
orders | A named list containing the order of the levels in each ordinal column. Defaults to NULL, which corresponds to no ordinal variables. |
... | Additional parameters passed to the string distance function. |
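As an illustration of the string_dist contract described above, the following is a minimal sketch of a custom distance function. The name norm_edit_dist is hypothetical and not part of representr. The sketch assumes the function is called with the two field values as its first two arguments and may return a single number; how dist_col_type_slow actually invokes string_dist is not stated here.

```r
# Hypothetical helper (not part of representr): case-insensitive edit
# distance, scaled by the length of the longer string so that values
# fall in [0, 1].
norm_edit_dist <- function(x, y, ...) {
  x <- tolower(as.character(x))
  y <- tolower(as.character(y))
  longest <- max(nchar(x), nchar(y))
  if (longest == 0) return(0)  # two empty strings are treated as identical
  # utils::adist returns a matrix; as.numeric() reduces the 1 x 1 case to a scalar
  as.numeric(utils::adist(x, y, ...)) / longest
}

norm_edit_dist("Smith", "smyth")  # 0.2: one substitution over five characters
```

If that assumption holds, such a function could be supplied as string_dist = norm_edit_dist, with any extra arguments forwarded through the ... argument.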
Value
dist_binary
returns a numeric value indicating how many discrepancies there are between the two records.
dist_col_type_slow
returns a numeric value giving the weighted, column-type-specific distance between the two records.
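To make the two return values concrete, here is a rough base R sketch of what a discrepancy count and a weighted, column-type-specific distance can look like. This is not the package's implementation. The functions sketch_binary and sketch_col_type, the toy records, and the per-type rules (0/1 mismatch for categorical, absolute difference for numeric, raw edit distance for string, no ordinal support) are all assumptions made for illustration; representr may scale or combine columns differently.

```r
# Toy records; column names and values are made up for illustration.
a <- data.frame(name = "anna", age = 30, sex = "F", stringsAsFactors = FALSE)
b <- data.frame(name = "anne", age = 32, sex = "F", stringsAsFactors = FALSE)

# Discrepancy count: number of fields whose values differ.
sketch_binary <- function(a, b) {
  sum(mapply(function(x, y) as.character(x) != as.character(y), a, b))
}

# Weighted sum of per-column distances, chosen by column type.
# Assumed rules; "ordinal" is deliberately left out of this sketch.
sketch_col_type <- function(a, b, col_type,
                            weights = rep(1 / length(a), length(a))) {
  stopifnot(length(col_type) == length(a),
            isTRUE(all.equal(sum(weights), 1)))
  per_col <- vapply(seq_along(col_type), function(j) {
    x <- a[[j]]
    y <- b[[j]]
    switch(col_type[j],
      categorical = as.numeric(as.character(x) != as.character(y)),
      numeric     = abs(x - y),
      string      = as.numeric(utils::adist(as.character(x), as.character(y))),
      stop("unsupported column type in this sketch: ", col_type[j])
    )
  }, numeric(1))
  sum(weights * per_col)
}

types <- c("string", "numeric", "categorical")
sketch_binary(a, b)                                      # 2 (name and age differ)
sketch_col_type(a, b, types)                             # (1 + 2 + 0) / 3 = 1
sketch_col_type(a, b, types, weights = c(0.5, 0.1, 0.4)) # 0.5 * 1 + 0.1 * 2 = 0.7
```

The last call shows the effect of weights: putting more weight on the name column and less on age lowers the total, because the age difference contributes less.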
Examples
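library(representr)
data("rl_reg1")

# Number of discrepancies between the first two records
dist_binary(rl_reg1[1,], rl_reg1[2,])

# One column type per column of rl_reg1
type <- c("string", "string", "numeric", "numeric",
          "numeric", "categorical", "ordinal", "numeric", "numeric")

# Level ordering for the ordinal column, lowest to highest
order <- list(education = c("Less than a high school diploma",
                            "High school graduates, no college",
                            "Some college or associate degree",
                            "Bachelor's degree only",
                            "Advanced degree"))

# Weighted, column-type-specific distance with the default equal weights
dist_col_type_slow(rl_reg1[1,], rl_reg1[2,], col_type = type, orders = order)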