dist_binary {representr}    R Documentation

The distance between two records

Description

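Computes the distance between two records, either as a count of discrepancies (dist_binary) or as a weighted distance that depends on each column's type (dist_col_type_slow).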

Usage

dist_binary(a, b)

dist_col_type_slow(
  a,
  b,
  col_type,
  string_dist = utils::adist,
  weights = rep(1/length(a), length(a)),
  orders = NULL,
  ...
)

Arguments

a

Record a

b

Record b

col_type

A vector encoding the column type for each column in the dataset. Each element must be one of "categorical", "ordinal", "string", or "numeric".

string_dist

String distance function. Default is edit distance (utils::adist). The function must take at least two arguments (the strings to compare).

weights

A vector of weights, one per column, that lets some columns count more toward the distance than others. Must sum to 1. Defaults to equal weights.

orders

A named list containing the order of the levels in each ordinal column. Defaults to NULL, which corresponds to no ordinal variables.

...

Additional parameters passed to the string distance function.
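As an illustration of the string_dist and orders arguments, the sketch below builds a custom string distance and an orders list in base R. The helper norm_edit_dist and the observed vector are hypothetical and not part of representr. The sketch assumes the distance function is called with two single strings as its first two arguments, and that the name of each orders element matches the name of an ordinal column.

```r
# Hypothetical helper, not part of representr: edit distance scaled to [0, 1].
# Assumes it is called with two single strings as its first two arguments.
norm_edit_dist <- function(x, y, ...) {
  x <- as.character(x)
  y <- as.character(y)
  longest <- max(nchar(x), nchar(y), 1)  # the 1 avoids dividing by zero
  as.numeric(utils::adist(x, y, ...)) / longest
}

norm_edit_dist("kitten", "sitting")  # 3 edits over 7 characters

# Level order for one ordinal column, from lowest to highest.
# The list name is assumed to match the column name in the data.
orders <- list(education = c("Less than a high school diploma",
    "High school graduates, no college", "Some college or associate degree",
    "Bachelor's degree only", "Advanced degree"))

# Hypothetical check that every observed value appears in the declared order.
observed <- c("Advanced degree", "Bachelor's degree only")
all(observed %in% orders$education)
```

A function like norm_edit_dist could then be supplied as string_dist = norm_edit_dist. How dist_col_type_slow uses the returned value is not stated here, so results with a custom function are worth checking on a few records.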

Value

dist_binary returns a numeric value indicating how many discrepancies there are between two records.

dist_col_type_slow returns a numeric value giving the weighted, column-type-specific distance between two records.
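The two return values described above can be illustrated with a minimal base R sketch. These are not representr's implementations. The functions count_discrepancies and weighted_distance and the toy records are hypothetical, and the sketch assumes that a discrepancy means a column whose values differ and that the weighted distance is a weighted sum of per-column distances.

```r
# Illustrative re-implementations in base R; not representr's actual code.

# Count the columns in which two one-row data frames disagree.
count_discrepancies <- function(a, b) {
  differs <- mapply(
    function(x, y) !identical(as.character(x), as.character(y)),
    a, b
  )
  sum(differs)
}

# Combine per-column distances using weights that sum to 1.
weighted_distance <- function(col_dists,
                              weights = rep(1 / length(col_dists),
                                            length(col_dists))) {
  stopifnot(length(col_dists) == length(weights),
            isTRUE(all.equal(sum(weights), 1)))
  sum(weights * col_dists)
}

# Toy records (hypothetical data).
rec_a <- data.frame(fname = "ana", by = 1990, sex = "F",
                    stringsAsFactors = FALSE)
rec_b <- data.frame(fname = "anna", by = 1990, sex = "F",
                    stringsAsFactors = FALSE)

count_discrepancies(rec_a, rec_b)                    # only fname differs
weighted_distance(c(1, 0, 0))                        # equal weights
weighted_distance(c(1, 0, 0), c(0.5, 0.25, 0.25))    # first column up-weighted
```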

Examples

data("rl_reg1")
dist_binary(rl_reg1[1,], rl_reg1[2,])

type <- c("string", "string", "numeric", "numeric",
    "numeric", "categorical", "ordinal", "numeric", "numeric")
order <- list(education = c("Less than a high school diploma",
    "High school graduates, no college", "Some college or associate degree",
    "Bachelor's degree only", "Advanced degree"))

dist_col_type_slow(rl_reg1[1,], rl_reg1[2,], col_type = type, orders = order)
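A further sketch, showing how non-default weights might be supplied. The weighting scheme (raw_w, w) and the variable name edu_order are hypothetical choices for illustration. The call runs only if representr is installed, and it assumes rl_reg1 has nine columns, matching the length of the col_type vector.

```r
type <- c("string", "string", "numeric", "numeric",
    "numeric", "categorical", "ordinal", "numeric", "numeric")
edu_order <- list(education = c("Less than a high school diploma",
    "High school graduates, no college", "Some college or associate degree",
    "Bachelor's degree only", "Advanced degree"))

# Hypothetical weighting: the two string columns count three times as much
# as each of the remaining columns.
raw_w <- c(3, 3, 1, 1, 1, 1, 1, 1, 1)
w <- raw_w / sum(raw_w)  # rescale so the weights sum to 1

# Run the package call only when representr is available.
if (requireNamespace("representr", quietly = TRUE)) {
  data("rl_reg1", package = "representr")
  representr::dist_col_type_slow(rl_reg1[1, ], rl_reg1[2, ],
      col_type = type, weights = w, orders = edu_order)
}
```

Dividing by sum(raw_w) keeps the requirement that the weights sum to 1 regardless of the raw values chosen.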


[Package representr version 0.1.5 Index]