edit_dist_string {lingdist}    R Documentation

Compute edit distance between two strings

Description

Compute the edit distance between two strings and, optionally, return all optimal alignment scenarios. A custom cost matrix is supported, as are strings whose symbols are separated by a custom delimiter.

Usage

edit_dist_string(
  str1,
  str2,
  cost_mat = NULL,
  delim = "",
  return_alignments = FALSE
)

Arguments

str1

String to be compared.

str2

String to be compared.

cost_mat

Square data frame giving the cost of deleting, inserting, or substituting each symbol. Row names and column names are the symbols. 'cost_mat[char1,"_NULL_"]' is the cost of deleting char1 and 'cost_mat["_NULL_",char1]' is the cost of inserting it. Any operation not defined in cost_mat defaults to a cost of 0 when the two symbols are the same and 1 otherwise.
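As a rough illustration of the layout described above, the following sketch builds a small cost matrix by hand. The symbol set and all cost values are made up for the example, and it assumes that substitution costs are looked up as cost_mat[from, to]; they are set symmetrically here so the result does not depend on that assumption. The package call is guarded so the snippet also runs where lingdist is not installed.

```r
# Symbols that get explicit costs, plus the special "_NULL_" entry for gaps.
symbols <- c("p", "ph", "s", "z", "_NULL_")

# Start from the documented defaults: 0 on the diagonal, 1 everywhere else.
cost_mat <- as.data.frame(
  matrix(1, nrow = length(symbols), ncol = length(symbols),
         dimnames = list(symbols, symbols))
)
for (s in symbols) cost_mat[s, s] <- 0

# Hypothetical costs: cheaper substitutions between similar sounds.
cost_mat["ph", "p"] <- 0.3
cost_mat["p", "ph"] <- 0.3
cost_mat["z", "s"] <- 0.5
cost_mat["s", "z"] <- 0.5

# Deletion uses the "_NULL_" column, insertion uses the "_NULL_" row.
cost_mat["ph", "_NULL_"] <- 2   # cost of deleting "ph"
cost_mat["_NULL_", "ph"] <- 2   # cost of inserting "ph"

# Symbols absent from cost_mat ("l", "i") fall back to the default costs.
if (requireNamespace("lingdist", quietly = TRUE)) {
  res <- lingdist::edit_dist_string("ph_l_i_z", "p_l_i_s",
                                    cost_mat = cost_mat, delim = "_")
  print(res$distance)
}
```

Starting from the default values and overriding only selected cells keeps the data frame square and makes the custom costs easy to audit.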

delim

The delimiter separating atomic symbols in 'str1' and 'str2'. With the default "", each character is treated as one symbol.

return_alignments

Whether to return alignment details. Defaults to FALSE.

Value

A list containing a 'distance' element that stores the computed distance. If 'return_alignments' is TRUE, the list also contains an 'alignments' element, a list of data frames, each of which stores one possible best alignment scenario.
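To make the meaning of the distance concrete, here is a minimal base-R sketch of a unit-cost edit distance over delimited symbols. It is not the package's implementation; simple_edit_dist is a hypothetical helper written for this illustration, and it only covers the default costs (0 for identical symbols, 1 for any insertion, deletion, or substitution).

```r
# Illustrative unit-cost edit distance (hypothetical helper, not lingdist code).
simple_edit_dist <- function(str1, str2, delim = "") {
  # Split each string into atomic symbols; delim = "" splits per character.
  a <- strsplit(str1, delim, fixed = TRUE)[[1]]
  b <- strsplit(str2, delim, fixed = TRUE)[[1]]
  n <- length(a)
  m <- length(b)

  # d[i + 1, j + 1] is the distance between the first i symbols of a
  # and the first j symbols of b.
  d <- matrix(0, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n   # delete all i symbols
  d[1, ] <- 0:m   # insert all j symbols

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      substitution <- d[i, j] + (a[i] != b[j])
      deletion     <- d[i, j + 1] + 1
      insertion    <- d[i + 1, j] + 1
      d[i + 1, j + 1] <- min(substitution, deletion, insertion)
    }
  }
  d[n + 1, m + 1]
}

simple_edit_dist("leaf", "leaves")                     # 3
simple_edit_dist("ph_l_i_z", "p_l_i_s", delim = "_")   # 2
```

With the delimiter "_", "ph" counts as a single symbol, so the second pair differs by two substitutions rather than by character-level edits.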

Examples

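# Empty cost matrix: every operation falls back to the default costs.
cost.mat <- data.frame()
# Character-level distance with default costs.
dist <- edit_dist_string("leaf", "leaves")$distance
# Symbols separated by "_", with a custom cost matrix.
dist <- edit_dist_string("ph_l_i_z", "p_l_i_s", cost_mat = cost.mat, delim = "_")$distance
# Also return all best alignment scenarios.
alignments <- edit_dist_string("ph_l_i_z", "p_l_i_s", delim = "_", return_alignments = TRUE)$alignments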

[Package lingdist version 1.0 Index]