| ses {diffobj} | R Documentation |
Shortest Edit Script
Description
Computes the shortest edit script that converts a into b by removing
elements from a and adding elements from b. Intended primarily
for debugging, or for other applications that understand the edit script
format. See the GNU diff docs
for how to interpret the symbols.
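As a rough illustration of how those symbols read, the sketch below parses commands written in the GNU diff "normal format" (N[,M] followed by a, c, or d, followed by K[,L]) using base R only. The helper parse_ses is hypothetical and not part of diffobj, and the input strings are written by hand for a <- letters[1:6] and b <- c('b', 'CC', 'DD', 'd', 'f') rather than captured from a ses call.

```r
# Hypothetical helper (not part of diffobj): parse GNU diff "normal format"
# commands of the form N[,M]{a|c|d}K[,L] into a data.frame.
parse_ses <- function(x) {
  pat <- "^([0-9]+)(,([0-9]+))?([acd])([0-9]+)(,([0-9]+))?$"
  m <- regmatches(x, regexec(pat, x))
  bad <- lengths(m) == 0L
  if (any(bad))
    stop("unrecognized command: ", paste(x[bad], collapse = ", "))
  # One row per command; column 1 is the full match, 2:8 the capture groups
  m <- matrix(as.character(unlist(m)), ncol = 8L, byrow = TRUE)
  # A missing ",M" or ",L" means the range covers a single position
  range_end <- function(start, end) {
    res <- start
    res[nzchar(end)] <- as.integer(end[nzchar(end)])
    res
  }
  a.start <- as.integer(m[, 2])
  b.start <- as.integer(m[, 6])
  data.frame(
    op = m[, 5],   # "a" = add, "c" = change, "d" = delete
    a.start = a.start, a.end = range_end(a.start, m[, 4]),
    b.start = b.start, b.end = range_end(b.start, m[, 8]),
    stringsAsFactors = FALSE
  )
}

# Hand-written commands in the normal format (assumed, not captured output)
script <- c("1d0", "3c2,3", "5d4")
parsed <- parse_ses(script)
parsed
#   op a.start a.end b.start b.end
# 1  d       1     1       0     0
# 2  c       3     3       2     3
# 3  d       5     5       4     4
```

Here "3c2,3" reads as: change element 3 of a into elements 2 through 3 of b. For "d" commands the b position is where the deleted elements would have appeared, and for "a" commands the a position is the element after which the new ones are added.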
Usage
ses(a, b, max.diffs = gdo("max.diffs"), warn = gdo("warn"))
ses_dat(a, b, extra = TRUE, max.diffs = gdo("max.diffs"), warn = gdo("warn"))
Arguments
a |
character |
b |
character |
max.diffs |
integer(1L), number of differences (default 50000L)
after which we abandon the |
warn |
TRUE (default) or FALSE, whether to warn if we hit
max.diffs |
extra |
TRUE (default) or FALSE, whether to also return the indices in
a and b that the diff values are taken from |
Details
ses will be much faster than any of the
diff* methods, particularly for large inputs with
limited numbers of differences.
NAs are treated as the string “NA”. Non-character inputs are coerced to character.
ses_dat provides a semi-processed “machine-readable” version of
the precursor data to ses. It may be useful for those who want the
raw diff data rather than the printed output of diffobj, but do not wish
to manually parse the ses output. Whether it is faster than
ses depends on the ratio of matching to non-matching values, as
ses_dat includes matching values whereas ses does not. See
examples.
Value
For ses, a character vector containing the shortest edit script. For
ses_dat, a machine-readable version of it
as a data.frame with columns op (factor, values
“Match”, “Insert”, or “Delete”), val (character,
the value taken from either a or b),
and if extra is TRUE, integer columns id.a and id.b
corresponding to the indices in a or b that val was
taken from. See Details.
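To make the column layout concrete, the sketch below builds by hand a data.frame with the documented op and val columns (the shape described for extra = FALSE) and groups consecutive rows with the same operation into runs using base R. The row order and the factor level order are assumptions made for illustration, not captured ses_dat output.

```r
# Hand-built stand-in for the data.frame described above; the row order and
# factor level order are assumptions for illustration only.
dat <- data.frame(
  op = factor(
    c("Delete", "Match", "Delete", "Insert", "Insert",
      "Match", "Delete", "Match"),
    levels = c("Match", "Insert", "Delete")
  ),
  val = c("a", "b", "c", "CC", "DD", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Collapse consecutive rows that share an operation into runs
runs <- rle(as.character(dat$op))
run.id <- rep(seq_along(runs$lengths), runs$lengths)
hunks <- data.frame(
  op = runs$values,
  n = runs$lengths,
  vals = unname(
    vapply(split(dat$val, run.id), paste, character(1), collapse = " ")
  ),
  stringsAsFactors = FALSE
)
hunks

# Dropping inserts yields the first input, dropping deletes the second
a.rec <- dat$val[dat$op != "Insert"]
b.rec <- dat$val[dat$op != "Delete"]
```

Grouping by run is one simple way to turn row-level data into hunks; with extra = TRUE the id.a and id.b columns could be carried along in the same way.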
Examples
a <- letters[1:6]
b <- c('b', 'CC', 'DD', 'd', 'f')
ses(a, b)
(dat <- ses_dat(a, b))
## use `ses_dat` output to construct a minimal diff
## color with ANSI CSI SGR
diff <- dat[['val']]
del <- dat[['op']] == 'Delete'
ins <- dat[['op']] == 'Insert'
if(any(del))
  diff[del] <- paste0("\033[33m- ", diff[del], "\033[m")
if(any(ins))
  diff[ins] <- paste0("\033[34m+ ", diff[ins], "\033[m")
## pad unchanged values so they align with the "- " and "+ " prefixes
if(any(!ins & !del))
  diff[!ins & !del] <- paste0("  ", diff[!ins & !del])
writeLines(diff)
## We can recover `a` and `b` from the data
identical(subset(dat, op != 'Insert', val)[[1]], a)
identical(subset(dat, op != 'Delete', val)[[1]], b)