R: Algorithm for the Longest Common Subsequence Problem

LCS {qualV}

R Documentation

Algorithm for the Longest Common Subsequence Problem

Description

Determines the longest common subsequence of two strings.

Usage

LCS(a, b)

Arguments

`a`	vector (numeric or character), missing values are not accepted
`b`	vector (numeric or character), missing values are not accepted

Details

A longest common subsequence (LCS) is a common subsequence of two strings of maximum length. The LCS Problem consists of finding a LCS of two given strings and its length (LLCS). A qualitative similarity index QSI is computed by division of the LLCS over maximum length of 'a' and 'b'.

Value

`a`	vector `'a'`
`b`	vector `'b'`
`LLCS`	length of `LCS`
`LCS`	longest common subsequence
`QSI`	quality similarity index
`va`	one possible `LCS` of vector `'a'`
`vb`	one possible `LCS` of vector `'b'`

Note

LCS is now using a C version of the algorithm provided by Dominik Reusser.

References

Wagner, R. A. and Fischer, M. J. (1974) The String-to-String Correction Problem. Journal of the ACM, 21, 168-173.

Paterson, M. and Dancik, V. (1994) Longest Common Subsequences. Mathematical Foundations of Computer Science, 841, 127-142.

Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, England, ISBN 0-521-58519-8.

Examples

# direct use
a <- c("b", "c", "a", "b", "c", "b")
b <- c("a", "b", "c", "c", "b")
print(LCS(a, b))

# a constructed example
x <- seq(0, 2 * pi, 0.1)  # time
y <- 5 + sin(x)           # a process
o <- y + rnorm(x, sd=0.2) # observation with random error
p <- y + 0.1              # simulation with systematic bias
plot(x, o); lines(x, p)

lcs <- LCS(f.slope(x, o), f.slope(x, p))  # too much noise
lcs$LLCS
lcs$QSI

os <- ksmooth(x, o, kernel = "normal", bandwidth = dpill(x, o), x.points = x)$y
lcs <- LCS(f.slope(x, os), f.slope(x, p))
lcs$LLCS
lcs$QSI


# observed and measured data with non-matching time intervals
data(phyto)
bbobs    <- dpill(obs$t, obs$y)
n        <- tail(obs$t, n = 1) - obs$t[1] + 1
obsdpill <- ksmooth(obs$t, obs$y, kernel = "normal", bandwidth = bbobs,
                    n.points = n)
obss     <- data.frame(t = obsdpill$x, y = obsdpill$y)
obss     <- obss[match(sim$t, obss$t),]
obs_f1   <- f.slope(obss$t, obss$y)
sim_f1   <- f.slope(sim$t, sim$y)
lcs      <- LCS(obs_f1, sim_f1)
lcs$QSI

[Package qualV version 0.3-5 Index]