qgrams {stringdist} | R Documentation |
Get a table of qgram counts from one or more character vectors.
Description
Get a table of qgram counts from one or more character vectors.
Usage
qgrams(..., .list = NULL, q = 1L, useBytes = FALSE, useNames = !useBytes)
Arguments
... |
any number of (named) arguments, that will be coerced to character with |
.list |
Will be concatenated with the |
q |
size of q-gram, must be non-negative. |
useBytes |
Determine byte-wise qgrams. |
useNames |
Add q-grams as column names. If |
Value
A table with q-gram counts. Detected q-grams are used as column names and the argument names as row names.
If no argument names were provided, they will be generated.
Details
The input is converted to character. If useBytes=TRUE, each element is converted to utf8 and then to integer as in stringdist.
Next, the data is passed to the underlying routine.
Strings with fewer than q characters and elements containing NA are skipped. Using q=0 therefore counts the number of empty strings "" occurring in each argument.
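The skipping rule can be illustrated with a minimal base-R sketch. It is not the routine used by qgrams: count_qgrams is a hypothetical helper introduced only for illustration, it assumes q >= 1 (the q=0 case is not modelled), and it ignores useBytes and useNames.

```r
# Hypothetical helper, NOT part of stringdist: a base-R sketch of
# q-gram counting over one character vector. Assumes q >= 1.
count_qgrams <- function(x, q = 1L) {
  stopifnot(q >= 1L)
  x <- as.character(x)
  # Skip NA elements and strings with fewer than q characters
  x <- x[!is.na(x) & nchar(x) >= q]
  # Slide a window of width q over each remaining string
  grams <- as.character(unlist(lapply(x, function(s) {
    starts <- seq_len(nchar(s) - q + 1L)
    substring(s, starts, starts + q - 1L)
  })))
  # Tabulate the q-grams over the whole vector
  table(grams)
}

# "hi" (too short for q=3) and NA contribute nothing
count_qgrams(c("hello", "hi", NA), q = 3)
```

Because the counts are pooled over the whole vector, repeating an element doubles its q-gram counts, which is the behaviour the second example below demonstrates for qgrams itself.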
See Also
Examples
qgrams('hello world',q=3)
# q-grams are counted uniquely over a character vector
qgrams(rep('hello world',2),q=3)
# to count them separately, do something like
x <- c('hello', 'world')
lapply(x,qgrams, q=3)
# output rows may be named, and you can pass any number of character vectors
x <- "I will not buy this record, it is scratched"
y <- "My hovercraft is full of eels"
z <- c("this", "is", "a", "dead","parrot")
qgrams(A = x, B = y, C = z,q=2)
# a tongue twister, showing the effects of useBytes and useNames
x <- "peter piper picked a peck of pickled peppers"
qgrams(x, q=2)
qgrams(x, q=2, useNames=FALSE)
qgrams(x, q=2, useBytes=TRUE)
qgrams(x, q=2, useBytes=TRUE, useNames=TRUE)