hyphen {sylly} | R Documentation |
Automatic hyphenation
Description
These methods implement word hyphenation, based on Liang's algorithm.
Usage
hyphen(words, ...)
## S4 method for signature 'character'
hyphen(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE,
as = "kRp.hyphen"
)
hyphen_df(words, ...)
## S4 method for signature 'character'
hyphen_df(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
hyphen_c(words, ...)
## S4 method for signature 'character'
hyphen_c(
words,
hyph.pattern = NULL,
min.length = 4,
rm.hyph = TRUE,
quiet = FALSE,
cache = TRUE
)
Arguments
words |
Either a character vector with words/tokens to be hyphenated,
or any tagged text object generated with the koRpus package. |
... |
Only used for the method generic. |
hyph.pattern |
Either an object of class kRp.hyph.pat, or a character string naming the language of the pattern set to be used (like "en"). |
min.length |
Integer, the minimum number of letters a word must have to be considered for hyphenation. |
rm.hyph |
Logical, whether hyphens already present in words should be removed before pattern matching. |
quiet |
Logical. If |
cache |
Logical. |
as |
A character string defining the class of the object to be returned. Defaults to "kRp.hyphen". |
Details
For this to work, the function must be told which pattern set it should use to
find the right hyphenation spots. The most straightforward way to add support
for a particular language during a session is to load an appropriate language
package (e.g., the package sylly.en for English or sylly.de for German).
See available.sylly.lang and install.sylly.lang for more information on how
to get language support packages.
Once such a package has been loaded, you can simply use the language
abbreviation as the value for the hyph.pattern argument (like "en" for the
English pattern set). If words is an object that was tokenized and tagged with
the koRpus package, its language definition can be used instead, i.e., you
don't need to specify hyph.pattern; hyphen will pick the language
automatically.
If you would rather use your own pattern set, hyph.pattern can alternatively
be an object of class kRp.hyph.pat.
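To make the role of the pattern set more concrete, the following is a minimal base R sketch of Liang-style pattern matching. It is an illustration only and not the code used by hyphen: the function names parse_pattern and toy_hyphenate, their arguments, and the tiny pattern list (written in the style of TeX hyphenation patterns) are all assumptions made for this example. A real pattern set contains thousands of entries.

```r
# Illustrative sketch of Liang-style hyphenation; NOT sylly's implementation.
# parse_pattern() and toy_hyphenate() are hypothetical helper names.

# Split a pattern such as "hy3ph" into its letters ("hyph") and the
# numeric score attached to each gap between letters (0 where no digit).
parse_pattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  is_digit <- grepl("^[0-9]$", chars)
  values <- integer(sum(!is_digit) + 1L)
  pos <- 1L
  for (i in seq_along(chars)) {
    if (is_digit[i]) {
      values[pos] <- as.integer(chars[i])
    } else {
      pos <- pos + 1L
    }
  }
  list(text = paste(chars[!is_digit], collapse = ""), values = values)
}

# min_length and rm_hyph only mimic the meaning of the documented
# min.length and rm.hyph arguments; this is an assumption of the sketch.
toy_hyphenate <- function(word, patterns, min_length = 4, rm_hyph = TRUE) {
  if (rm_hyph) word <- gsub("-", "", word, fixed = TRUE)
  if (nchar(word) < min_length) return(word)
  word <- tolower(word)
  w <- paste0(".", word, ".")        # dots mark the word boundaries
  n <- nchar(w)
  scores <- integer(n + 1L)          # scores[k]: gap before character k of w
  for (p in lapply(patterns, parse_pattern)) {
    m <- nchar(p$text)
    for (s in seq_len(max(0L, n - m + 1L))) {
      if (substr(w, s, s + m - 1L) == p$text) {
        idx <- s:(s + m)
        # overlapping patterns compete; the highest score per gap wins
        scores[idx] <- pmax(scores[idx], p$values)
      }
    }
  }
  len <- nchar(word)
  # gap after letter j of the word is scores[j + 2]; odd scores allow a break
  gaps <- scores[seq_len(len - 1L) + 2L]
  break_after <- which(gaps %% 2L == 1L)
  parts <- substring(word, c(1L, break_after + 1L), c(break_after, len))
  paste(parts, collapse = "-")
}

# a tiny, hand-picked pattern list for demonstration purposes
toy_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na",
                  "n2at", "1tio", "2io", "o2n")
toy_hyphenate("hyphenation", toy_patterns)
# "hy-phen-ation"
```

Odd scores mark permitted break points and even scores suppress them, which is why a longer, more specific pattern can override a shorter one at the same position.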
Value
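An object of class kRp.hyphen, a data.frame, or a numeric vector, depending on
the value of the as argument. The shortcut methods hyphen_df and hyphen_c take
no as argument and return the data.frame and the numeric vector, respectively.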
References
Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.
See Also
read.hyph.pat, manage.hyph.pat, available.sylly.lang, and install.sylly.lang
Examples
## Not run:
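# load the English hyphenation patterns
library(sylly.en)
sampleText <- c("This", "is", "a", "rather", "stupid", "demonstration")
# default: returns an object of class kRp.hyphen
hyphen(sampleText, hyph.pattern="en")
# same hyphenation, returned as a data.frame
hyphen_df(sampleText, hyph.pattern="en")
# same hyphenation, returned as a numeric vector
hyphen_c(sampleText, hyph.pattern="en")
# using a koRpus object (tagged.text is assumed to exist already; its
# language definition is used, so hyph.pattern can be omitted)
hyphen(tagged.text)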
## End(Not run)