hyphen,kRp.text-method {koRpus} | R Documentation |
Automatic hyphenation
Description
These methods implement word hyphenation based on Liang's algorithm.
For details, please refer to the documentation for the generic hyphen method in the sylly package.
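To give a rough idea of the pattern matching behind Liang's algorithm, here is a minimal base R sketch. It is not the code used by sylly or koRpus: the function name liang_hyphenate, its left.min and right.min parameters, and the small toy_patterns vector are assumptions made for illustration only. Each pattern carries digits between letters; for every gap in the word the highest digit of any matching pattern wins, and a break is allowed where that value is odd.

```r
# Toy sketch of Liang-style pattern hyphenation; NOT the sylly/koRpus code.
# liang_hyphenate() and toy_patterns are hypothetical names for illustration.
liang_hyphenate <- function(word, patterns, left.min = 2, right.min = 2) {
  stopifnot(is.character(word), length(word) == 1, nchar(word) > 0)
  # pad the word with dots so patterns can anchor to word boundaries
  chars <- strsplit(paste0(".", tolower(word), "."), "")[[1]]
  n <- length(chars)
  # scores[i] is the value of the gap in front of chars[i]
  scores <- integer(n + 1)
  for (pat in patterns) {
    pat_chars <- strsplit(pat, "")[[1]]
    is_digit <- pat_chars %in% as.character(0:9)
    pat_letters <- pat_chars[!is_digit]
    k <- length(pat_letters)
    if (k == 0 || k > n) next
    # gaps[p] is the digit in front of the p-th pattern letter (0 if none)
    gaps <- integer(k + 1)
    pos <- 1
    for (j in seq_along(pat_chars)) {
      if (is_digit[j]) gaps[pos] <- as.integer(pat_chars[j]) else pos <- pos + 1
    }
    # slide the pattern over the padded word; keep the maximum per gap
    for (start in seq_len(n - k + 1)) {
      if (identical(chars[start:(start + k - 1)], pat_letters)) {
        idx <- start:(start + k)
        scores[idx] <- pmax(scores[idx], gaps)
      }
    }
  }
  n_letters <- n - 2
  m <- seq_len(n_letters - 1)  # candidate break after the m-th letter
  ok <- scores[m + 2] %% 2 == 1 & m >= left.min & m <= n_letters - right.min
  breaks <- m[ok]
  # group letters into syllable-like pieces and join them with hyphens
  grp <- cumsum(seq_len(n_letters) %in% (breaks + 1))
  pieces <- split(chars[2:(n - 1)], grp)
  paste(vapply(pieces, paste, "", collapse = ""), collapse = "-")
}

# a handful of patterns, just enough to split one word
toy_patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na",
                  "n2at", "1tio", "2io", "o2n")
liang_hyphenate("hyphenation", toy_patterns)
# [1] "hy-phen-ation"
```

Real pattern sets contain thousands of entries per language, so in practice the patterns are loaded from pattern files rather than typed by hand.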
Usage
## S4 method for signature 'kRp.text'
hyphen(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  quiet = FALSE,
  cache = TRUE,
  as = "kRp.hyphen",
  as.feature = FALSE
)
## S4 method for signature 'kRp.text'
hyphen_df(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)
## S4 method for signature 'kRp.text'
hyphen_c(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)
Arguments
words |
Either an object of class |
hyph.pattern |
Either an object of class |
min.length |
Integer, the minimum number of letters a word must have to be considered for hyphenation. |
rm.hyph |
Logical, whether hyphens already present in words should be removed before pattern matching. |
corp.rm.class |
A character vector with word classes which should be ignored. The default value
|
corp.rm.tag |
A character vector with POS tags which should be ignored. Relevant only if |
quiet |
Logical. If |
cache |
Logical. |
as |
A character string defining the class of the object to be returned. Defaults to |
as.feature |
Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use |
Value
An object of class kRp.text, kRp.hyphen, data.frame or a numeric vector, depending on the values of the as and as.feature arguments.
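The different return types can be compared side by side with a short sketch like the following. It assumes that koRpus and koRpus.lang.en are installed and reuses the sample text shipped with koRpus; the variable names are chosen for illustration only. It relies on the shortcut methods hyphen_df() and hyphen_c() from the Usage section instead of setting the as argument by hand.

```r
# Sketch only: assumes koRpus and koRpus.lang.en are installed.
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # default: the analysis results as a kRp.hyphen object
  hyph_obj <- hyphen(tokenized.obj, quiet = TRUE)
  # shortcut methods: a data.frame and a numeric vector, respectively
  hyph_df <- hyphen_df(tokenized.obj, quiet = TRUE)
  hyph_n <- hyphen_c(tokenized.obj, quiet = TRUE)
  # as.feature = TRUE: the input object with the results added as a feature
  with_feature <- hyphen(tokenized.obj, as.feature = TRUE, quiet = TRUE)

  # inspect what each call returned
  print(class(hyph_obj))
  print(class(hyph_df))
  print(class(hyph_n))
  print(class(with_feature))
}
```

Which variant is most convenient depends on the next step: the feature form keeps everything in one object, while the data.frame and numeric forms are easier to pass to other functions.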
References
Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.
[1] http://tug.ctan.org/tex-archive/language/hyph-utf8/tex/generic/hyph-utf8/patterns/
[2] http://www.ctan.org/tex-archive/macros/latex/base/lppl.txt
See Also
read.hyph.pat, manage.hyph.pat
Examples
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # call hyphen() on a single English word
  # "quiet=TRUE" suppresses the progress bar
  hyphen(
    "interference",
    hyph.pattern="en",
    quiet=TRUE
  )
  # call hyphen() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # the language is already defined in the object;
  # if you call hyphen() without further arguments,
  # you will get its results directly
  hyphen(tokenized.obj)
  # alternatively, you can also store those results as a
  # feature in the object itself
  tokenized.obj <- hyphen(
    tokenized.obj,
    as.feature=TRUE
  )
  # results are now part of the object
  hasFeature(tokenized.obj)
  corpusHyphen(tokenized.obj)
} else {}