strj_segment {audubon} | R Documentation |
Segment text into tokens
Description
An alias of strj_tokenize(engine = "budoux").
Usage
strj_segment(text, format = c("list", "data.frame"), split = FALSE)
Arguments
text | Character vector to be tokenized. |
format | Output format. Choose "list" or "data.frame". |
split | Logical. If passed as TRUE, the function splits the vector into sentences. |
Value
A list or a data.frame, depending on format.
Examples
strj_segment(
  paste0(
    "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
    "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
    "\u3068\u304a\u3063\u305f\u98a8"
  )
)
strj_segment(
  paste0(
    "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
    "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
    "\u3068\u304a\u3063\u305f\u98a8"
  ),
  format = "data.frame"
)
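Since the Description calls this function an alias of strj_tokenize(engine = "budoux"), the relationship can be checked directly. The following is a minimal sketch, not part of the package's own examples. It assumes audubon is installed (the guard skips the calls otherwise) and that strj_tokenize() accepts the text as its first argument. The comparison result is printed rather than asserted, because an exact match is only what the alias description suggests.

```r
# Sketch: compare strj_segment() with the strj_tokenize() call it aliases.
# The variable name `text` is illustrative; the string is the one used in
# the examples above.
text <- paste0(
  "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
  "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
  "\u3068\u304a\u3063\u305f\u98a8"
)

# Guarded so the script does nothing when audubon is not installed.
if (requireNamespace("audubon", quietly = TRUE)) {
  seg <- audubon::strj_segment(text)
  tok <- audubon::strj_tokenize(text, engine = "budoux")

  # Expected to agree if the alias is exact; printed, not asserted.
  print(identical(seg, tok))

  # split = TRUE additionally asks for the input to be split into sentences.
  print(audubon::strj_segment(text, split = TRUE))
}
```

Calling through the audubon:: prefix avoids attaching the package, so the snippet has no side effects on the search path.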
[Package audubon version 0.5.2 Index]