R: Segment text into phrases

strj_tinyseg {audubon}

R Documentation

Segment text into phrases

Description

An alias of strj_tokenize(engine = "tinyseg").

Usage

strj_tinyseg(text, format = c("list", "data.frame"), split = FALSE)

Arguments

`text`	Character vector to be tokenized.
`format`	Output format. Either `"list"` or `"data.frame"`.
`split`	Logical. If `TRUE`, the function splits each element of `text` into sentences using `stringi::stri_split_boundaries(type = "sentence")` before tokenizing.
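
As a rough illustration of the preprocessing step that `split = TRUE` refers to, the sketch below calls `stringi::stri_split_boundaries()` directly on a two-sentence string. The sample text and the variable names are assumptions made for this example; how `strj_tinyseg` handles the resulting sentences internally is not shown here.

```r
# Illustration only: the sentence-splitting step named in the `split` argument.
# The sample text is hypothetical and consists of two short Japanese sentences.
text <- paste0(
  "\u3053\u3093\u306b\u3061\u306f\u3002",
  "\u3055\u3088\u3046\u306a\u3089\u3002"
)

# Returns a list with one character vector of sentences per element of `text`.
sentences <- stringi::stri_split_boundaries(text, type = "sentence")
print(sentences)
```

With `split = TRUE`, tokenization would then presumably run on each sentence rather than on the whole input string.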

Value

A list or a data.frame, depending on `format`.

Examples

strj_tinyseg(
  paste0(
    "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
    "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
    "\u3068\u304a\u3063\u305f\u98a8"
  )
)
strj_tinyseg(
  paste0(
    "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
    "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
    "\u3068\u304a\u3063\u305f\u98a8"
  ),
  format = "data.frame"
)
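
Since the Description calls this function an alias of `strj_tokenize(engine = "tinyseg")`, the two calls should give the same result. The following is a minimal sketch of how that could be checked; the variable names `x` and `same` are introduced here for illustration, and the comparison assumes both functions are exported by the installed version of audubon.

```r
# Sketch: compare the alias with the underlying call it is documented to wrap.
x <- paste0(
  "\u3042\u306e\u30a4\u30fc\u30cf\u30c8",
  "\u30fc\u30f4\u30a9\u306e\u3059\u304d",
  "\u3068\u304a\u3063\u305f\u98a8"
)

# `same` is a single logical value indicating whether the two results match.
same <- identical(
  audubon::strj_tinyseg(x),
  audubon::strj_tokenize(x, engine = "tinyseg")
)
print(same)
```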

[Package audubon version 0.5.2 Index]