| collapse_tokens {gibasa} | R Documentation | 
Collapse sequences of tokens by condition
Description
Concatenates sequences of tokens in a tidy text dataset, grouping them by a logical expression.
Usage
collapse_tokens(tbl, condition, .collapse = "")
Arguments
| tbl | A tidy text dataset. | 
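| condition | <data-masked> A logical expression, evaluated on the columns of tbl, that identifies the tokens to collapse. | 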
| .collapse | String with which tokens are concatenated. | 
Details
Note that this function drops all columns except 'token' and the columns used for grouping sequences, so the returned data.frame has only the 'doc_id', 'sentence_id', 'token_id', and 'token' columns.
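The behaviour described above can be illustrated with a small base R sketch. It is not gibasa's implementation: `collapse_runs` is a hypothetical helper, the toy data and its English POS1 tags are made up, and the sketch assumes that only runs of consecutive tokens satisfying the condition within the same 'doc_id' and 'sentence_id' are merged and that 'token_id' is renumbered afterwards.

```r
# Hypothetical helper (not part of gibasa): merge runs of consecutive tokens
# whose `flag` is TRUE, within each (doc_id, sentence_id) pair.
# Assumes `tbl` has at least one row and is ordered by document and token.
collapse_runs <- function(tbl, flag, .collapse = "") {
  n <- nrow(tbl)
  key <- paste(tbl$doc_id, tbl$sentence_id, sep = "\r")
  prev_flag <- c(FALSE, flag[-n])
  same_group <- c(FALSE, key[-1] == key[-n])
  # A token continues the previous run only if both tokens match the
  # condition and belong to the same sentence; otherwise a new run starts.
  run <- cumsum(!(flag & prev_flag & same_group))
  first <- !duplicated(run)
  out <- data.frame(
    doc_id = tbl$doc_id[first],
    sentence_id = tbl$sentence_id[first],
    token = vapply(
      split(tbl$token, run), paste, character(1),
      collapse = .collapse, USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
  # Renumber tokens within each sentence.
  out$token_id <- ave(
    seq_len(nrow(out)), out$doc_id, out$sentence_id,
    FUN = seq_along
  )
  # Only the grouping columns and 'token' survive; extra columns are dropped.
  out[, c("doc_id", "sentence_id", "token_id", "token")]
}

# Made-up toy data standing in for tokenizer output.
toy <- data.frame(
  doc_id = "d1",
  sentence_id = 1L,
  token_id = 1:5,
  token = c("New", "York", "is", "a", "city"),
  POS1 = c("noun", "noun", "verb", "det", "noun"),
  stringsAsFactors = FALSE
)

res <- collapse_runs(toy, toy$POS1 == "noun", .collapse = " ")
print(res)
```

In this sketch the two adjacent nouns are joined into a single token while the isolated noun at the end is left alone, and the 'POS1' column is absent from the result. With collapse_tokens itself, the condition is written directly in terms of column names rather than passed as a precomputed vector.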
Value
A data.frame.
Examples
## Not run: 
df <- tokenize(
  data.frame(
    doc_id = "odakyu-sen",
    text = "\u5c0f\u7530\u6025\u7dda"
  )
) |>
  prettify(col_select = "POS1")
collapse_tokens(
  df,
  POS1 == "\u540d\u8a5e" & stringr::str_detect(token, "^[\\p{Han}]+$")
) |>
  head()
## End(Not run)
[Package gibasa version 1.1.1 Index]