
trunc_at {mclm}

R Documentation

Truncate a sequence of character data

Description

This generic function takes as its argument `x` an object that represents a sequence of character data, such as an object of class `tokens`, and truncates it at the position where a match for `pattern` is found. Currently, a method is implemented only for `tokens` objects.

Usage

trunc_at(x, pattern, ...)

## S3 method for class 'tokens'
trunc_at(
  x,
  pattern,
  keep_this = FALSE,
  last_match = FALSE,
  from_end = FALSE,
  ...
)

Arguments

`x`	An object that represents a sequence of character data.
`pattern`	A regular expression, such as one created with `re()` as in the examples.
`...`	Additional arguments.
`keep_this`	Logical. Whether the matching token itself should be kept. If `TRUE`, truncation happens right after the matching token; if `FALSE`, right before it, so the matching token is dropped.
`last_match`	Logical. If there are several matching tokens and `last_match` is `TRUE`, the last match is used as the truncation point; otherwise, the first match is used.
`from_end`	Logical. If `FALSE`, the search for a match starts at the first token and proceeds forward; if `TRUE`, it starts at the last token and proceeds backward. Accordingly, with `from_end = FALSE` the part of `x` kept after truncation is its head; with `from_end = TRUE` it is its tail.
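The interplay of `keep_this`, `last_match` and `from_end` can be sketched in base R. The helper `trunc_at_chr()` below is hypothetical: it works on a plain character vector rather than a `tokens` object and takes `pattern` as a character string rather than an `re` object. It assumes that `from_end = TRUE` behaves like reversing `x`, truncating, and reversing back (so "first" and "last" match are counted in the search direction), and that `x` is returned unchanged when nothing matches. It is a sketch of the documented rules under those assumptions, not mclm's implementation.

```r
# Hypothetical helper trunc_at_chr(): a base-R sketch of the truncation rules
# for a plain character vector. Not mclm's implementation.
trunc_at_chr <- function(x, pattern, keep_this = FALSE,
                         last_match = FALSE, from_end = FALSE) {
  # Assumption: searching backward = reverse, truncate, reverse back
  if (from_end) x <- rev(x)
  matches <- grep(pattern, x, perl = TRUE)
  if (length(matches) > 0) {
    # "First" and "last" are counted in the search direction
    pos <- if (last_match) matches[length(matches)] else matches[1]
    # keep_this = FALSE cuts before the match, so the match is dropped
    if (!keep_this) pos <- pos - 1
    x <- x[seq_len(pos)]
  }
  # Assumption: with no match, x comes back unchanged
  if (from_end) x <- rev(x)
  x
}

# Whitespace splitting makes each "." a separate element
toks <- strsplit("This is a first sentence . This is a second sentence .",
                 "\\s+")[[1]]

trunc_at_chr(toks, "[.]")
# returns c("This", "is", "a", "first", "sentence")

trunc_at_chr(toks, "[.]", last_match = TRUE, from_end = TRUE)
# returns c("This", "is", "a", "second", "sentence", ".")
```

Under these assumptions, the last call keeps the second sentence: searching backward, the final "." is the first match and the "." closing the first sentence is the last one, so only the tokens after that "." survive.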

Value

A truncated version of `x`.

Examples

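library(mclm)

# Split on whitespace, so each "." is a token of its own
(toks <- tokenize('This is a first sentence . This is a second sentence .',
                  re_token_splitter = '\\s+'))

# Cut right before the first "." (the match itself is dropped)
trunc_at(toks, re("[.]"))

# Cut right before the last "."
trunc_at(toks, re("[.]"), last_match = TRUE)

# Search backward from the end and keep the tail of toks
trunc_at(toks, re("[.]"), last_match = TRUE, from_end = TRUE)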

[Package mclm version 0.2.7 Index]