pre_tokenizer_whitespace {tok}

R Documentation

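This pre-tokenizer splits its input using the regex `\w+|[^\w\s]+`: each run of word characters becomes one piece, each run of characters that are neither word characters nor whitespace (punctuation, for example) becomes another, and whitespace is discarded.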

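To make the splitting rule concrete, here is a minimal base R sketch that applies the same regex with `gregexpr()`. It does not call tok at all; `split_whitespace_sketch()` is a hypothetical helper named only for this illustration, and it relies on R's PCRE engine (`perl = TRUE`), whose `\w` is ASCII-only by default, so results on non-ASCII text may differ from what the package returns.

```r
# Hypothetical helper (not part of tok): mimics the documented regex in base R.
split_whitespace_sketch <- function(text) {
  # Match runs of word characters, or runs of non-word, non-whitespace characters.
  pattern <- "\\w+|[^\\w\\s]+"
  matches <- gregexpr(pattern, text, perl = TRUE)
  # regmatches() returns a list with one element per input string.
  regmatches(text, matches)[[1]]
}

pieces <- split_whitespace_sketch("Hello, world!")
print(pieces)
# The pieces are "Hello", ",", "world" and "!"
```

Note that punctuation is kept as its own piece, while the space between the words does not appear in the result.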

tok::tok_pre_tokenizer -> tok_pre_tokenizer_whitespace

Initializes the whitespace pre-tokenizer.

pre_tokenizer_whitespace$new()

The objects of this class are cloneable with this method.

pre_tokenizer_whitespace$clone(deep = FALSE)
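The two methods above can be tried together as follows. This is a small sketch that assumes the tok package is installed; the `requireNamespace()` guard and the variable names are additions for illustration only, and the class names checked are those shown in the inheritance line of this page.

```r
# Runs only when tok is available; otherwise the block is skipped.
if (requireNamespace("tok", quietly = TRUE)) {
  # Create the whitespace pre-tokenizer (takes no arguments).
  pre_tok <- tok::pre_tokenizer_whitespace$new()

  # Check the object against the classes listed on this page.
  is_whitespace <- inherits(pre_tok, "tok_pre_tokenizer_whitespace")
  is_pre_tokenizer <- inherits(pre_tok, "tok_pre_tokenizer")

  # Make a shallow copy using the clone method.
  pre_tok_copy <- pre_tok$clone(deep = FALSE)
  print(class(pre_tok_copy))
}
```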