pre_tokenizer_whitespace {tok}    R Documentation

This pre-tokenizer simply splits using the following regex: ⁠\w+|[^\w\s]+⁠

Description

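This pre-tokenizer splits its input with the regex \w+|[^\w\s]+. Each run of word characters, and each run of characters that are neither word characters nor whitespace (punctuation, for example), becomes its own piece; whitespace itself is dropped.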

Super class

tok::tok_pre_tokenizer -> tok_pre_tokenizer_whitespace

Methods

Public methods


Method new()

Initializes the whitespace pre-tokenizer.

Usage
pre_tokenizer_whitespace$new()

Method clone()

The objects of this class are cloneable with this method.

Usage
pre_tokenizer_whitespace$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
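
Examples

The splitting rule given in the Description can be illustrated with base R alone. The sketch below does not call tok; it applies the same regex through gregexpr() with perl = TRUE, so it only approximates what the pre-tokenizer does on plain ASCII input. The helper name split_whitespace is hypothetical and is not part of the package.

```r
# Base R illustration of the regex \w+|[^\w\s]+ from the Description.
# Assumption: split_whitespace() is a hypothetical helper, not a tok function.
split_whitespace <- function(text) {
  pattern <- "\\w+|[^\\w\\s]+"
  # perl = TRUE is needed so that \w and \s work inside a character class.
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

pieces <- split_whitespace("Hello, world!")
print(pieces)
# [1] "Hello" ","     "world" "!"
```

Word runs and punctuation runs come out as separate pieces, and the space between them does not appear in the result. To use the actual pre-tokenizer, create it with pre_tokenizer_whitespace$new() as shown under Usage.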

See Also

Other pre_tokenizer: pre_tokenizer, pre_tokenizer_byte_level


[Package tok version 0.1.3 Index]