strsplit_ctl {fansi} | R Documentation |
Control Sequence Aware Version of strsplit
Description
A drop-in replacement for base::strsplit
.
Usage
strsplit_ctl(
x,
split,
fixed = FALSE,
perl = FALSE,
useBytes = FALSE,
warn = getOption("fansi.warn", TRUE),
term.cap = getOption("fansi.term.cap", dflt_term_cap()),
ctl = "all",
normalize = getOption("fansi.normalize", FALSE),
carry = getOption("fansi.carry", FALSE),
terminate = getOption("fansi.terminate", TRUE)
)
Arguments
x |
a character vector, or, unlike |
split |
character vector (or object which can be coerced to such)
containing regular expression(s) (unless |
fixed |
logical. If |
perl |
logical. Should Perl-compatible regexps be used? |
useBytes |
logical. If |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
terminate |
TRUE (default) or FALSE whether substrings should have
active state closed to avoid it bleeding into other strings they may be
prepended onto. This does not stop state from carrying if |
Details
This function works by computing the position of the split points after
removing Control Sequences, and uses those positions in conjunction with
substr_ctl
to extract the pieces. This concept is borrowed from
crayon::col_strsplit
. An important implication of this is that you cannot
split by Control Sequences that are being treated as Control Sequences.
You can however limit which control sequences are treated specially via the
ctl
parameters (see examples).
Value
Like base::strsplit
, with Control Sequences excluded.
Control and Special Sequences
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
Output Stability
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
Bidirectional Text
fansi
is unaware of text directionality and operates as if all strings are
left to right (LTR). Using fansi
function with strings that contain mixed
direction scripts (i.e. both LTR and RTL) may produce undesirable results.
Note
The split positions are computed after both x
and split
are
converted to UTF-8.
Non-ASCII strings are converted to and returned in UTF-8 encoding. Width calculations will not work properly in R < 3.2.2.
See Also
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
normalize_state
for more details on what the normalize
parameter does,
state_at_end
to compute active state at the end of strings,
close_state
to compute the sequence required to close active state.
Examples
strsplit_ctl("\033[31mhello\033[42m world!", " ")
## Splitting by newlines does not work as they are _Control
## Sequences_, but we can use `ctl` to treat them as ordinary
strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n")
strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n", ctl=c("all", "nl"))