nchar_ctl {fansi}    R Documentation
Control Sequence Aware Version of nchar
Description
nchar_ctl counts all characters that are not part of a Control Sequence.
nzchar_ctl returns TRUE for each input vector element that contains at least one character that is not part of a Control Sequence. By default, newlines and other C0 control characters are not counted.
Usage
nchar_ctl(
x,
type = "chars",
allowNA = FALSE,
keepNA = NA,
ctl = "all",
warn = getOption("fansi.warn", TRUE),
strip
)
nzchar_ctl(
x,
keepNA = FALSE,
ctl = "all",
warn = getOption("fansi.warn", TRUE)
)
Arguments
x |
a character vector or object that can be coerced to such. |
type |
character(1L) partial matching
|
allowNA |
logical: should |
keepNA |
logical: should |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
strip |
character, deprecated in favor of |
Details
nchar_ctl and nzchar_ctl are implemented in statically compiled code, so in particular nzchar_ctl will be much faster than the otherwise equivalent nzchar(strip_ctl(...)).
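As a small illustration of the equivalence described above, the following sketch compares the two approaches on a made-up input vector (the vector `x` is an assumption chosen for this example, not part of the package documentation):

```r
library(fansi)

# Hypothetical input: only control sequences, styled text, and an empty string
x <- c("\033[31m\033[39m", "\033[31mhello\033[39m", "")

fast <- nzchar_ctl(x)          # single pass in compiled code
slow <- nzchar(strip_ctl(x))   # strips first, then tests for emptiness

# Both approaches should agree element by element
identical(fast, slow)
```

Only the second element contains a character outside of a Control Sequence, so it is the only one expected to yield TRUE.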
These functions will warn if malformed escape sequences or malformed UTF-8 sequences are encountered, as these may be interpreted incorrectly.
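One possible way to deal with such input, sketched here with an illustrative vector, is to inspect it with unhandled_ctl first and then decide whether to silence the warning through the warn argument. The sample strings are assumptions made for this example.

```r
library(fansi)

# Hypothetical input: the last element carries an unusual escape sequence
x <- c("plain", "\033[31mred\033[39m", "baz \033[31#3m")

# unhandled_ctl reports the sequences fansi may not interpret as intended
problems <- unhandled_ctl(x)
print(problems)

# Having reviewed the report, count characters without the warning
counts <- nchar_ctl(x, warn = FALSE)
counts
```

The count for the last element depends on how the problematic sequence is interpreted, which is why reviewing the report before relying on the result seems prudent.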
Value
Like base::nchar, with Control Sequences excluded.
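To make the difference from base::nchar concrete, this short sketch counts the same string both ways. The string is an assumption chosen for the example: three visible characters wrapped in two five-character SGR sequences.

```r
library(fansi)

# "123" wrapped in SGR "red foreground" and "default foreground" sequences
styled <- "\033[31m123\033[39m"

base_count <- nchar(styled)       # counts the escape characters too
ctl_count  <- nchar_ctl(styled)   # counts only the visible "123"

c(base = base_count, fansi = ctl_count)
```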
Control and Special Sequences
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences, which can be used to change the rendered appearance of text, and
OSC hyperlinks. See fansi for details.
Output Stability
Several factors could affect the exact output produced by fansi functions across versions of fansi, R, and/or across systems. In general it is best not to rely on exact fansi output, e.g. by embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database version, and grapheme processing logic (which is still in development), among other things. For the most part fansi (currently) uses the internals of base::nchar(type='width'), but there are exceptions and this may change in the future.
How a particular display format is encoded in Control Sequences is not guaranteed to be stable across fansi versions. Additionally, which Special Sequences are re-encoded vs transcribed untouched may change. In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state to TRUE and type to "chars" in functions that allow it, and set term.cap to a specific set of capabilities.
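One way to follow this advice in a test, sketched here under the assumption that only the visible text matters, is to assert on the stripped content and on the character count rather than on the raw escape sequences. The string `styled` is a made-up stand-in for output produced elsewhere.

```r
library(fansi)

# Hypothetical styled output; the exact escape encoding may differ
# across fansi versions, so it is not compared directly
styled <- "\033[1;31mError:\033[0m file not found"

# Compare the visible text only
visible <- strip_ctl(styled)
stopifnot(identical(visible, "Error: file not found"))

# type = "chars" avoids locale- and Unicode-version-dependent width logic
n_chars <- nchar_ctl(styled, type = "chars")
stopifnot(n_chars == 21L)
```

This is only one possible testing strategy; tests that care about the rendered appearance itself will still need to account for the instability described above.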
Graphemes
fansi approximates grapheme widths and counts by using heuristics for grapheme breaks that work for most common graphemes, including emoji combining sequences. The heuristic is known to work incorrectly with invalid combining sequences, prepending marks, and sequence interruptors.
fansi does not provide a full implementation of grapheme break detection to avoid carrying a copy of the Unicode grapheme breaks table, and also because the hope is that R itself will eventually add the feature.
The utf8 package provides a conforming grapheme parsing implementation.
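The sketch below compares the count types on a simple combining sequence. It assumes that "graphemes" is an accepted value for type in the installed fansi version and that the session runs in a UTF-8 locale; results may differ otherwise, so no expected output is shown.

```r
library(fansi)

# "e" followed by a combining acute accent (U+0301), wrapped in SGR sequences
x <- "\033[31me\u0301\033[39m"

# Count the same string three ways; "graphemes" is assumed to be supported
types  <- c("chars", "width", "graphemes")
counts <- sapply(types, function(tp) nchar_ctl(x, type = tp))
counts
```

Because the grapheme logic is heuristic, inputs such as invalid combining sequences or prepending marks may produce counts that differ from a conforming implementation.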
Note
The keepNA parameter is ignored for R < 3.2.2.
See Also
?fansi for details on how Control Sequences are interpreted, particularly if you are getting unexpected results; unhandled_ctl for detecting bad control sequences.
Examples
nchar_ctl("\033[31m123\a\r")
## with some wide characters
cn.string <- sprintf("\033[31m%s\a\r", "\u4E00\u4E01\u4E03")
nchar_ctl(cn.string)
nchar_ctl(cn.string, type='width')
## Remember newlines are not counted by default
nchar_ctl("\t\n\r")
## The 'c0' value for the `ctl` argument does not include
## newlines.
nchar_ctl("\t\n\r", ctl="c0")
nchar_ctl("\t\n\r", ctl=c("c0", "nl"))
## The _sgr flavor only treats SGR sequences as zero width
nchar_sgr("\033[31m123")
nchar_sgr("\t\n\n123")
## All of the following are Control Sequences or C0 controls
nzchar_ctl("\n\033[42;31m\033[123P\a")