| stri_width {stringi} | R Documentation |
Determine the Width of Code Points
Description
Approximates the number of text columns the 'cat()' function might use to print a string using a mono-spaced font.
Usage
stri_width(str)
Arguments
| str | character vector or an object coercible to |
Details
The Unicode standard does not formalize the notion of a character width. Roughly based on http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, https://github.com/nodejs/node/blob/master/src/node_i18n.cc, and UAX #11, we proceed as follows.
The following code points are of width 0:
code points with general category (see stringi-search-charclass) Me, Mn, and Cf,
C0 and C1 control codes (general category Cc), for compatibility with the nchar function,
Hangul Jamo medial vowels and final consonants (code points with enumerable property UCHAR_HANGUL_SYLLABLE_TYPE equal to U_HST_VOWEL_JAMO or U_HST_TRAILING_JAMO; note that applying the NFC normalization with stri_trans_nfc is encouraged),
ZERO WIDTH SPACE (U+200B).
Characters with the UCHAR_EAST_ASIAN_WIDTH enumerable property equal to U_EA_FULLWIDTH or U_EA_WIDE are of width 2.
Most emojis and characters with general category So (other symbols) are of width 2.
SOFT HYPHEN (U+00AD) (for compatibility with nchar) as well as any other characters have width 1.
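The rules above can be checked on a few individual code points. The following is a minimal sketch, assuming the stringi package is installed and the session can represent Unicode escapes; the vector names are illustrative only, and results may differ between stringi and ICU versions.

```r
library(stringi)

# One representative code point per rule (the names are illustrative only)
samples <- c(
    combining_acute  = "\u0301",  # general category Mn: width 0 rule
    zero_width_space = "\u200b",  # ZERO WIDTH SPACE: width 0 rule
    soft_hyphen      = "\u00ad",  # SOFT HYPHEN: width 1 rule
    fullwidth_a      = "\uff21",  # East Asian width U_EA_FULLWIDTH: width 2 rule
    cjk_ideograph    = "\u4e2d",  # East Asian width U_EA_WIDE: width 2 rule
    latin_a          = "a"        # any other character: width 1 rule
)

widths <- stri_width(samples)
names(widths) <- names(samples)   # set names explicitly for a readable printout
print(widths)
```

Each element holds a single code point, so each reported width can be compared directly with the corresponding rule.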
Value
Returns an integer vector of the same length as str.
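As an illustration of the return value, the sketch below compares stri_width() with stri_length() on a small vector. It assumes stringi is installed; the variable names are hypothetical, and it assumes that missing values propagate as NA, as is usual for stringi functions.

```r
library(stringi)

# "a\u0301" is the letter a followed by a combining acute accent
x <- c("abc", "", "a\u0301", NA)

w <- stri_width(x)    # estimated number of text columns
n <- stri_length(x)   # number of code points

# One result per input element
stopifnot(is.integer(w), length(w) == length(x))

print(data.frame(code_points = n, columns = w))
```

The third element shows why the two functions can disagree: the combining mark counts as a code point but, under the rules in Details, occupies no column of its own.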
Author(s)
Marek Gagolewski and other contributors
References
East Asian Width – Unicode Standard Annex #11, https://www.unicode.org/reports/tr11/
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other length:
%s$%(),
stri_isempty(),
stri_length(),
stri_numbytes(),
stri_pad_both(),
stri_sprintf()
Examples
stri_width(LETTERS[1:5])
stri_width(stri_trans_nfkd('\u0105')) # a with ogonek, decomposed into 'a' and a combining mark
stri_width(stri_trans_nfkd('\U0001F606')) # an emoji
stri_width( # Full-width equivalents of ASCII characters:
stri_enc_fromutf32(as.list(c(0x3000, 0xFF01:0xFF5E)))
)
stri_width(stri_trans_nfkd('\ubc1f')) # includes Hangul Jamo medial vowels and final consonants