stri_enc_toutf32 {stringi}R Documentation

Convert Strings To UTF-32

Description

UTF-32 is a 32-bit encoding where each Unicode code point corresponds to exactly one integer value. This function converts a character vector to a list of integer vectors so that, e.g., individual code points may be easily accessed, changed, etc.

Usage

stri_enc_toutf32(str)

Arguments

str

a character vector (or an object coercible to) to be converted

Details

See stri_enc_fromutf32 for a dual operation.

This function is roughly equivalent to a vectorized call to utf8ToInt(enc2utf8(str)). If you want a list of raw vectors on output, use stri_encode.

Unlike utf8ToInt, if ill-formed UTF-8 byte sequences are detected, a corresponding element is set to NULL and a warning is generated. To deal with such issues, use, e.g., stri_enc_toutf8.

Value

Returns a list of integer vectors. Missing values are converted to NULLs.

Author(s)

Marek Gagolewski and other contributors

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other encoding_conversion: about_encoding, stri_enc_fromutf32(), stri_enc_toascii(), stri_enc_tonative(), stri_enc_toutf8(), stri_encode()


[Package stringi version 1.8.4 Index]