stri_enc_info {stringi} | R Documentation |
Query a Character Encoding
Description
Gets basic information on a character encoding.
Usage
stri_enc_info(enc = NULL)
Arguments
enc |
|
Details
An error is raised if the provided encoding is unknown to ICU
(see stri_enc_list
for more details).
Value
Returns a list with the following components:
-
Name.friendly
– friendly encoding name: MIME Name or JAVA Name or ICU Canonical Name (the first of provided ones is selected, see below); -
Name.ICU
– encoding name as identified by ICU; -
Name.*
– other standardized encoding names, e.g.,Name.UTR22
,Name.IBM
,Name.WINDOWS
,Name.JAVA
,Name.IANA
,Name.MIME
(some of them may be unavailable for all the encodings); -
ASCII.subset
– is ASCII a subset of the given encoding?; -
Unicode.1to1
– for 8-bit encodings only: are all characters translated to exactly one Unicode code point and is the translation scheme reversible?; -
CharSize.8bit
– is this an 8-bit encoding, i.e., do we haveCharSize.min == CharSize.max
andCharSize.min == 1
?; -
CharSize.min
– minimal number of bytes used to represent a UChar (in UTF-16, this is not the same as UChar32) -
CharSize.max
– maximal number of bytes used to represent a UChar (in UTF-16, this is not the same as UChar32, i.e., does not reflect the maximal code point representation size)
Author(s)
Marek Gagolewski and other contributors
See Also
The official online manual of stringi at https://stringi.gagolewski.com/
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02
Other encoding_management:
about_encoding
,
stri_enc_list()
,
stri_enc_mark()
,
stri_enc_set()