encoding {readtext}    R Documentation
Detect the encoding of texts
Description
Detect the encoding of texts in a character vector, corpus, or readtext object
and report the most likely encoding for each document. This is useful for
detecting the encoding of input texts, so that the source encoding can be
(re)specified when reading a set of texts with readtext(), prior to
constructing a corpus.
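A minimal sketch of that detect-then-respecify workflow, assuming the readtext and stringi packages are installed. The sample text, the temporary file, and the names sample_utf8, path, raw_text, and rt are hypothetical and exist only for illustration. The file is deliberately written as Windows-1251 so its true encoding is known in advance. The detection report is printed rather than checked, because detection on short texts is probabilistic.

```r
library(readtext)
library(stringi)

# Hypothetical sample text: "Это пример текста." (Russian for "This is a
# sample text."), written with Unicode escapes and repeated so that the
# detector has enough bytes to work with.
sample_utf8 <- paste(rep("\u042d\u0442\u043e \u043f\u0440\u0438\u043c\u0435\u0440 \u0442\u0435\u043a\u0441\u0442\u0430.", 10),
                     collapse = " ")

# Save it to a temporary .txt file as Windows-1251 bytes.
path <- tempfile(fileext = ".txt")
cp1251_bytes <- stri_encode(sample_utf8, from = "UTF-8", to = "windows-1251",
                            to_raw = TRUE)[[1]]
writeBin(cp1251_bytes, path)

# Step 1: report the likely encoding of the file's raw contents.
raw_text <- rawToChar(readBin(path, what = "raw", n = file.size(path)))
encoding(raw_text)

# Step 2: re-read the file, declaring the source encoding so that
# readtext() converts the text to UTF-8.
rt <- readtext(path, encoding = "windows-1251")
rt$text
```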
Usage
encoding(x, verbose = TRUE, ...)
Arguments
x
character vector, corpus, or readtext object whose texts' encodings will be detected.
verbose
if
...
additional arguments passed to stri_enc_detect
Details
Based on stri_enc_detect from the stringi package, which in turn relies on the ICU libraries. See the ICU User Guide, https://unicode-org.github.io/icu/userguide/.
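Because encoding() delegates detection to stri_enc_detect, calling that function directly shows the ranked candidate list the detector produces. This is a sketch that assumes the stringi package is installed; the Russian sample and the names sample_utf8, sample_cp1251, detected, and best_guess are hypothetical. How encoding() condenses these candidates into its own report is not shown here, and no particular detection result is claimed.

```r
library(stringi)

# Hypothetical sample: "Это пример текста." (Russian), repeated for length.
sample_utf8 <- paste(rep("\u042d\u0442\u043e \u043f\u0440\u0438\u043c\u0435\u0440 \u0442\u0435\u043a\u0441\u0442\u0430.", 10),
                     collapse = " ")

# Re-encode to Windows-1251; to_raw = TRUE returns a list of raw vectors,
# which stri_enc_detect() accepts directly.
sample_cp1251 <- stri_encode(sample_utf8, from = "UTF-8", to = "windows-1251",
                             to_raw = TRUE)

# One data frame per input, with columns Encoding, Language, and
# Confidence, ordered from most to least confident guess.
detected <- stri_enc_detect(sample_cp1251)
best_guess <- detected[[1]]$Encoding[1]
print(detected[[1]])
print(best_guess)
```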
Examples
## Not run: 
encoding(data_char_encodedtexts)
# show detected value for each text, versus known encoding
data.frame(labelled = names(data_char_encodedtexts),
           detected = encoding(data_char_encodedtexts)$all)
# Russian text, Windows-1251
myreadtext <- readtext("https://kenbenoit.net/files/01_er_5.txt")
encoding(myreadtext)
## End(Not run)
[Package readtext version 0.91 Index]