scancn {chinese.misc} | R Documentation |
Read a Text File by Auto-Detecting Encoding
Description
The function reads a text file and tries to detect its encoding. If you have Chinese files from different sources that do not share a single encoding, let this function detect the encoding and read them. This can save much of the time otherwise spent dealing with unrecognizable characters.
Usage
scancn(x, enc = "auto", collapse = " ")
Arguments
x |
a length 1 character specifying the filename. |
enc |
a length 1 character giving the file encoding specified by the user. The default is "auto", which lets the function detect the encoding. |
collapse |
the string used to paste elements together when the return of scan has length larger than 1 (see Details). |
Details
The function calls scan(x, what = "character", ...) and auto-detects the file encoding. Sometimes a Chinese file is encoded in "UTF-8", but what is actually read is a "?". When this happens, the function reads it twice and uses stringi::stri_encode to convert it. If invalid inputs are found in the content, the file will also be read twice.
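The conversion step described above can be tried outside scancn. The following is a minimal sketch, not the package's internal code. It assumes the stringi package is installed; the temporary file, the choice of GB18030 and all variable names are made up for illustration. It writes two Chinese characters as GB18030 bytes, checks that those bytes are not valid UTF-8, and converts them with stringi::stri_encode.

```r
library(stringi)

# Hypothetical sample text, written with \u escapes so the script stays ASCII
txt <- "\u4e2d\u6587"

# Write the text to a temporary file as GB18030 bytes
tmp <- tempfile(fileext = ".txt")
gb_bytes <- stri_encode(txt, from = "UTF-8", to = "GB18030", to_raw = TRUE)[[1]]
writeBin(gb_bytes, tmp)

# Read the file back as raw bytes, without assuming any encoding
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# Check whether the bytes form valid UTF-8 (they should not here)
is_utf8 <- stri_enc_isutf8(bytes)

# stri_enc_detect() returns candidate encodings with confidence scores;
# guesses on very short inputs are unreliable, so they are only stored here
guesses <- stri_enc_detect(bytes)[[1]]

# Convert from the encoding the file was really written in to UTF-8
fixed <- stri_encode(bytes, from = "GB18030", to = "UTF-8")

unlink(tmp)
```

In this sketch the source encoding is passed explicitly; how scancn chooses it when enc = "auto" is not shown here.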
The function always returns a length 1 character. If the return of scan is a vector with length larger than 1, elements will be pasted together with three spaces or other specified symbols. It will return a " " (one space) when all the elements of the vector are NA. If not all elements are NA, those equal to NA will be changed to "" (a size 0 string) before being pasted together.
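The rules above for building the return value can be sketched in base R. The helper collapse_scanned below is hypothetical and is not part of chinese.misc; it only mirrors the behavior described in this section, and the separator is always passed explicitly.

```r
# Hypothetical helper, not part of chinese.misc: turn a character vector
# into a length 1 character following the rules described above
collapse_scanned <- function(v, collapse) {
  # All elements are NA: return a single space
  if (all(is.na(v))) {
    return(" ")
  }
  # Otherwise change NA elements to "" and paste everything together
  v[is.na(v)] <- ""
  paste(v, collapse = collapse)
}

all_missing <- collapse_scanned(c(NA, NA), collapse = "-")
some_missing <- collapse_scanned(c("a", NA, "b"), collapse = "-")
```

Here all_missing is " " and some_missing is "a--b": the NA element becomes an empty string, so the two separators end up next to each other.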
Value
a length 1 character of text.
Examples
# No Chinese is allowed, so try an English file
x <- file.path(find.package("base"), "CITATION")
scancn(x)