scancn {chinese.misc} R Documentation

Read a Text File by Auto-Detecting Encoding

Description

This function reads a text file and tries to detect its encoding. If you have Chinese files from different sources that do not share a single known encoding, let this function detect the encoding and read them. This can save considerable time otherwise spent dealing with unreadable characters.

Usage

scancn(x, enc = "auto", collapse = "   ")

Arguments

x

a length 1 character vector specifying the file name.

enc

a length 1 character vector giving the file encoding specified by the user. The default is "auto", which lets the function detect the encoding.

collapse

passed to the collapse argument of paste to join the elements that were read. The default is "   " (three spaces).

Details

The function calls scan(x, what = "character", ...) and auto-detects the file encoding. Sometimes a Chinese file is encoded in "UTF-8", but what is actually read is "?". When this happens, the function reads the file a second time and uses stringi::stri_encode to convert it. If invalid input is found in the content, the file is also read a second time.
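As an illustration of the detect-then-convert idea, here is a minimal sketch that assumes the stringi package is installed. The helper name read_guess_encoding is hypothetical and is not part of chinese.misc; the package's actual detection logic may differ.

```r
# Hypothetical helper, not part of chinese.misc: guess a file's encoding
# from its raw bytes and convert the content to UTF-8 with stringi.
read_guess_encoding <- function(path) {
  bytes <- stringi::stri_read_raw(path)           # file content as a raw vector
  guess <- stringi::stri_enc_detect(bytes)[[1]]   # candidate encodings, best first
  enc <- guess$Encoding[1]
  if (is.na(enc)) stop("could not guess an encoding for ", path)
  stringi::stri_encode(bytes, from = enc, to = "UTF-8")
}

# Quick check on a small ASCII file written to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines("plain ASCII text for a quick check", tmp)
```

Calling read_guess_encoding(tmp) should return a single string holding the file content. Encoding guesses are heuristic, so very short files may be misdetected.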

The function always returns a length 1 character vector. If scan returns a vector of length greater than 1, its elements are pasted together, separated by three spaces or by the string given in collapse.

It returns " " (one space) when all elements of the vector are NA. Otherwise, NA elements are changed to "" (a zero-length string) before the elements are pasted together.
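The pasting and NA rules described above can be sketched in base R as follows. The helper collapse_text is a hypothetical name used only for illustration; it mirrors the documented behaviour and is not the package's own code.

```r
# Hypothetical helper mirroring the documented rules; not the package's code.
collapse_text <- function(v, collapse = "   ") {
  if (all(is.na(v))) return(" ")   # every element is NA: return one space
  v[is.na(v)] <- ""                # remaining NAs become zero-length strings
  paste(v, collapse = collapse)    # always a length 1 character vector
}

collapse_text(c("a", NA, "b"), collapse = "-")   # "a--b"
collapse_text(c(NA, NA))                         # " "
```

Note that an NA in the middle of the vector still contributes a separator on each side, which is why the first call yields two consecutive hyphens.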

Value

a length 1 character vector of text.

Examples

# Chinese text is not allowed in the examples, so read an English file instead
x <- file.path(find.package("base"), "CITATION")
scancn(x)

[Package chinese.misc version 0.2.3 Index]