R: Read a Text File by Auto-Detecting Encoding

scancn {chinese.misc}

R Documentation

Read a Text File by Auto-Detecting Encoding

Description

This function reads a text file and tries to detect its encoding. If you have Chinese files from different sources that do not share a single encoding, let this function detect the encoding and read them. This can save considerable time otherwise spent dealing with unrecognizable characters.

Usage

scancn(x, enc = "auto", collapse = "   ")

Arguments

`x`	a length 1 character vector specifying the filename.
`enc`	a length 1 character vector giving the file encoding specified by the user. The default is "auto", which lets the function detect the encoding.
`collapse`	passed to the `collapse` argument of `paste` to join the elements that were read. The default is "   " (three spaces).

Details

The function calls scan(x, what = "character", ...) and auto-detects the file encoding. Sometimes a Chinese file is encoded in "UTF-8" but its content is read as "?". When this happens, the function reads the file a second time and uses stringi::stri_encode to convert it. The file is also read a second time if invalid input is found in the content.
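The "read, check, read again" idea can be sketched in base R. This is not the code used by scancn: the helper name read_with_fallback and the fallback encoding "GB18030" are assumptions made for illustration, and the second pass re-encodes through the fileEncoding argument of scan instead of stringi::stri_encode. Whether a given fallback encoding is available depends on the platform's iconv.

```r
# Hypothetical helper, not part of chinese.misc: read whitespace-separated
# tokens as UTF-8, and read the file a second time with a declared encoding
# if any token contains bytes that are not valid UTF-8.
read_with_fallback <- function(path, fallback = "GB18030") {
  tokens <- scan(path, what = "character", quiet = TRUE, encoding = "UTF-8")
  if (all(validUTF8(tokens))) {
    return(tokens)
  }
  # Second read: let R convert from the assumed fallback encoding
  scan(path, what = "character", quiet = TRUE, fileEncoding = fallback)
}

# Write a small UTF-8 file (Chinese characters given as escapes) and read it
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeLines(enc2utf8("\u4e2d\u6587 text"), con, useBytes = TRUE)
close(con)
tokens <- read_with_fallback(tmp)
unlink(tmp)
```

Here the file is valid UTF-8, so only the first read is used and tokens holds two elements.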

The function always returns a length 1 character vector. If scan returns a vector of length greater than 1, its elements are pasted together, separated by three spaces or by the string given in collapse.

It returns " " (one space) when all elements of the vector are NA. Otherwise, any NA elements are changed to "" (an empty string) before the elements are pasted together.
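The collapsing rules described above can be illustrated with a small base R sketch. The helper name collapse_tokens is hypothetical and only mirrors the documented behaviour. It is not the package's internal code.

```r
# Hypothetical helper mirroring the documented post-processing of scan() output
collapse_tokens <- function(tokens, collapse = "   ") {
  # All elements NA: return a single space.
  # (Assumption of this sketch: an empty vector also takes this branch.)
  if (all(is.na(tokens))) {
    return(" ")
  }
  # Otherwise replace NA with an empty string, then join into one string
  tokens[is.na(tokens)] <- ""
  paste(tokens, collapse = collapse)
}

joined  <- collapse_tokens(c("a", NA, "b"), collapse = "|")  # "a||b"
all_na  <- collapse_tokens(c(NA_character_, NA_character_))  # " "
default <- collapse_tokens(c("one", "two"))                  # joined by three spaces
```

Because NA becomes an empty string and is not dropped, two separators appear side by side wherever a missing element was.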

Value

A length 1 character vector containing the text.

Examples

# No Chinese is allowed, so try an English file
x <- file.path(find.package("base"), "CITATION")
scancn(x)

[Package chinese.misc version 0.2.3 Index]