cln {sdam} R Documentation

Clean and re-encode a vector, list or dataframe

Description

A function to re-encode Greek (and other) characters and to remove symbols.

Usage

cln(x, level, what, na.rm, case, repl, unlist)

Arguments

x

a vector, list or dataframe

level

optional cleaning level, from 0 for no cleaning, through the default 1, up to 9 for the strictest cleaning (see Details)

what

additional characters to clean (optional)

na.rm

remove entries with NA data? (optional and logical)

case

case for the text: 1 for first letter uppercase, 2 for lowercase, 3 for uppercase (optional)

repl

data frame with text to replace (optional)

unlist

return a vector? (optional and logical, for vector input)

Details

This function is meant to re-encode Greek (and other) characters in the EDH dataset, given as a list, a vector, or a dataframe produced, for example, with function edhw.

By default, the symbols "?", "*" and "+" placed at the end of each record are removed after the re-encoding. When level is 0, only the re-encoding is performed. Level 2 serves to force an extra iteration in the re-encoding, to remove extra spaces, or to remove the characters given in what from the end of a record when what is supplied. With level 9, all content after an opening parenthesis is removed, with all the consequences this has for the input text.
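
The default cleaning and level 9 can be pictured on a single record with base R regular expressions. This is only a sketch of the behaviour described above, not the code used by cln; the helper names strip_trailing and strip_parenthesis are hypothetical, and dropping the parenthesis itself is an assumption.

```r
# Hypothetical helpers in base R; NOT the implementation used by cln().

# Default behaviour described above: drop "?", "*" and "+" at the end of a record.
strip_trailing <- function(x) sub("[?*+]+$", "", x)

# Level 9 as described above: drop everything from the first opening
# parenthesis onwards (assumption: the parenthesis itself is dropped too).
strip_parenthesis <- function(x) trimws(sub("\\(.*$", "", x))

strip_trailing(c("Caesar?*+", "Augustus"))
strip_parenthesis("Imp(erator) Caesar")
```

The second call shows the consequence mentioned above: text that follows the parenthesis, here "Caesar", is lost as well.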

With repl, it is possible to replace pieces of text by supplying a data frame with two columns, one for the ‘text to replace’ and one for the ‘text that replaces’.
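
A two-column replacement table of this kind might look as follows. The column names from and to, the sample abbreviations, and the helper apply_repl are all hypothetical; the loop is a base R sketch of what such a table expresses, not the code used by cln.

```r
# Hypothetical replacement table: the first column holds the text to replace,
# the second column the text that replaces it. Column names are an assumption.
repl <- data.frame(from = c("Caes.", "Aug."),
                   to   = c("Caesar", "Augustus"),
                   stringsAsFactors = FALSE)

# Base R sketch of applying such a table row by row; NOT the code used by cln().
apply_repl <- function(x, repl) {
  for (i in seq_len(nrow(repl))) {
    # fixed = TRUE treats the text literally, so "." is not a regex wildcard
    x <- gsub(repl[i, 1], repl[i, 2], x, fixed = TRUE)
  }
  x
}

apply_repl("Imp. Caes. Aug.", repl)
```

A data frame built this way would then be passed to the function through the repl argument.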

Disabling option unlist returns a vector when x is also a vector; otherwise, it returns a list with the two versions of the input.

Value

Depending on the input, a vector, list or dataframe.

Warning

Encoding the same input more than once requires restarting the console; otherwise, the re-encoding is not complete.

Author(s)

Antonio Rivero Ostoic

See Also

edhw, get.edh, edhwpd, cs

Examples

# re-encode and remove the symbols at the end of the record
cln("Caesar?*+")
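
The level argument can be varied in the same way. The calls below use only the arguments listed under Usage; the sample records are made up for illustration, and the block runs only when the sdam package is installed.

```r
# Made-up sample records for illustration.
records <- c("Caesar?*+", "Caesar (Augustus)?")

# Runs only if the sdam package is installed.
if (requireNamespace("sdam", quietly = TRUE)) {
  # level 0: re-encode only, without removing symbols
  print(sdam::cln(records[1], level = 0))
  # level 9: strictest cleaning, content after an opening parenthesis is removed
  print(sdam::cln(records[2], level = 9))
}
```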

[Package sdam version 1.1.4 Index]