R: Oxford Name Compression Algorithm

onca {phonics}

R Documentation

Oxford Name Compression Algorithm

Description

Encodes names using the Oxford Name Compression Algorithm (ONCA), a phonetic name coding procedure.

Usage

onca(word, maxCodeLen = 4, clean = TRUE, modified = FALSE, refined = FALSE)

Arguments

`word`	string or vector of strings to encode
`maxCodeLen`	maximum length of the resulting encodings, in characters
`clean`	if `TRUE`, return `NA` for inputs containing unknown alphabetical characters
`modified`	if `TRUE`, use the modified `nysiis` function
`refined`	if `TRUE`, use the `refinedSoundex` function
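
The two logical switches can be combined freely. As a minimal sketch, assuming the phonics package is installed and using an arbitrary sample name, the following compares all four combinations. No output is shown, since the codes depend on the nysiis and refinedSoundex functions named above.

```r
library(phonics)

name <- "Stevenson"  # arbitrary sample name, for illustration only

# All four combinations of the two switches
variants <- expand.grid(modified = c(FALSE, TRUE), refined = c(FALSE, TRUE))

# Encode the same name under each combination
variants$code <- mapply(
  function(m, r) onca(name, modified = m, refined = r),
  variants$modified, variants$refined
)

variants
```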

Details

The argument word is the name, or vector of names, to be encoded. The argument maxCodeLen limits the length of the returned name code, in characters. The default is 4.
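
As a brief illustration of these two arguments, the sketch below encodes a vector of names at two different code lengths. It assumes the phonics package is installed; the sample names are arbitrary and no particular output is claimed.

```r
library(phonics)

# Arbitrary sample names, chosen only for illustration
surnames <- c("Stevenson", "Stephenson", "William")

# Default limit: codes of at most 4 characters
short_codes <- onca(surnames)

# Larger limit: codes may be up to 8 characters long
long_codes <- onca(surnames, maxCodeLen = 8)

# One code is returned per input name
data.frame(name = surnames, short = short_codes, long = long_codes)
```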

The onca algorithm is only defined for inputs over the standard English alphabet, i.e., "A-Z". Non-alphabetical characters are removed from the string in a locale-dependent fashion. This strips spaces, hyphens, and numbers. Other letters, such as "Ü", may be permissible in the current locale but are unknown to onca. For inputs outside of its known range, the output is undefined: NA is returned and a warning is thrown. If clean is FALSE, onca attempts to process the strings anyway. The default is TRUE.
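
The effect of clean can be explored with a small sketch like the one below. It assumes the phonics package is installed, and the input strings are made up for illustration. Because stripping is locale-dependent, whether the "Ü" entry actually yields NA may vary between systems, so no output is shown.

```r
library(phonics)

# "\u00dc" is the letter Ü, written as an escape so the source stays ASCII
mixed <- c("Mary-Anne", "O Brien", "\u00dcbel")

# With clean = TRUE (the default), unknown letters may produce NA and a warning;
# withCallingHandlers() records the warning messages instead of printing them
msgs <- character(0)
strict <- withCallingHandlers(
  onca(mixed, clean = TRUE),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# With clean = FALSE, onca attempts to encode every string
lenient <- suppressWarnings(onca(mixed, clean = FALSE))

data.frame(input = mixed, strict = strict, lenient = lenient)
```

Inspecting msgs afterwards shows whether any warning was raised for the given locale.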

Value

The ONCA-encoded character vector.

References

Gill, Leicester. "OX-LINK: the Oxford medical record linkage system." (1997).

James P. Howard, II, "Phonetic Spelling Algorithm Implementations for R," Journal of Statistical Software, vol. 95, no. 8, (2020), pp. 1–21, <doi:10.18637/jss.v095.i08>.

Examples

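# Encode a single name
onca("William")
# Encode a vector of names; one code is returned per element
onca(c("Peter", "Peady"))
# Allow codes of up to 8 characters
onca("Stevenson", maxCodeLen = 8)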

[Package phonics version 1.3.10 Index]