replace_word_elongation {textclean} | R Documentation |
Replace Word Elongations
Description
In informal writing, people may use a form of text embellishment called
elongation (a.k.a. "word lengthening") to emphasize or alter word meanings.
For example, "Whyyyyy" conveys frustration. Elongation may also be used to
sound flirtatious (e.g., "Heyyyy there") or to add
emphasis (e.g., "This is so gooood"). This function uses an augmented form
of Armstrong & Fogarty's (2007) algorithm. The algorithm first attempts to
replace the elongation with known semantic replacements (optional; default is
FALSE). After this, the algorithm locates all places where the same
letter (case insensitive) appears 3 times consecutively. These elements are
then further processed. The matches are replaced via fgsub by first
reducing the elongation to its canonical form (every run of repeated
letters is collapsed to a single letter) and then replacing it with the most
common word in Google's 2008 ngram data set that has that canonical form. If
the canonical form is not found in the Google data set, then the canonical
form is used as the replacement.
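The canonical-form step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper collapse_elongation and the tiny canonical_lookup table are hypothetical stand-ins (the real function works through fgsub and the Google ngram data), and the sketch handles one word at a time rather than matches inside a longer string.

```r
# Hypothetical stand-in for the ngram data: canonical form -> most common word.
canonical_lookup <- c(god = "good", col = "cool")

# Hypothetical helper: replace one elongated word (assumption, not textclean code).
collapse_elongation <- function(word, lookup = canonical_lookup) {
  if (is.na(word)) return(word)
  lower <- tolower(word)
  # Only act when the same letter appears 3 or more times in a row.
  if (!grepl("([[:alpha:]])\\1{2,}", lower, perl = TRUE)) return(word)
  # Canonical form: collapse every run of a repeated letter to one letter.
  canonical <- gsub("([[:alpha:]])\\1+", "\\1", lower, perl = TRUE)
  # Prefer the lookup word; otherwise fall back to the canonical form.
  if (canonical %in% names(lookup)) unname(lookup[canonical]) else canonical
}

collapse_elongation("gooood")   # canonical form "god" maps to "good"
collapse_elongation("whyyyyy")  # no lookup entry, so the canonical "why" is used
collapse_elongation("look")     # no run of 3 letters, returned unchanged
```

The lookup is what lets "gooood" come back as "good" rather than "god": collapsing runs alone loses legitimate double letters, so the canonical form serves only as a key into the word list.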
Usage
replace_word_elongation(x, impart.meaning = FALSE, ...)
Arguments
x |
The text variable. |
impart.meaning |
logical. If TRUE, elongations are first replaced with known semantic replacements where available. |
... |
ignored. |
Value
Returns a vector with word elongations replaced.
References
Armstrong, D. B., Fogarty, G. J., & Dingsdag, D. (2007). Scales measuring
characteristics of small business information systems. Proceedings of the
2011 Conference on Empirical Methods in Natural Language Processing (pp.
562-570). Edinburgh, Scotland. Retrieved from
http://www.aclweb.org/anthology/D11-1052
http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
https://www.theatlantic.com/magazine/archive/2013/03/dragging-it-out/309220
https://english.stackexchange.com/questions/189517/is-there-a-name-term-for-multiplied-vowels
Examples
x <- c('look', 'noooooo!', 'real coooool!', "it's sooo goooood", 'fsdfds',
    'fdddf', 'as', "aaaahahahahaha", "aabbccxccbbaa", 'I said heyyy!',
    "I'm liiiike whyyyyy me?", "Wwwhhatttt!", NA)
replace_word_elongation(x)
replace_word_elongation(x, impart.meaning = TRUE)