readability {sylcount}    R Documentation
readability
Description
Computes some basic "readability" measurements, including Flesch Reading
Ease, Flesch-Kincaid grade level, Automated Readability Index, the
Simple Measure of Gobbledygook, and the Coleman-Liau Index. The function is vectorized by document, and
scores are computed in parallel via OpenMP. You can control the number of
threads used with the nthreads parameter.
The function may struggle with poorly processed or poorly cleaned text. For example, if all punctuation is stripped out, no sentence boundaries can be found and the number of sentences detected will always be zero. We do, however, recommend removing single and double quotes, since contractions can confuse the parser.
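A minimal base-R sketch of the quote-removal step recommended above. `strip_quotes()` is a hypothetical helper, not part of sylcount, and it only removes the ASCII characters `'` and `"`; typographic (curly) quotes would need extra handling. The `readability()` call is guarded so the block also runs where sylcount is not installed.

```r
# Hypothetical helper (not part of sylcount): drop ASCII single and double
# quotes so that a contraction such as "Don't" reaches the parser as "Dont".
strip_quotes <- function(s) {
  gsub("[\"']", "", s)
}

raw <- c("Don't panic.", "She said \"hello\" twice.")
cleaned <- strip_quotes(raw)
print(cleaned)

# Score the cleaned text only if sylcount is available.
if (requireNamespace("sylcount", quietly = TRUE)) {
  print(sylcount::readability(cleaned, nthreads = 1))
}
```

Stripping quotes before scoring keeps the other punctuation intact, so sentence detection is unaffected.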
Usage
readability(s, nthreads = sylcount.nthreads())
Arguments
s | A character vector (vector of strings). |
nthreads | Number of threads to use. By default it will use the total number of cores + hyperthreads. |
Details
The returned counts separate words from non-words. A non-word is any block of text more than 64 characters long with no spaces or sentence-ending punctuation in between. The number of non-words is returned mostly for error-checking and debugging purposes. If you have a lot of non-words, your text probably was not cleaned properly. The word/non-word split is made in an attempt to improve run-time and memory performance.
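The word/non-word rule above can be approximated in a few lines of base R. This is a hypothetical sketch, not sylcount's C implementation: `count_words_nonwords()` is an illustrative name, it handles a single document, and it assumes the sentence-ending punctuation is `.`, `!`, and `?`.

```r
# Hypothetical sketch of the word / non-word split (not sylcount's code).
# Assumes sentence-ending punctuation is '.', '!' and '?'; one document only.
count_words_nonwords <- function(doc, max_len = 64L) {
  # Break the document into blocks at runs of whitespace or . ! ?
  blocks <- strsplit(doc, "[[:space:].!?]+")[[1]]
  # strsplit leaves an empty string when the text starts with a separator
  blocks <- blocks[nzchar(blocks)]
  # A block longer than max_len characters is treated as a non-word
  is_nonword <- nchar(blocks) > max_len
  c(words = sum(!is_nonword), nonwords = sum(is_nonword))
}

doc <- paste("Some normal words.", strrep("x", 70), "More words!")
counts <- count_words_nonwords(doc)
print(counts)
```

A 70-character run with no breaks, like the one above, is exactly the kind of debris (for example a URL or an unsplit table row) that shows up in the nonwords column.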
For implementation details, see the Details section of ?sylcount.
Value
A data frame with the following columns:
chars | the total number of characters |
wordchars | the number of alphanumeric characters |
words | the number of text tokens that are probably English language words |
nonwords | the number of text tokens that are probably not English language words |
sents | the number of sentences recognized in the text |
sylls | the total number of syllables (ignores all non-words) |
polys | the total number of "polysyllables", or words with 3+ syllables |
re | Flesch reading ease score |
gl | Flesch-Kincaid grade level score |
ari | Automated Readability Index score |
smog | Simple Measure of Gobbledygook (SMOG) score |
cl | Coleman-Liau Index score |
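The score columns correspond to the standard published formulas from the references below. The following sketch recomputes them from count columns. It is a hypothetical helper (`readability_scores()`), not sylcount's internal code, and it assumes that `wordchars` is the character count used by ARI and Coleman-Liau and that SMOG uses the usual 30-sentence normalization; sylcount may differ in such details. The counts passed in the usage line are made-up illustration values, not sylcount output.

```r
# Hypothetical helper: standard readability formulas computed from counts.
# Assumption: `wordchars` (letters and digits in words) is the character count
# used by ARI and Coleman-Liau; sylcount's C code may differ in details.
readability_scores <- function(wordchars, words, sents, sylls, polys) {
  wps <- words / sents       # average words per sentence
  spw <- sylls / words       # average syllables per word
  cpw <- wordchars / words   # average characters per word
  data.frame(
    # Flesch reading ease
    re   = 206.835 - 1.015 * wps - 84.6 * spw,
    # Flesch-Kincaid grade level
    gl   = 0.39 * wps + 11.8 * spw - 15.59,
    # Automated Readability Index
    ari  = 4.71 * cpw + 0.5 * wps - 21.43,
    # SMOG, normalized to a 30-sentence sample
    smog = 1.043 * sqrt(polys * 30 / sents) + 3.1291,
    # Coleman-Liau: L = letters per 100 words, S = sentences per 100 words
    cl   = 0.0588 * (100 * cpw) - 0.296 * (100 * sents / words) - 15.8
  )
}

scores <- readability_scores(wordchars = 450, words = 100, sents = 5,
                             sylls = 150, polys = 10)
print(scores)
```

Working from the count columns makes it easy to check how a change in, say, sentence detection propagates into every score.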
References
Kincaid, J. Peter, et al. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. No. RBR-8-75. Naval Technical Training Command Millington TN Research Branch, 1975.
Senter, R. J., and Edgar A. Smith. Automated readability index. University of Cincinnati, OH, 1967.
McLaughlin, G. Harry. "SMOG grading: a new readability formula." Journal of Reading 12.8 (1969): 639-646.
Coleman, Meri, and Ta Lin Liau. "A computer readability formula designed for machine scoring." Journal of Applied Psychology 60.2 (1975): 283.
Examples
library(sylcount)
a <- "I am the very model of a modern major general."
b <- "I have information vegetable, animal, and mineral."
# Each sentence on its own
readability(a, nthreads=1)
readability(b, nthreads=1)
# Both at once, as separate documents
readability(c(a, b), nthreads=1)
# And combined into a single document
readability(paste(a, b), nthreads=1)