count_words {epubr} | R Documentation |
Word count
Description
Count the number of words in a string.
Usage
count_words(x, word_pattern = "[A-Za-z0-9&]", break_pattern = " |\n")
Arguments
x | character, a string containing words to be counted. May be a vector. |
word_pattern | character, regular expression to match words. Elements not matched are not counted. |
break_pattern | character, regular expression to split a string between words. |
Details
This function estimates the number of words in strings. Words are first separated using break_pattern. Then the resulting character vector elements are counted, including only those that are matched by word_pattern.
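The split-then-match procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package source: the helper name count_words_sketch is hypothetical, and the actual epubr implementation may differ in details such as input validation.

```r
# Hypothetical helper illustrating the split-then-match approach;
# this is not the epubr implementation.
count_words_sketch <- function(x, word_pattern = "[A-Za-z0-9&]",
                               break_pattern = " |\n") {
  # Step 1: split each string into candidate elements at every break.
  pieces <- strsplit(x, break_pattern)
  # Step 2: for each string, count only the elements matching word_pattern.
  vapply(pieces, function(p) sum(grepl(word_pattern, p)), integer(1))
}

x <- " This sentence will be counted to have:\n\n10 (ten) words."
count_words_sketch(x)  # 10
```

Empty elements produced by consecutive breaks (the leading space and the double newline here) contain no matching character, so they drop out at the second step without special handling.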
The approach is meant to be simple and flexible. epub uses this function internally to estimate the number of words in each e-book section, alongside nchar for counting individual characters. It can also be called directly on character strings, and different regular expression pattern arguments can be supplied as needed.
The word_pattern and break_pattern arguments are provided for control, but the defaults are likely good enough. By default, strings are split only on spaces and newline characters. The "words" counted in the resulting vector are those that contain at least one alphanumeric character or an ampersand. This means, for example, that hyphenated words, acronyms, and numbers displayed with digits are all counted as words. The presence of any other characters does not negate that a word has been found.
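To make the effect of the default patterns concrete, the following sketch applies them directly with base R strsplit and grepl rather than calling count_words. The sample string is invented for illustration, and the stricter letters-only pattern at the end is an assumed alternative, not a package default.

```r
# Default patterns, copied from the Usage section.
word_pattern <- "[A-Za-z0-9&]"
break_pattern <- " |\n"

# Invented sample text mixing hyphenation, an acronym, numbers and symbols.
x <- "A well-known R&D lab, est. 1999 -- see p. 42 & more ..."
pieces <- strsplit(x, break_pattern)[[1]]

# TRUE marks elements counted as words. "--" and "..." contain no
# alphanumeric character or ampersand, so they are not counted.
counted <- grepl(word_pattern, pieces)
pieces[counted]
sum(counted)  # 11

# An assumed stricter pattern: require at least one letter, which also
# excludes "1999", "42" and the lone "&".
sum(grepl("[A-Za-z]", pieces))  # 8
```

Trailing punctuation such as the comma in "lab," does not prevent a match, because an element only needs to contain one character from the pattern.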
Value
an integer
Examples
x <- " This sentence will be counted to have:\n\n10 (ten) words."
count_words(x)