rmdcount {rmdwc} | R Documentation |
Word, character and non-whitespace characters count
Description
rmdcount
counts lines, words, bytes, characters and non-whitespace characters in R Markdown files excluding code chunks.
txtcount
counts lines, words, bytes, characters and non-whitespace characters in plain text files.
Note that the counts may differ slightly from those of unix wc
and LibreOffice, because
the result depends on how a line, a word and a character are defined.
Usage
rmdcount(
files = NULL,
space = "[[:space:]]",
word = "[[:space:]]+",
line = "\n",
exclude = "```\\{.*?```"
)
txtcount(
files = NULL,
space = "[[:space:]]",
word = "[[:space:]]+",
line = "\n"
)
Arguments
files |
character: file name(s) |
space |
character: pattern to split a text at spaces (default: "[[:space:]]") |
word |
character: pattern to split a text at word boundaries (default: "[[:space:]]+") |
line |
character: pattern to split lines (default: "\n") |
exclude |
character: pattern to exclude text parts, e.g. code chunks (default: "```\\{.*?```") |
Details
We define:
- Line
the number of lines. It differs from unix
wc -l
since wc
counts the number of newlines.
- Word
a character or a sequence of characters delimited by white space. However, a "word" is in general a fuzzy concept; for example, is "3.141593" a word? Different programs may therefore count differently. For more details see the discussion of the LibreOffice bug "Word count gives wrong results - Another Example", Comment 5.
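The difference between counting lines and counting newlines can be illustrated with a minimal base R sketch. The sample string txt is made up for illustration and is not part of the package; it deliberately has no trailing newline.

```r
# Hypothetical sample text: three lines, but only two newline characters
txt <- "first line\nsecond line\npi is 3.141593"

# Lines as defined here: pieces obtained by splitting at "\n"
n_lines <- length(strsplit(txt, "\n")[[1]])

# What `wc -l` would report: the number of newline characters
n_newlines <- sum(charToRaw(txt) == charToRaw("\n"))

# Words as defined here: pieces delimited by white space,
# so "3.141593" counts as a single word
last_line_words <- strsplit("pi is 3.141593", "[[:space:]]+")[[1]]

print(c(lines = n_lines, newlines = n_newlines))
print(last_line_words)
```

For a text that ends with a newline, both counts would usually agree, since strsplit does not return a trailing empty piece.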
The following approach is used to detect lines, words, characters and non-whitespace characters.
- lines
strsplit(rmd, line)[[1]]
with line='\n'
- bytes
charToRaw(rmd)
- words
strsplit(rmd, word)[[1]]
with word='[[:space:]]+'
- characters
strsplit(rmd, '')[[1]]
- non-whitespace characters
strsplit(gsub(space, '', rmd), '')[[1]]
with space='[[:space:]]'
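The counting approach listed above can be tried on an in-memory string with base R alone. This is only a sketch: the string rmd and the vector counts are illustrative names, and the package functions themselves read the text from files.

```r
# Hypothetical ASCII sample text (two spaces between "Hello" and "world")
rmd <- "Hello  world\nsecond line"

# Default patterns as given in the Usage section
line  <- "\n"
word  <- "[[:space:]]+"
space <- "[[:space:]]"

counts <- c(
  lines = length(strsplit(rmd, line)[[1]]),
  words = length(strsplit(rmd, word)[[1]]),
  bytes = length(charToRaw(rmd)),
  chars = length(strsplit(rmd, "")[[1]]),
  nonws = length(strsplit(gsub(space, "", rmd), "")[[1]])
)
print(counts)
```

For pure ASCII text the byte and character counts coincide; they may differ once the text contains multi-byte characters.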
If rmdcount
is used then code chunks are deleted with gsub('```\\{.*?```', '', rmd)
before counting. txtcount does not remove anything.
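The effect of the default exclude pattern can be sketched on a small in-memory document. The string rmd below is a made-up example, and the sketch assumes R's default regular expression engine, as used by gsub without perl=TRUE.

```r
# Hypothetical R Markdown text with one code chunk
rmd <- "Intro text.\n```{r}\nx <- 1\n```\nClosing text."

# Remove everything from an opening chunk fence with "{" up to the next fence
stripped <- gsub("```\\{.*?```", "", rmd)

# Word counts with and without the code chunk
words_all      <- length(strsplit(rmd, "[[:space:]]+")[[1]])
words_stripped <- length(strsplit(stripped, "[[:space:]]+")[[1]])

print(c(with_chunks = words_all, without_chunks = words_stripped))
```

The non-greedy `.*?` stops at the first closing fence, so text between two separate chunks should be kept.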
Value
A data frame with the following elements
- file
basename of file
- lines
number of lines
- words
number of words
- bytes
number of bytes
- chars
number of characters
- nonws
number of non-whitespace characters
- path
path of file
Examples
# count excluding code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
rmdcount(files)
# count including code chunks
txtcount(files) # or rmdcount(files, exclude='')
# count for a set of R Markdown docs
files <- list.files(path=system.file('rmarkdown', package="rmdwc"),
pattern="\\.Rmd$", full.names=TRUE) # pattern is a regular expression, not a glob
rmdcount(files)
# use of rmdcount() in an R Markdown document
if (interactive()) {
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
file.edit(files) # SAVE(!) the file and knit it
}
# count including code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
txtcount(files)