doc_to_txt {rock} | R Documentation |
Convert a document (.docx, .pdf, .odt, .rtf, or .html) to a plain text file
Description
This used to be a thin wrapper around textreadr::read_document() that also
writes the result to output, doing its best to correctly write UTF-8
(based on the approach recommended in this blog post). However, textreadr
was archived from CRAN, so this function now directly wraps the functions
that textreadr wrapped: pdftools::pdf_text() for .pdf files and
striprtf::read_rtf() for .rtf files. It uses xml2 to import .docx and .odt
files and rvest to import .html files, using the code from the textreadr
package.
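The mapping from file type to backend described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: pick_backend() is a hypothetical helper that is not part of rock, it returns the name of the backend rather than calling it, and rock's actual dispatch logic may differ.

```r
# Hypothetical helper (not part of rock): return the name of the backend
# that the description associates with a file, based on its extension.
pick_backend <- function(path) {
  # tools::file_ext() returns the extension without the leading dot
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    pdf  = "pdftools::pdf_text",
    rtf  = "striprtf::read_rtf",
    docx = ,                      # falls through to the next value
    odt  = "xml2",
    html = "rvest",
    stop("Unsupported extension: '", ext, "'")
  )
}

# Look up the backend for a few made-up file names
vapply(
  c("interview.pdf", "notes.RTF", "memo.docx", "page.html"),
  pick_backend,
  character(1)
)
```

Lower-casing the extension first means that files such as notes.RTF are treated the same as notes.rtf; whether rock itself is case-insensitive here is an assumption of this sketch.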
Usage
doc_to_txt(
input,
output = NULL,
encoding = rock::opts$get("encoding"),
newExt = NULL,
preventOverwriting = rock::opts$get("preventOverwriting"),
silent = rock::opts$get("silent")
)
Arguments
input |
The path to the input file. |
output |
The path and filename to write to. If this is a path to
an existing directory (without a filename specified), the |
encoding |
The encoding to use when writing the text file. |
newExt |
The extension to append: only used if |
preventOverwriting |
Whether to prevent overwriting existing files. |
silent |
Whether to be silent or chatty. |
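To illustrate what writing a converted document as UTF-8 can involve, here is a minimal base R sketch. It is not rock's implementation: write_utf8() is a hypothetical helper, and the technique shown (converting with enc2utf8() and writing the bytes unchanged through a native-encoding connection) is one commonly recommended approach that may or may not match what doc_to_txt() does internally.

```r
# Hypothetical helper (not part of rock): write a character vector to a
# file as UTF-8, regardless of the session's native encoding.
write_utf8 <- function(text, path) {
  # Open the connection without any re-encoding on output
  con <- file(path, open = "w", encoding = "native.enc")
  on.exit(close(con))
  # Convert to UTF-8 first, then write the bytes as they are
  writeLines(enc2utf8(text), con = con, useBytes = TRUE)
  invisible(path)
}

# Write two lines to a temporary file and read them back as UTF-8
tmp <- tempfile(fileext = ".txt")
write_utf8(c("na\u00efve caf\u00e9", "second line"), tmp)
roundTrip <- readLines(tmp, encoding = "UTF-8")
```

Passing useBytes = TRUE keeps writeLines() from translating the already converted strings back to the native encoding, which matters mostly on systems where the native encoding is not UTF-8.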
Value
The converted source, as a character vector.
Examples
### This example requires the {xml2} package
if (requireNamespace("xml2", quietly = TRUE)) {
  print(
    rock::doc_to_txt(
      input = system.file(
        "extdata/doc-to-test.docx", package="rock"
      )
    )
  );
}