readDOC {tm} | R Documentation |
Read In a MS Word Document
Description
Return a function which reads in a Microsoft Word document, extracting its text.
Usage
readDOC(engine = c("antiword", "executable"), AntiwordOptions = "")
Arguments
engine
a character string for the preferred DOC extraction engine (see Details).
AntiwordOptions
Options passed over to the antiword executable.
Details
Formally this function is a function generator, i.e., it returns a
function (which reads in a text document) with a well-defined
signature, but can access passed over arguments (e.g., options to
antiword) via lexical scoping.
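The generator pattern described above can be sketched in base R. This is a minimal illustration only, not the code used by tm: the name make_reader and the fields of the returned list are hypothetical, and the option string is an arbitrary placeholder.

```r
# Hypothetical generator (not part of tm): it captures `options` in a closure.
make_reader <- function(options = "") {
  # The returned function has a fixed signature, as readers are expected to.
  function(elem, language, id) {
    # `options` is not a formal of this function; R finds it in the
    # enclosing environment created by the call to make_reader().
    list(uri = elem$uri, language = language, options = options)
  }
}

reader <- make_reader(options = "-w 0")
result <- reader(elem = list(uri = "example.doc"), language = "en", id = "doc1")
```

The caller of reader never supplies options, yet the value given to the generator remains available inside the returned function.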
Available DOC extraction engines are as follows.
"antiword"
(default) Antiword utility as provided by the function antiword in package antiword.
"executable"
command line antiword executable which must be installed and accessible on your system. This can convert documents from Microsoft Word version 2, 6, 7, 97, 2000, 2002 and 2003 to plain text. The character vector AntiwordOptions is passed over to the executable.
Value
A function with the following formals:
elem
a list with the named component uri which must hold a valid file name.
language
a string giving the language.
id
Not used.
The function returns a PlainTextDocument representing the text
and metadata extracted from elem$uri.
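As a rough sketch of how the returned function might be called directly, the helper below builds a reader and applies it to one file. The helper name read_doc_text and the id value are hypothetical, and the sketch assumes that the packages tm and antiword are installed; it returns NULL when the file or either package is missing.

```r
# Hypothetical helper: extract the text of one Word document as a
# character vector, or NULL if the prerequisites are not met.
read_doc_text <- function(path, language = "en") {
  if (!file.exists(path) ||
      !requireNamespace("tm", quietly = TRUE) ||
      !requireNamespace("antiword", quietly = TRUE)) {
    return(NULL)
  }
  # Build the reader, then call it with the formals listed above.
  reader <- tm::readDOC(engine = "antiword")
  doc <- reader(elem = list(uri = path), language = language, id = "doc1")
  # Convert the PlainTextDocument to a plain character vector.
  as.character(doc)
}

missing_result <- read_doc_text(file.path(tempdir(), "no-such-file.doc"))
```

In everyday use the reader is more commonly handed to a corpus constructor than called by hand; the direct call is shown only to make the formals concrete.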
See Also
Reader for basic information on the reader infrastructure
employed by package tm.