rtika {rtika} | R Documentation |
rtika: R Interface to 'Apache Tika'
Description
Extract text or metadata from over a thousand file types. Get either plain text or structured XHTML content.
Installing
If you have not done so already, finish installing rtika by typing in the R console:
install_tika()
Getting Started
The tika_text function extracts plain text from many types of documents. It is a good place to start. Please also read the vignette.
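As a minimal sketch of a first call, assuming install_tika() has already been run and Java is available, the lines below write a small sample file and extract its text. The sample file and its contents are made up for illustration.

```r
library(rtika)

# Create a small HTML file to use as input (hypothetical sample content).
input <- tempfile(fileext = ".html")
writeLines("<html><body><p>Hello from rtika.</p></body></html>", input)

# tika_text() takes a character vector of file paths or URLs and
# returns a character vector of the same length.
text <- tika_text(input)

# Print the extracted plain text.
cat(text)
```

An element of the result may be NA if a document could not be processed, so checking with is.na() before further use is a reasonable precaution.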
Other main functions include tika_xml and tika_html, which return a structured XHTML rendition. The tika_json function gets metadata as '.json', with XHTML content.
The tika_json_text function gets metadata as '.json', with plain text content.
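The JSON output can be parsed in R. The sketch below assumes the jsonlite package is installed (it is not part of rtika's interface) and that the document content is stored under the "X-TIKA:content" key used by Apache Tika's recursive JSON output; treat both as assumptions rather than guarantees of this package.

```r
library(rtika)

# Create a small HTML file to use as input (hypothetical sample content).
input <- tempfile(fileext = ".html")
writeLines("<html><head><title>Sample</title></head><body><p>Body text.</p></body></html>", input)

# Get metadata as a JSON string, with plain text content.
json <- tika_json_text(input)

# Parse only if extraction succeeded; failed documents come back as NA.
if (!is.na(json[1])) {
  # jsonlite is an assumption here, used only to parse the string.
  metadata <- jsonlite::fromJSON(json[1])

  # List the metadata fields Tika reported for this document.
  print(names(metadata))

  # Assumed key for the extracted content in Tika's JSON output.
  cat(metadata[["X-TIKA:content"]])
}
```

Using tika_json in place of tika_json_text follows the same pattern, with XHTML rather than plain text in the content field.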
tika is the main function the others above inherit from.
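For illustration, the output format can also be chosen by calling tika directly. This sketch assumes the output argument accepts the values "text" and "xml"; see the tika help page for the authoritative list of arguments.

```r
library(rtika)

# Create a small HTML file to use as input (hypothetical sample content).
input <- tempfile(fileext = ".html")
writeLines("<html><body><h1>Title</h1><p>Body text.</p></body></html>", input)

# Request plain text, as tika_text() would (assumed output value).
plain <- tika(input, output = "text")

# Request a structured XHTML rendition, as tika_xml() would (assumed output value).
xhtml <- tika(input, output = "xml")

# Both results are character vectors with one element per input.
length(plain)
length(xhtml)
```

The convenience functions are usually easier to read in scripts, while calling tika directly makes it simple to switch formats through a single variable.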
Use tika_fetch to download files with a file extension matching the Content-Type.
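A short sketch of downloading and then parsing a remote document follows. The URL is only an example (a manual hosted on CRAN) and requires network access; any reachable document URL could be substituted.

```r
library(rtika)

# Example URL, chosen for illustration only.
url <- "https://cran.r-project.org/doc/manuals/r-release/R-data.pdf"

# Download to a temporary directory; the returned value is a character
# vector of local file paths, one per URL.
paths <- tika_fetch(url)

# The local file gets an extension based on the server's Content-Type.
print(basename(paths))

# Pass the downloaded file on for text extraction.
text <- tika_text(paths)

# Show the beginning of the extracted text.
cat(substr(text, 1, 200))
```

Downloading first keeps a local copy of each file, which helps when the same documents are parsed more than once.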
[Package rtika version 2.7.0 Index]