PDE_analyzer {PDE} | R Documentation |
Extracting data from PDF (Portable Document Format) files
Description
PDE_analyzer extracts sentences and tables from multiple
PDF files.
Usage
PDE_analyzer(PDE_parameters_file_path = NA, verbose = TRUE)
Arguments
PDE_parameters_file_path | String. Path to the file that includes all parameters for the run. Default: NA (the user is then asked to choose a file) |
verbose | Logical. Indicates whether messages will be printed in the console. Default: TRUE |
Value
If tables were extracted from the PDF file, the function returns a list of
the following items: 1) htmltablelines, 2) txttablelines,
3) keeplayouttxttablelines, 4) id, 5) out_msg.
The tablelines items are tables that provide the heading and position of
the detected tables. The id provides the name of the PDF file. The
out_msg includes all messages printed to the console, or the suppressed
messages if verbose=FALSE.
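As a rough illustration of working with the returned list, the sketch below defines a hypothetical helper, summarize_pde_result (not part of PDE), that checks for the five documented item names and counts rows and messages. The mock list is invented purely for demonstration; the column names heading and position are assumptions, since the exact structure of the tablelines items is not specified here.

```r
# Hypothetical helper (not part of the PDE package): summarises a result
# list that carries the five item names documented under "Value".
summarize_pde_result <- function(res) {
  expected <- c("htmltablelines", "txttablelines",
                "keeplayouttxttablelines", "id", "out_msg")
  absent <- setdiff(expected, names(res))
  if (length(absent) > 0) {
    warning("Missing items: ", paste(absent, collapse = ", "))
  }
  tableline_items <- intersect(expected[1:3], names(res))
  # NROW() works for data frames, matrices and plain vectors alike
  counts <- vapply(res[tableline_items], NROW, integer(1))
  list(id = res$id, n_rows = counts, n_messages = length(res$out_msg))
}

# Mock result with made-up contents; a real call to PDE_analyzer() may
# return differently structured items.
mock <- list(
  htmltablelines = data.frame(heading = "Table 1", position = 12),
  txttablelines = data.frame(heading = character(0), position = numeric(0)),
  keeplayouttxttablelines = data.frame(heading = "Table 1", position = 15),
  id = "example_paper",
  out_msg = c("message one", "message two")
)
summary_info <- summarize_pde_result(mock)
str(summary_info)
```

With a real result, the same helper could be applied to the value returned by PDE_analyzer(), provided tables were extracted.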
Details
The parameter file (also referred to as the .tsv file) can be
filled in either manually or with the help of the PDE_analyzer_i
interface.
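Before editing a parameter file by hand, it can help to look at it as plain tab-separated data. The following is a minimal sketch using a hypothetical helper, read_pde_parameters (not part of PDE). It assumes nothing about a header row or column names, because the layout of the file is not described on this page; check README_PDE.md for the actual parameter definitions.

```r
# Hypothetical helper (not part of the PDE package): reads a tab-separated
# parameter file as raw columns, without assuming a header row.
read_pde_parameters <- function(path) {
  stopifnot(is.character(path), length(path) == 1, file.exists(path))
  utils::read.delim(path, header = FALSE, stringsAsFactors = FALSE,
                    quote = "", comment.char = "", fill = TRUE)
}

# Example file shipped with PDE; system.file() returns "" if the package
# or the file is not available, in which case nothing is read.
tsv <- system.file("examples/tsvs/PDE_parameters_v1.4_all_files+-0.tsv",
                   package = "PDE")
if (nzchar(tsv)) {
  print(head(read_pde_parameters(tsv)))
}
```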
Note
A detailed description of the parameters in the TSV file can be
found in the markdown file (README_PDE.md) and in the description of
PDE_extr_data_from_pdfs.
See Also
Examples
## run the analysis with an example parameter file shipped with the package
if (PDE_check_Xpdf_install() == TRUE) {
  PDE_analyzer(paste0(system.file(package = "PDE"),
                      "/examples/tsvs/PDE_parameters_v1.4_all_files+-0.tsv"))
}
## Not run:
## requires user file choice:
PDE_analyzer()
## End(Not run)