PDE_pdfs2table {PDE} | R Documentation |
Extracting all tables from a PDF (Portable Document Format) file
Description
PDE_pdfs2table extracts all tables from a single PDF file and writes the output to the corresponding folder.
Usage
PDE_pdfs2table(
pdfs,
out = ".",
table.heading.words = "",
ignore.case.th = FALSE,
out.table.format = ".csv (WINDOWS-1252)",
dev_x = 20,
dev_y = 9999,
write.table.locations = FALSE,
exp.nondetc.tabs = TRUE,
delete = TRUE,
verbose = TRUE
)
Arguments
pdfs |
String. A list of paths to the PDF files to be analyzed. |
out |
String. Directory in which the tables are saved. Default: "." |
table.heading.words |
List of strings. Table headings to be detected in addition to the
standard ones (TABLE, TAB or table plus number). Regex rules apply (see
also
https://github.com/erikstricker/PDE/blob/master/inst/examples/cheatsheets/regex.pdf).
Default: "" |
ignore.case.th |
Logical. Whether letter case is ignored when matching the additional table
headings (see table.heading.words). Default: FALSE |
out.table.format |
String. Output file format, for example a comma separated
file. Default: ".csv (WINDOWS-1252)" |
dev_x |
Numeric. For a table, the size of the indentation within which text is
still considered to be in the same column. Default: 20 |
dev_y |
Numeric. For a table, the vertical distance within which text is still
considered to be in the same row. Can be either a number or set to dynamic
detection [9999], in which case the font size is used to detect which words
are in the same row.
Default: 9999 |
write.table.locations |
Logical. Default: FALSE |
exp.nondetc.tabs |
Logical. Default: TRUE |
delete |
Logical. Default: TRUE |
verbose |
Logical. Indicates whether messages will be printed to the console. Default: TRUE |
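How table.heading.words and ignore.case.th interact can be illustrated with base R regular expressions. The sketch below is not the internal code of PDE; it only assumes that additional headings are applied as regex patterns to lines of text. The helper match_heading, the sample lines and the patterns are hypothetical.

```r
# Hypothetical helper, not part of PDE: reports which lines start with any
# of the given heading patterns (regular expressions).
match_heading <- function(lines, heading_words, ignore_case = FALSE) {
  # Combine all patterns into one alternation anchored at the line start
  pattern <- paste0("^(", paste(heading_words, collapse = "|"), ")")
  grepl(pattern, lines, ignore.case = ignore_case)
}

# Made-up lines of text and made-up additional heading patterns
sample_lines <- c("Supplementary Table 2. Dosing",
                  "supplementary table 3",
                  "Figure 1")
heading_words <- c("Supplementary Table [0-9]+", "Suppl\\. Tab\\.")

case_sensitive <- match_heading(sample_lines, heading_words,
                                ignore_case = FALSE)
case_insensitive <- match_heading(sample_lines, heading_words,
                                  ignore_case = TRUE)
print(case_sensitive)
print(case_insensitive)
```

With case-sensitive matching only the first line is recognized as a heading; ignoring case also picks up the lowercase variant in the second line.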
See Also
PDE_extr_data_from_pdfs, PDE_pdfs2table_searchandfilter
Examples
## Running a simple table extraction
if(PDE_check_Xpdf_install() == TRUE){
  outputtables <- PDE_pdfs2table(
    pdfs = paste0(system.file(package = "PDE"),
                  "/examples/Methotrexate/29973177_!.pdf"),
    out = paste0(system.file(package = "PDE"), "/examples/29973177_tables/"))
}
## Running the same table extraction as above with the parameters written out
## (note that exp.nondetc.tabs is set to FALSE here, unlike its default TRUE)
if(PDE_check_Xpdf_install() == TRUE){
  outputtables <- PDE_pdfs2table(
    pdfs = paste0(system.file(package = "PDE"),
                  "/examples/Methotrexate/29973177_!.pdf"),
    out = paste0(system.file(package = "PDE"), "/examples/29973177_tables/"),
    dev_x = 20,
    dev_y = 9999,
    table.heading.words = "",
    ignore.case.th = FALSE,
    out.table.format = ".csv (WINDOWS-1252)",
    write.table.locations = FALSE,
    exp.nondetc.tabs = FALSE,
    delete = TRUE)
}
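## Reading an extracted table back into R
With the default out.table.format of ".csv (WINDOWS-1252)", it is probably safest to declare the encoding when reading a table file back into R. The following is a minimal sketch using only base R. It assumes the label in out.table.format names the character set of the written file; the file name and contents are made up and merely stand in for a file produced by PDE_pdfs2table.

```r
# Stand-in for a table file written by PDE_pdfs2table; the file name and
# contents are hypothetical and only serve to make the sketch runnable.
csv_path <- file.path(tempdir(), "example_table.csv")
con <- file(csv_path, open = "w", encoding = "WINDOWS-1252")
writeLines(c("drug,dose", "Methotrexate,10 \u00b5g"), con)
close(con)

# Declare the encoding when reading so that non-ASCII characters
# (here the micro sign) are converted to the session's native encoding.
tab <- read.csv(csv_path, fileEncoding = "WINDOWS-1252",
                stringsAsFactors = FALSE)
print(tab)
```

For real output, csv_path would instead point to one of the files in the folder given as out.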