read.pdf {pdfminer}    R Documentation

Read a PDF document.

Description

Extracts the contents of a PDF document.

Usage

read.pdf(
  file,
  pages = integer(),
  method = c("csv", "sqlite", "PythonInR"),
  laycntrl = layout_control(),
  encoding = "utf8",
  password = "",
  caching = TRUE,
  maxpages = Inf,
  rotation = 0L,
  image_dir = "",
  pyexe = "python3"
)

Arguments

file

a character string giving the name of the PDF file from which the data are to be read.

pages

an integer vector giving the pages which should be extracted (default is integer()).

method

a character string giving the data transfer method. Allowed values are "csv" (default), "sqlite" and "PythonInR" (recommended).

laycntrl

a list of layout options, created by the function layout_control.

encoding

a character string giving the encoding of the output (default is "utf8").

password

a character string giving the password necessary to access the PDF (default is "").

caching

a logical. If TRUE (default), pdfminer is faster but uses more memory.

maxpages

a number giving the maximum number of pages to be extracted (default is Inf).

rotation

an integer giving the rotation of the page; allowed values are 0, 90, 180 and 270 (default is 0L).

image_dir

a character string giving the path to the folder where the images should be stored (default is "").

pyexe

a character string giving the path to the Python executable (default is "python3"). Only used when method is "csv" or "sqlite".
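
The constraints stated in the argument descriptions can be checked before read.pdf is called. The following is a minimal sketch in base R, not part of the pdfminer package: check_read_pdf_args is a hypothetical helper, and it assumes only what is documented above (the allowed values of method and rotation, and that pyexe matters only for the "csv" and "sqlite" methods).

```r
# Hypothetical helper (not provided by pdfminer): validates a few arguments
# against the constraints listed in the Arguments section.
check_read_pdf_args <- function(method = "csv", rotation = 0L,
                                pyexe = "python3") {
  # match.arg() raises an error for values outside the allowed set.
  method <- match.arg(method, c("csv", "sqlite", "PythonInR"))

  # rotation must be a single value out of 0, 90, 180 or 270.
  stopifnot(length(rotation) == 1L, rotation %in% c(0, 90, 180, 270))

  # pyexe is only used by the "csv" and "sqlite" methods, so the executable
  # is looked up only for those. Sys.which() returns "" when nothing is found.
  if (method %in% c("csv", "sqlite") && !nzchar(Sys.which(pyexe))) {
    stop("Python executable not found: ", pyexe)
  }

  invisible(list(method = method,
                 rotation = as.integer(rotation),
                 pyexe = pyexe))
}
```

A call such as check_read_pdf_args(method = "sqlite", rotation = 90) would then fail early if "python3" cannot be found, rather than at the point where read.pdf tries to start Python.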

Value

Returns an object of class "pdf_document".

Examples

if (is_pdfminer_installed()) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfminer")
  read.pdf(pdf_file)
}
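
Other arguments can be passed in the same way. The sketch below is illustrative only: it assumes that the pdfminer package and its Python dependency are installed, that is_pdfminer_installed() is exported by the package, and that the bundled cars.pdf has at least one page. The argument values are chosen for demonstration.

```r
# The guard keeps the example from failing when pdfminer is not available.
if (requireNamespace("pdfminer", quietly = TRUE) &&
    pdfminer::is_pdfminer_installed()) {
  pdf_file <- system.file("pdfs/cars.pdf", package = "pdfminer")

  # Read only the first page, using the default "csv" transfer method
  # (spelled out here for clarity).
  doc <- pdfminer::read.pdf(pdf_file, pages = 1L, method = "csv")

  # According to the Value section, the result has class "pdf_document".
  print(class(doc))
}
```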

[Package pdfminer version 1.0 Index]