R: OCR text extraction

pdf_ocr_text {pdftools}

R Documentation

OCR text extraction

Description

Perform OCR text extraction on a PDF file. This requires the tesseract package to be installed.

Usage

pdf_ocr_text(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)

pdf_ocr_data(
  pdf,
  pages = NULL,
  opw = "",
  upw = "",
  dpi = 600,
  language = "eng",
  options = NULL
)

Arguments

`pdf`	file path or raw vector with pdf data
`pages`	which pages of the pdf file to extract
`opw`	string with owner password to open pdf
`upw`	string with user password to open pdf
`dpi`	resolution at which to render the page image that is passed to pdf_convert
`language`	passed to tesseract to specify the language of the engine
`options`	passed to tesseract to specify OCR parameters
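
Examples

The following is a minimal sketch of how the two functions might be called, not an example taken from the package itself. The file name `scanned.pdf` is a hypothetical placeholder, and the calls only run when both pdftools and tesseract are installed and the file exists. `pdf_ocr_text` is expected to return a character vector with one element per page, and `pdf_ocr_data` a list with one data frame of recognized words per page.

```r
# Hypothetical input path (an assumption); replace with a real scanned PDF.
pdf_file <- "scanned.pdf"

# Results stay NULL when the packages or the file are unavailable.
page_text <- NULL
page_words <- NULL

have_pkgs <- requireNamespace("pdftools", quietly = TRUE) &&
  requireNamespace("tesseract", quietly = TRUE)

if (have_pkgs && file.exists(pdf_file)) {
  # OCR only the first page, rendered at 300 dpi instead of the 600 dpi default.
  page_text <- pdftools::pdf_ocr_text(pdf_file, pages = 1, dpi = 300,
                                      language = "eng")

  # Same page, but keeping the per-word output rather than plain text.
  page_words <- pdftools::pdf_ocr_data(pdf_file, pages = 1, dpi = 300,
                                       language = "eng")

  cat(page_text, sep = "\n")
}
```

Lowering `dpi` usually makes rendering and recognition faster at some cost in accuracy, so the default of 600 is a reasonable starting point for small or low-quality scans.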
