R: Function to locate sections of pdf

heading_search {pdfsearch}

R Documentation

Extracts the location of text in a pdf and separates it by section. The function returns the headings along with their location in the pdf.

heading_search(x, headings, path = FALSE, pdf_toc = FALSE,
  full_line = FALSE, ignore_case = FALSE, split_pdf = FALSE,
  convert_sentence = FALSE)

`x`	Either the text of the pdf read in with the pdftools package or a path for the location of the pdf file.
`headings`	A character vector representing the headings to search for. Can be NULL if pdf_toc = TRUE.
`path`	TRUE/FALSE indicating whether `x` is a path to the location of the pdf to be converted to text. The pdftools package is used for this conversion.
`pdf_toc`	TRUE/FALSE indicating whether the pdf_toc function from the `pdftools` package should be used. This is most useful if the pdf has the table of contents embedded within the pdf. Must specify path = TRUE if pdf_toc = TRUE.
`full_line`	TRUE/FALSE indicating whether the headings should reside on their own line. This can create problems with multi-column pdfs.
`ignore_case`	TRUE/FALSE/vector of TRUE/FALSE, indicating whether the case of the keyword matters. Default is FALSE, meaning that the case of the heading keywords is matched literally. If a vector, it must be the same length as the headings vector.
`split_pdf`	TRUE/FALSE indicating whether to split the pdf using white space. This would be most useful with multicolumn pdf files. The split_pdf function attempts to recreate the column layout of the text into a single column starting with the left column and proceeding to the right.
`convert_sentence`	TRUE/FALSE indicating whether individual lines of the pdf file should be collapsed into a single large paragraph to perform keyword searching. Default is FALSE.
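
The interaction between `full_line` and a vector-valued `ignore_case` can be illustrated with a minimal base R sketch. The helper `match_headings` below is hypothetical and is not part of pdfsearch; it assumes that each heading is matched against lines of text as a regular expression, which may differ from what the package does internally.

```r
# Hypothetical helper (not part of pdfsearch): report which lines match each heading.
match_headings <- function(lines, headings, full_line = FALSE,
                           ignore_case = FALSE) {
  # Recycle a single TRUE/FALSE so every heading has its own case setting.
  if (length(ignore_case) == 1) {
    ignore_case <- rep(ignore_case, length(headings))
  }
  stopifnot(length(ignore_case) == length(headings))

  hits <- lapply(seq_along(headings), function(i) {
    # With full_line = TRUE the heading must be the only text on the line.
    pattern <- if (full_line) {
      paste0("^\\s*", headings[i], "\\s*$")
    } else {
      headings[i]
    }
    line_num <- grep(pattern, lines, ignore.case = ignore_case[i])
    data.frame(heading = rep(headings[i], length(line_num)),
               line_num = line_num,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Made-up lines standing in for text read from a pdf.
lines <- c("Abstract",
           "We study the introduction of new metrics.",
           "1 Introduction",
           "INTRODUCTION")

# 'abstract' ignores case, 'Introduction' is matched literally.
match_headings(lines, c("abstract", "Introduction"),
               ignore_case = c(TRUE, FALSE))

# Requiring the heading to sit alone on its line drops "1 Introduction".
match_headings(lines, c("abstract", "Introduction"),
               full_line = TRUE, ignore_case = TRUE)
```

In this sketch a numbered heading such as "1 Introduction" is missed once `full_line = TRUE`, which is one reason a strict full-line match can be too restrictive for some documents.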

file <- system.file('pdf', '1501.00450.pdf', package = 'pdfsearch')

heading_search(file, headings = c('abstract', 'introduction'),
  path = TRUE)
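
As a further illustration, the call below passes a vector to `ignore_case` so that each heading has its own case setting. It is a sketch that assumes the pdfsearch package and its bundled example pdf are installed, and it is skipped otherwise; the object name `sections` is chosen here for illustration only.

```r
# Runs only when pdfsearch is available; otherwise the block is skipped.
if (requireNamespace("pdfsearch", quietly = TRUE)) {
  library(pdfsearch)

  # Example pdf shipped with the package.
  file <- system.file("pdf", "1501.00450.pdf", package = "pdfsearch")

  # One ignore_case entry per heading, in the same order as `headings`.
  sections <- heading_search(file,
                             headings = c("abstract", "introduction"),
                             path = TRUE,
                             ignore_case = c(TRUE, TRUE))
  print(sections)
}
```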

[Package pdfsearch version 0.3.0 Index]