get.text {JATSdecoder}    R Documentation

get.text

Description

Extracts the main textual content from a NISO-JATS coded XML file or text and returns it as sectioned text.

Usage

get.text(
  x,
  sectionsplit = "",
  grepsection = "",
  letter.convert = TRUE,
  greek2text = FALSE,
  sentences = FALSE,
  paragraph = FALSE,
  cermine = "auto",
  rm.table = TRUE,
  rm.formula = TRUE,
  rm.xref = TRUE,
  rm.media = TRUE,
  rm.graphic = TRUE,
  rm.ext_link = TRUE
)

Arguments

x

a NISO-JATS coded XML file or text.

sectionsplit

Search patterns used to split the text into sections (forced to lower case), e.g. c("intro", "method", "result", "discus").

grepsection

Search pattern to restrict the returned text to sections whose names match.

letter.convert

Logical. If TRUE converts hexadecimal and HTML coded characters to Unicode.

greek2text

Logical. If TRUE, some Greek letters and special characters are converted to a unified textual representation (important for extracting statistics).

sentences

Logical. If TRUE, text is returned as a sectioned list of sentences.

paragraph

Logical. If TRUE, "<New paragraph>" is added at the end of each paragraph to enable manual splitting at paragraphs.

cermine

Logical or "auto". If TRUE, CERMINE-specific error handling and letter conversion are applied. If set to "auto", a file name matching the pattern 'cermxml$' (i.e. ending in "cermxml") sets cermine=TRUE.

rm.table

Logical. If TRUE removes <table> tag from text.

rm.formula

Logical. If TRUE removes <formula> tags.

rm.xref

Logical. If TRUE removes <xref> tags (citations) from text.

rm.media

Logical. If TRUE removes <media> tag from text.

rm.graphic

Logical. If TRUE removes <graphic> and <fig> tags from text.

rm.ext_link

Logical. If TRUE removes <ext-link> tag from text.
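
Two of the argument descriptions above can be illustrated in base R without loading the package. The sketch below is not JATSdecoder's internal code: txt is a made-up string standing in for one text element as it might be returned with paragraph=TRUE, and files holds hypothetical file names used only to show which ones the pattern 'cermxml$' would match.

```r
# Hypothetical stand-in for one section's text returned with paragraph = TRUE
txt <- "First paragraph. <New paragraph> Second paragraph. <New paragraph>"

# Split at the literal marker, trim surrounding whitespace, drop empty pieces
paragraphs <- trimws(strsplit(txt, "<New paragraph>", fixed = TRUE)[[1]])
paragraphs <- paragraphs[nzchar(paragraphs)]

# Hypothetical file names; 'cermxml$' matches names that end in "cermxml"
files <- c("paper.cermxml", "paper.xml", "paper.cermxml.bak")
is_cermine <- grepl("cermxml$", files)
```

Using fixed = TRUE treats the marker as a literal string, so the angle brackets and the space are not interpreted as regular-expression syntax.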

Value

List with two elements. 1: Character vector with the section title(s). 2: Character vector with the running text of each section, or, if sentences=TRUE, a list with one vector of sentences per section.
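
A minimal usage sketch, assuming the JATSdecoder package is installed. The XML below is a tiny hand-written document for illustration only, not a real article, and the result is accessed by position because this page describes the two list elements by order rather than by name. No output is shown, since it depends on the package version.

```r
# Tiny hand-written JATS-like document (illustrative only)
jats <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  "<article>",
  "<front><article-meta><title-group>",
  "<article-title>Demo article</article-title>",
  "</title-group></article-meta></front>",
  "<body>",
  "<sec><title>Introduction</title>",
  "<p>This is the first paragraph. It has two sentences.</p></sec>",
  "<sec><title>Methods</title>",
  "<p>We sampled 20 participants.</p></sec>",
  "</body>",
  "</article>"
)

# Write the document to a temporary file so it can be passed as a file path
xml_file <- tempfile(fileext = ".xml")
writeLines(jats, xml_file)

# Only call the package if it is available
if (requireNamespace("JATSdecoder", quietly = TRUE)) {
  # Default call: running text per section
  res <- JATSdecoder::get.text(xml_file)
  str(res[[1]])  # element 1: section title(s)
  str(res[[2]])  # element 2: text of the sections

  # Same file, restricted to sections matching "method", split into sentences
  res_sent <- JATSdecoder::get.text(xml_file, grepsection = "method",
                                    sentences = TRUE)
  str(res_sent[[2]])
}
```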

See Also

JATSdecoder for simultaneous extraction of meta-tags, abstract, sectioned text and reference list.


[Package JATSdecoder version 1.2.0]