get.text {JATSdecoder} | R Documentation
get.text
Description
Extracts the main textual content from a NISO-JATS coded XML file or text and returns it as sectioned text.
Usage
get.text(
x,
sectionsplit = "",
grepsection = "",
letter.convert = TRUE,
greek2text = FALSE,
sentences = FALSE,
paragraph = FALSE,
cermine = "auto",
rm.table = TRUE,
rm.formula = TRUE,
rm.xref = TRUE,
rm.media = TRUE,
rm.graphic = TRUE,
rm.ext_link = TRUE
)
Arguments
x | A NISO-JATS coded XML file or text.
sectionsplit | Search patterns used to split the text into sections (forced to lower case), e.g. c("intro", "method", "result", "discus").
grepsection | Search pattern to reduce the text to sections with matching names only.
letter.convert | Logical. If TRUE, converts hexadecimal and HTML coded characters to Unicode.
greek2text | Logical. If TRUE, some Greek letters and special characters are unified to a textual representation (important for extracting statistics).
sentences | Logical. If TRUE, the text is returned as a sectioned list of sentences.
paragraph | Logical. If TRUE, "<New paragraph>" is added at the end of each paragraph to enable manual splitting at paragraphs.
cermine | Logical or "auto". If TRUE, CERMINE specific error handling and letter conversion are applied. If set to "auto", a file name matching 'cermxml$' sets cermine=TRUE.
rm.table | Logical. If TRUE, removes <table> tags from the text.
rm.formula | Logical. If TRUE, removes <formula> tags from the text.
rm.xref | Logical. If TRUE, removes <xref> tags (citations) from the text.
rm.media | Logical. If TRUE, removes <media> tags from the text.
rm.graphic | Logical. If TRUE, removes <graphic> and <fig> tags from the text.
rm.ext_link | Logical. If TRUE, removes <ext-link> tags from the text.
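A minimal sketch of how the section and paragraph arguments might be combined. The file name "article.xml" is a hypothetical placeholder, and passing a single pattern to grepsection is an assumption about how that argument is matched. The paragraph split at the end runs on a made-up stand-in string, not on real get.text() output, so the exact spacing around the marker may differ in practice.

```r
# Section patterns taken from the sectionsplit description.
split_patterns <- c("intro", "method", "result", "discus")

# Hypothetical path; replace with a real NISO-JATS coded XML file.
jats_file <- "article.xml"

# Only call get.text() if the package is installed and the file exists.
if (requireNamespace("JATSdecoder", quietly = TRUE) && file.exists(jats_file)) {
  res <- JATSdecoder::get.text(
    jats_file,
    sectionsplit = split_patterns,
    grepsection  = "method",  # assumption: keep only sections whose name matches
    paragraph    = TRUE       # append "<New paragraph>" after each paragraph
  )
  str(res)
}

# Manual splitting at the paragraph marker, shown on a stand-in string.
marker <- "<New paragraph>"
example_text <- "First paragraph.<New paragraph>Second paragraph.<New paragraph>"
paragraphs <- strsplit(example_text, marker, fixed = TRUE)[[1]]
paragraphs <- trimws(paragraphs[nzchar(paragraphs)])
print(paragraphs)
```

Using fixed = TRUE treats the marker as a literal string, so the angle brackets are not interpreted as part of a regular expression.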
Value
A list with two elements: (1) a character vector with the section title/s; (2) a character vector with the running text of each section or, if sentences=TRUE, a list with one vector of sentences per section.
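The returned structure can be handled as sketched below. The two lists are hand-built stand-ins that mimic the documented shape (titles first, text second); their contents are invented for illustration. The sketch accesses elements by position because the element names are not stated here, and it assumes titles and texts line up one-to-one.

```r
# Stand-in for a result with sentences = FALSE (values are made up).
res <- list(
  c("Introduction", "Methods"),
  c("Text of the introduction.", "Text of the methods section.")
)

# Pair each section text with its title (assumes equal length and order).
sections <- res[[2]]
names(sections) <- res[[1]]
print(sections[["Methods"]])

# Stand-in for a result with sentences = TRUE: one sentence vector per section.
res_sent <- list(
  c("Introduction", "Methods"),
  list(
    c("First sentence.", "Second sentence."),
    c("Only sentence.")
  )
)

# Number of sentences per section.
n_sentences <- lengths(res_sent[[2]])
print(n_sentences)
```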
See Also
JATSdecoder for simultaneous extraction of meta-tags, abstract, sectioned text and reference list.