JATSdecoder {JATSdecoder}  R Documentation

JATSdecoder

Description

Extracts and restructures a NISO-JATS coded XML file or text into a list whose elements (metadata and text) can be selected individually. To process PDF files, first convert them to CERMXML files with CERMINE.

Usage

JATSdecoder(
  x,
  sectionsplit = c("intro", "method", "result", "study", "experiment", "conclu",
    "implica", "discussion"),
  grepsection = "",
  sentences = FALSE,
  paragraph = FALSE,
  abstract2sentences = TRUE,
  output = "all",
  letter.convert = TRUE,
  unify.country.name = TRUE,
  greek2text = FALSE,
  warning = TRUE,
  countryconnection = FALSE,
  authorconnection = FALSE
)

Arguments

x

a NISO-JATS coded XML file or text.

sectionsplit

search patterns used to split the text into sections (forced to lower case), e.g. c("intro", "method", "result", "discus").

grepsection

regular expression used to reduce the text to the matching section/s only.

sentences

Logical. If TRUE, text is returned as a sectioned list with sentences.

paragraph

Logical. If TRUE, "<New paragraph>" is added at the end of each paragraph to enable manual splitting at paragraphs.

abstract2sentences

Logical. If TRUE, the abstract is returned as a vector of sentences.

output

selection of specific results to output, one or more of: c("all", "title", "author", "affiliation", "journal", "volume", "editor", "doi", "type", "history", "country", "subject", "keywords", "abstract", "sections", "text", "tables", "captions", "references").

letter.convert

Logical. If TRUE converts hexadecimal and HTML coded characters to Unicode.

unify.country.name

Logical. If TRUE, tries to unify country name/s with the list of country names from worldmap().

greek2text

Logical. If TRUE, converts and unifies several Greek letters to a textual representation, e.g. "alpha".

warning

Logical. If TRUE, outputs a warning when processing CERMINE converted PDF files.

countryconnection

Logical. If TRUE outputs country connections as vector c("A - B","A - C", ...).

authorconnection

Logical. If TRUE outputs connections of a maximum of 50 involved authors as vector c("A - B","A - C", ...).
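
The connection vectors described for countryconnection and authorconnection have the form c("A - B","A - C", ...), i.e. one entry per pair. The following base R sketch shows one way such a vector could be built; it is not the package's implementation. The helper name pairwise_connections and its max_items argument are hypothetical, and keeping only the first 50 names when more are supplied is an assumption, since this page does not say how longer author lists are handled.

```r
# Hypothetical helper (not part of JATSdecoder): build "A - B" style pairs
# from a character vector of names.
pairwise_connections <- function(items, max_items = 50) {
  # Drop duplicates, then keep at most max_items names (assumed behaviour)
  items <- head(unique(items), max_items)
  # Fewer than two names cannot form a connection
  if (length(items) < 2) return(character(0))
  # combn() enumerates every unordered pair exactly once
  combn(items, 2, FUN = function(p) paste(p[1], p[2], sep = " - "))
}

pairwise_connections(c("A", "B", "C"))
# "A - B" "A - C" "B - C"
```

The number of pairs grows quadratically with the number of names (50 names already give 1225 pairs), which is one plausible reason for a cap.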

Value

List with extracted metadata, sectioned text and references.
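
Because the return value is an ordinary R list, its elements can be selected with the usual list operators. The sketch below is only an illustration: it assumes that the element names match the keywords accepted by output (e.g. "title", "abstract", "text"), which this page does not state explicitly, and it uses a hand-built stand-in list named res with placeholder content instead of real package output.

```r
# Stand-in for a JATSdecoder() result. The element names are an assumption
# and the content is placeholder text, not real package output.
res <- list(
  title    = "Placeholder title",
  abstract = c("First placeholder sentence.", "Second placeholder sentence."),
  text     = "Placeholder body text."
)

# Select a single element by name
res$title

# Select several elements at once (returns a shorter list)
res[c("title", "abstract")]

# With abstract2sentences = TRUE the abstract is a vector of sentences,
# so its length is the number of sentences
length(res$abstract)
```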

Note

A short tutorial on how to work with JATSdecoder and the generated outputs can be found at: https://github.com/ingmarboeschen/JATSdecoder

Source

An interactive web application for selecting and analyzing extracted article metadata and study characteristics for articles linked to PubMed Central is hosted at: https://www.scianalyzer.com/

The XML version of PubMed Central database articles can be downloaded in bulk from:
https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/

References

Böschen (2021). "Software review: The JATSdecoder package - extract metadata, abstract and sectioned text from NISO-JATS coded XML documents; Insights to PubMed Central's open access database." Scientometrics. doi: 10.1007/s11192-021-04162-z.

See Also

study.character for extracting different study characteristics at once.

get.stats for extracting statistical results from textual input and different file formats.

Examples

# URL of an example XML file
x <- "https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0114876&type=manuscript"
# local file name in tempdir()
file <- file.path(tempdir(), "file.xml")
# download URL as "file.xml" in tempdir() if a connection is possible
tryCatch({
  # reading one line first checks that the URL is reachable
  readLines(x, n = 1)
  download.file(x, file)
},
warning = function(w) message(
  "Something went wrong. Check your internet connection and the link address."),
error = function(e) message(
  "Something went wrong. Check your internet connection and the link address."))
# convert full article to list with metadata, sectioned text and reference list
if (file.exists(file)) JATSdecoder(file)
# extract specific content (here: abstract and text)
if (file.exists(file)) JATSdecoder(file, output = c("abstract", "text"))
# or use specific functions, e.g.:
if (file.exists(file)) get.abstract(file)
if (file.exists(file)) get.text(file)

[Package JATSdecoder version 1.2.0]