R: Natural Language Processing for Meta Analysis

MetaNLP {MetaNLP}

R Documentation

Natural Language Processing for Meta Analysis

Description

The MetaNLP package provides methods to quickly transform a CSV file of titles and abstracts into an R data frame that can be used for automatic title-abstract screening with machine learning.

MetaNLP is the base class of the package MetaNLP. An object of this class is initialized by passing the path to a CSV file (or a data frame, see `file`). The constructor builds a data frame whose column names are the words that occur in the titles and abstracts and whose cells contain the count of each word in each paper.
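To make the shape of this word count data frame concrete, here is a minimal base-R sketch. It is an illustration only, not MetaNLP's implementation: the helper `count_words()` and the toy data are hypothetical, and the sketch skips lemmatization, stemming and the `bounds`/`word_length` filters.

```r
# Minimal base-R sketch of a paper-by-word count table.
# Assumption: this is NOT MetaNLP's implementation; it skips lemmatization,
# stemming and the bounds/word_length filters and only shows the shape of
# the resulting data frame.
count_words <- function(papers) {
  # Combine title and abstract per paper, lowercase, split on non-letters
  text <- tolower(paste(papers$title, papers$abstract))
  tokens <- lapply(strsplit(text, "[^[:alpha:]]+"), function(x) x[nzchar(x)])
  vocab <- sort(unique(unlist(tokens)))
  # One row per paper, one column per word, cells hold the counts
  rows <- lapply(tokens, function(x) as.integer(table(factor(x, levels = vocab))))
  counts <- as.data.frame(do.call(rbind, rows))
  colnames(counts) <- vocab
  counts
}

# Hypothetical toy input with the title and abstract columns
papers <- data.frame(
  ID = 1:2,
  title = c("Statin therapy trial", "Statin use in cohort"),
  abstract = c("A randomized trial of statin therapy.",
               "A cohort study of statin use.")
)
counts <- count_words(papers)
print(counts)
```

Each paper becomes one row, so "statin" has a count of 2 in both rows, while "trial" is 2 for the first paper and 0 for the second.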

Usage

MetaNLP(
  file,
  bounds = c(2, Inf),
  word_length = c(3, Inf),
  language = "english",
  ...
)

Arguments

`file`	Either the path to the CSV file or a data frame containing the titles and abstracts.
`bounds`	A numeric vector of length 2. The first value specifies the minimum number of times a word must appear to become a column of the word count matrix; the second value specifies the maximum number. Defaults to `c(2, Inf)`.
`word_length`	A numeric vector of length 2. The first value specifies the minimum number of characters a word must have to become a column of the word count matrix; the second value specifies the maximum number. Defaults to `c(3, Inf)`.
`language`	The language used for lemmatization and stemming. Supported languages are `english`, `french`, `german`, `russian` and `spanish`; the default is `"english"`. For non-English languages, make sure that the CSV file being processed has the correct encoding.
`...`	Additional arguments passed on to `read.csv2`, e.g. `sep = ","` to use "," as the separator or `fileEncoding` to change the encoding. See `read.table`.
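The effect of `bounds` and `word_length` can be sketched as a column filter on a word count data frame. This is a hypothetical illustration, not the package's code: the helper `filter_columns()` is invented here, and it assumes that `bounds` is compared with each word's total count over all papers and that both limits are inclusive.

```r
# Hypothetical sketch of how `bounds` and `word_length` could prune the
# columns of a word count data frame. Assumption: `bounds` is compared with
# each word's total count over all papers and both limits are inclusive;
# MetaNLP's exact rule may differ.
filter_columns <- function(counts, bounds = c(2, Inf), word_length = c(3, Inf)) {
  totals <- colSums(counts)            # total appearances of each word
  n_chars <- nchar(colnames(counts))   # number of characters of each word
  keep <- totals >= bounds[1] & totals <= bounds[2] &
    n_chars >= word_length[1] & n_chars <= word_length[2]
  counts[, keep, drop = FALSE]
}

# Toy word count data frame: two papers, four words
counts <- data.frame(a = c(3L, 1L), statin = c(2L, 2L),
                     trial = c(1L, 0L), randomized = c(1L, 0L))
filtered <- filter_columns(counts)
print(colnames(filtered))
```

With the defaults, "a" is dropped for being shorter than 3 characters and "trial" and "randomized" are dropped for appearing only once, leaving "statin". With the package itself, the same settings would be passed to the constructor, e.g. `MetaNLP(path, bounds = c(3, Inf), word_length = c(4, 20))`, and arguments such as `sep = ","` are forwarded to `read.csv2`.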

Details

An object of class MetaNLP stores the word count data frame in its slot data_frame. The CSV file must contain a column ID that identifies each paper, a column title with the paper titles and a column abstract with the abstracts. To store the screening decision for each paper, a column decision must also exist whose values are either "yes"/"no" or "include"/"exclude"; in addition, the value "maybe" is allowed and is handled like "yes"/"include".
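A minimal input with this column layout can be prepared as follows. The sketch writes the data with `write.csv2`, because `read.csv2` (the reader MetaNLP uses) expects ";" as the separator by default. The toy data and the helper `normalize_decision()` are hypothetical; the helper only mirrors the documented rule that "maybe" counts as "yes"/"include" and is not a function of the package.

```r
# Toy input following the documented layout: ID, title, abstract, decision
papers <- data.frame(
  ID = 1:3,
  title = c("Statin therapy trial", "Statin use in cohort", "Editorial on statins"),
  abstract = c("A randomized trial of statin therapy.",
               "A cohort study of statin use.",
               "An opinion piece."),
  decision = c("include", "maybe", "exclude")
)

# Hypothetical helper: map the allowed decision values onto "yes"/"no",
# treating "maybe" like "yes"/"include"; anything else becomes NA
normalize_decision <- function(x) {
  x <- tolower(trimws(x))
  ifelse(x %in% c("yes", "include", "maybe"), "yes",
         ifelse(x %in% c("no", "exclude"), "no", NA_character_))
}

# write.csv2 uses ";" as separator, matching read.csv2's default
path <- tempfile(fileext = ".csv")
write.csv2(papers, path, row.names = FALSE)
round_trip <- read.csv2(path)
print(normalize_decision(round_trip$decision))
```

The round trip through `write.csv2`/`read.csv2` keeps the four required column names, and the decisions normalize to "yes", "yes", "no".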

Value

An object of class MetaNLP

Note

To ensure that data containing special characters (e.g. "é" or "ü") are processed correctly, make sure that the CSV file is encoded as UTF-8. The stemming algorithm uses the C libstemmer library generated by Snowball. When German texts are stemmed, umlauts are replaced by their non-umlaut equivalents, so "ä" becomes "a", etc.
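Both points of this note can be checked with base R before calling MetaNLP. The helpers `is_utf8_file()` and `strip_umlauts()` below are hypothetical and not part of the package: the first uses `validUTF8()` to test whether every line of a file is valid UTF-8, and the second only imitates the umlaut replacement described above, not the Snowball stemmer itself.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8
is_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# Hypothetical helper imitating the described German behaviour ("ä" -> "a")
strip_umlauts <- function(x) {
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
  to   <- c("a", "o", "u", "A", "O", "U")
  for (i in seq_along(from)) x <- gsub(from[i], to[i], x, fixed = TRUE)
  x
}

# Write the same word once as UTF-8 bytes and once as Latin-1 bytes
good <- tempfile()
bad <- tempfile()
writeBin(charToRaw(enc2utf8("M\u00fcller\n")), good)
writeBin(as.raw(c(0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72, 0x0a)), bad)

print(c(is_utf8_file(good), is_utf8_file(bad)))
print(strip_umlauts("M\u00fcller"))
```

A file that fails the check can be re-saved as UTF-8, or the encoding can be declared through the `...` argument, e.g. `fileEncoding = "latin1"`, which is forwarded to `read.csv2`.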

Author(s)

Maintainer: Maximilian Pilz maximilian.pilz@itwm.fraunhofer.de (ORCID)

Authors:

Nico Bruder bruder@imbi.uni-heidelberg.de
Samuel Zimmermann zimmermann@imbi.uni-heidelberg.de (ORCID)
Johannes Vey vey@imbi.uni-heidelberg.de (ORCID)

Other contributors:

Institute of Medical Biometry - University of Heidelberg [copyright holder]

Examples

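# Locate the example CSV file shipped with the package
path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
# Build the word count data frame; it is stored in the slot obj@data_frame
obj <- MetaNLP(path)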
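Since `file` also accepts a data frame, the example CSV can be read first and then passed in directly. This sketch assumes that reading the file with `read.csv2` defaults yields the same input that MetaNLP would read itself. The snippet is guarded so that it is skipped when MetaNLP is not installed, and it only inspects the size and the first column names of the slot data_frame.

```r
# Skipped entirely when MetaNLP is not installed
if (requireNamespace("MetaNLP", quietly = TRUE)) {
  path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
  # Read the CSV with read.csv2, the reader MetaNLP uses for file paths
  papers <- read.csv2(path)
  # `file` may also be a data frame
  obj <- MetaNLP::MetaNLP(papers)
  # The word count data frame is stored in the S4 slot data_frame
  counts <- obj@data_frame
  print(dim(counts))
  print(head(colnames(counts)))
}
```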

[Package MetaNLP version 0.1.2 Index]