MetaNLP {MetaNLP} | R Documentation |
Natural Language Processing for Meta Analysis
Description
The MetaNLP package provides methods to quickly transform a CSV file with titles and abstracts into an R data frame that can be used for automatic title-abstract screening using machine learning.
MetaNLP is the base class of the package MetaNLP.
An object of this class is initialized by passing the path to a CSV file.
It constructs a data frame whose column names are the words that occur in
the titles and abstracts and whose cells contain the word counts for each
paper.
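To make the shape of such a word count data frame concrete, the following base R sketch builds one by hand for two made-up titles. It does not call MetaNLP and skips the lemmatization, stemming and filtering that the package performs; the `docs` vector and the simple tokenization rule are assumptions chosen for illustration only.

```r
# Two made-up documents (hypothetical data, not shipped with the package)
docs <- c(p1 = "Deep learning for screening",
          p2 = "Screening abstracts with learning")

# Lower-case each document and split it on anything that is not a letter
tokens <- lapply(tolower(docs), function(x) {
  words <- strsplit(x, "[^a-z]+")[[1]]
  words[nzchar(words)]
})

# One column per distinct word, one row per paper, cells hold the counts
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(w) as.integer(table(factor(w, levels = vocab))),
  integer(length(vocab))
))
colnames(counts) <- vocab
word_counts <- as.data.frame(counts)
word_counts
```

Each row corresponds to one paper and each column to one word, which is the layout described above.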
Usage
MetaNLP(
file,
bounds = c(2, Inf),
word_length = c(3, Inf),
language = "english",
...
)
Arguments
file |
Either the path to the CSV file or a data frame containing the abstracts |
bounds |
An integer vector of length 2. The first value specifies
the minimum number of times a word must appear to become a column of the
word count matrix, the second value specifies the maximum number.
Defaults to c(2, Inf). |
word_length |
An integer vector of length 2. The first value specifies
the minimum number of characters a word must have to become a column of the
word count matrix, the second value specifies the maximum number.
Defaults to c(3, Inf). |
language |
The language for lemmatization and stemming. Supported
languages are |
... |
Additional arguments passed on to |
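As a sketch of how the arguments fit together, the call below tightens both filters on the example file that ships with the package. The values 3, 4 and 12 are arbitrary choices for illustration, and the call is guarded so that it only runs when MetaNLP is installed.

```r
if (requireNamespace("MetaNLP", quietly = TRUE)) {
  # Example CSV shipped with the package (same file as in the Examples section)
  path <- system.file("extdata", "test_data.csv",
                      package = "MetaNLP", mustWork = TRUE)

  # Keep only words that appear at least 3 times and
  # have between 4 and 12 characters (illustrative values)
  obj <- MetaNLP::MetaNLP(
    path,
    bounds = c(3, Inf),
    word_length = c(4, 12),
    language = "english"
  )
}
```

Stricter bounds should yield fewer columns in the word count matrix than the defaults, since fewer words pass the filters.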
Details
An object of class MetaNLP contains a slot data_frame where
the word count data frame is stored.
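A minimal sketch of reading that slot, assuming MetaNLP is an S4 class (as the term "slot" suggests) so that the `@` operator applies. The variable name `word_counts` is chosen here for illustration, and the code only runs when the package is installed.

```r
if (requireNamespace("MetaNLP", quietly = TRUE)) {
  path <- system.file("extdata", "test_data.csv",
                      package = "MetaNLP", mustWork = TRUE)
  obj <- MetaNLP::MetaNLP(path)

  # Extract the word count data frame from the data_frame slot
  word_counts <- obj@data_frame

  # Inspect its size and the first few column names
  print(dim(word_counts))
  print(utils::head(names(word_counts)))
}
```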
The CSV file must have a column ID to identify each paper, a column
title with the corresponding titles of the papers and a column abstract
which contains the abstracts. Furthermore, to store the decision for each
paper, a column decision must exist, where the values are either "yes"
and "no", or "include" and "exclude". The value "maybe" is also allowed
and is handled as a "yes"/"include".
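The column requirements above can be checked before an object is constructed. The sketch below builds a small made-up data frame with the four required columns and validates it; `check_screening_input` is a hypothetical helper written for this example and is not part of MetaNLP. For simplicity it accepts any mix of the allowed decision labels.

```r
# Hypothetical toy input with the four required columns
papers <- data.frame(
  ID = 1:3,
  title = c("Trial of drug A", "Review of method B", "Cohort study C"),
  abstract = c("We randomised patients to drug A or placebo.",
               "We summarise the literature on method B.",
               "We followed a cohort for five years."),
  decision = c("include", "exclude", "maybe"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops with a message if the input is malformed
check_screening_input <- function(df) {
  required <- c("ID", "title", "abstract", "decision")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  allowed <- c("yes", "no", "include", "exclude", "maybe")
  bad <- setdiff(unique(df$decision), allowed)
  if (length(bad) > 0) {
    stop("Unexpected decision value(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

check_screening_input(papers)
```

Since the file argument also accepts a data frame, a data frame that passes this check has the columns the function expects.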
Value
An object of class MetaNLP
Note
To ensure correct processing of the data when there are special characters
(e.g. "é" or "ü"), make sure that the CSV file is correctly encoded
as UTF-8.
The stemming algorithm makes use of the C libstemmer library generated by
Snowball. When German texts are stemmed, umlauts are replaced by their
non-umlaut equivalents, so "ä" becomes "a" etc.
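One way to catch encoding problems early is to check that the file's bytes form valid UTF-8 before passing it on. The base R sketch below does this with `validUTF8()`; `is_utf8_file` is a hypothetical helper for this example, not a MetaNLP function. It confirms validity only, so a plain ASCII file also passes.

```r
# Hypothetical helper: TRUE if the file's bytes are valid UTF-8
is_utf8_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# A file containing "café" written as UTF-8 bytes
utf8_file <- tempfile(fileext = ".csv")
writeBin(charToRaw("title\ncaf\u00e9\n"), utf8_file)

# The same word written as Latin-1 bytes (0xe9 is "é" in Latin-1)
latin1_file <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), latin1_file)

is_utf8_file(utf8_file)
is_utf8_file(latin1_file)
```

The first file passes the check and the second does not, because a lone 0xe9 byte is not a valid UTF-8 sequence.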
Author(s)
Maintainer: Maximilian Pilz maximilian.pilz@itwm.fraunhofer.de (ORCID)
Authors:
Nico Bruder bruder@imbi.uni-heidelberg.de
Samuel Zimmermann zimmermann@imbi.uni-heidelberg.de (ORCID)
Johannes Vey vey@imbi.uni-heidelberg.de (ORCID)
Other contributors:
Institute of Medical Biometry - University of Heidelberg [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/imbi-heidelberg/MetaNLP/issues
Examples
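# Locate the example CSV file shipped with the package
path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
# Build the MetaNLP object with the default settings
obj <- MetaNLP(path)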