DataframeSource {tm}    R Documentation
Data Frame Source
Description
Create a data frame source.
Usage
DataframeSource(x)
Arguments
x    A data frame giving the texts and metadata.
Details
A data frame source interprets each row of the data frame x as a
document. The first column must be named "doc_id" and contain a unique
string identifier for each document. The second column must be named
"text" and contain a UTF-8 encoded string representing the
document's content. Optional additional columns are used as document-level
metadata.
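The layout rules above can be checked before a data frame is handed to DataframeSource. The following is a minimal sketch in base R; the helper name check_dataframe_source_input is hypothetical and not part of package tm, and the checks only mirror the requirements stated in this section rather than the validation tm itself may perform.

```r
# Hypothetical helper (not part of tm): returns a character vector of
# problems found, or character(0) if the layout matches the description.
check_dataframe_source_input <- function(x) {
  stopifnot(is.data.frame(x))
  if (ncol(x) < 2L) {
    return("need at least two columns: doc_id and text")
  }
  problems <- character(0)
  # Column order and names matter: doc_id first, text second.
  if (!identical(names(x)[1L], "doc_id")) {
    problems <- c(problems, "first column must be named \"doc_id\"")
  }
  if (!identical(names(x)[2L], "text")) {
    problems <- c(problems, "second column must be named \"text\"")
  }
  # Identifiers must be unique across rows (one row per document).
  if (anyDuplicated(as.character(x[[1L]])) > 0L) {
    problems <- c(problems, "document identifiers must be unique")
  }
  # The content column should hold valid UTF-8 strings.
  if (!is.character(x[[2L]])) {
    problems <- c(problems, "second column must be a character vector")
  } else if (!isTRUE(all(validUTF8(x[[2L]])))) {
    problems <- c(problems, "second column contains invalid UTF-8")
  }
  problems
}

docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("First text.", "Second text."),
                   stringsAsFactors = FALSE)
check_dataframe_source_input(docs)       # no problems reported
check_dataframe_source_input(docs[2:1])  # columns swapped: both names flagged
```

Returning the list of problems, rather than stopping at the first one, makes it easier to fix a data frame in a single pass before building a corpus from it.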
Value
An object inheriting from DataframeSource, SimpleSource,
and Source.
See Also
Source for basic information on the source infrastructure
employed by package tm, and meta for types of metadata.
readtext for reading in texts in multiple formats
suitable for processing by DataframeSource.
Examples
docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("This is a text.", "This is another one."),
                   dmeta1 = 1:2, dmeta2 = letters[1:2],
                   stringsAsFactors = FALSE)
(ds <- DataframeSource(docs))
x <- Corpus(ds)
inspect(x)
meta(x)
[Package tm version 0.7-13 Index]