DataframeSource {tm} | R Documentation
Data Frame Source
Description
Create a data frame source.
Usage
DataframeSource(x)
Arguments
x    A data frame giving the texts and metadata.
Details
A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a UTF-8 encoded string representing the document's content. Optional additional columns are used as document-level metadata.
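The column requirements above can be met with a little base R before the source is created. The following is a minimal sketch, not part of the package: the input data frame raw and its column names id, body, and author are hypothetical, and the call to DataframeSource is guarded so that the preparation steps also run where tm is not installed.

```r
# Hypothetical input: the columns have other names and are in the wrong order.
raw <- data.frame(author = c("A", "B"),
                  body = c("First text.", "Second text."),
                  id = c("doc_1", "doc_2"),
                  stringsAsFactors = FALSE)

# Rename the identifier and content columns to the required names.
names(raw)[names(raw) == "id"] <- "doc_id"
names(raw)[names(raw) == "body"] <- "text"

# Put doc_id first and text second; all remaining columns become metadata.
other_cols <- setdiff(names(raw), c("doc_id", "text"))
docs <- raw[, c("doc_id", "text", other_cols), drop = FALSE]

# Identifiers must be unique strings; the text should be UTF-8 encoded.
docs$doc_id <- as.character(docs$doc_id)
stopifnot(anyDuplicated(docs$doc_id) == 0)
docs$text <- enc2utf8(docs$text)

# Create the source only if package tm is available.
if (requireNamespace("tm", quietly = TRUE)) {
  ds <- tm::DataframeSource(docs)
}
```

Checking for duplicate identifiers before creating the source is a design choice of this sketch; it makes a violation of the uniqueness requirement visible at the point where the data frame is built.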
Value
An object inheriting from DataframeSource, SimpleSource, and Source.
See Also
Source for basic information on the source infrastructure employed by package tm, and meta for types of metadata.
readtext for reading in texts in multiple formats suitable for processing by DataframeSource.
Examples
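library(tm)

# The first two columns must be doc_id and text; the rest become metadata.
docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("This is a text.", "This is another one."),
                   dmeta1 = 1:2, dmeta2 = letters[1:2],
                   stringsAsFactors = FALSE)
(ds <- DataframeSource(docs))
x <- Corpus(ds)
inspect(x)
# Document-level metadata taken from the additional columns
meta(x)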
[Package tm version 0.7-13 Index]