Corpus Data Frame

Description

Create or test for corpus objects.

Usage

corpus_frame(..., row.names = NULL, filter = NULL)

as_corpus_frame(x, filter = NULL, ..., row.names = NULL)

is_corpus_frame(x)

Arguments

...
data frame columns for corpus_frame; further arguments passed to as_corpus_text from as_corpus_frame.

row.names
character vector of row names for the corpus object.

filter
text filter object for the "text" column in the corpus object.

x
object to be coerced or tested.

Details

These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special methods for printing and a column named "text" of type "corpus_text".

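corpus_frame has similar semantics to the data.frame function, except that string columns do not get converted to factors. (Before R 4.0.0, data.frame converted them by default unless stringsAsFactors = FALSE was given.)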
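A minimal sketch of these column semantics, assuming the corpus package is installed. The "title" column is purely illustrative; it is a string column other than "text", so it should come back as a character vector rather than a factor.

```r
library(corpus)

# "title" is an illustrative extra column; "text" becomes a corpus_text column.
x <- corpus_frame(title = c("Goodnight Moon", "Goodnight Moon"),
                  text = c("goodnight", "moon"))

# String columns other than "text" are kept as character vectors.
class(x[["title"]])
is.factor(x[["title"]])
```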

as_corpus_frame converts another object to a corpus data frame object. By default, as_corpus_frame converts x to a data frame with a column named "text" of type "corpus_text" and sets the class attribute of the result to c("corpus_frame", "data.frame").
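The sketch below, again assuming the corpus package is loaded, checks the structure described above on small character vectors and shows the row.names argument. Because row.names follows ... in the signature, it has to be passed by name.

```r
library(corpus)

# Names on a character vector are kept as row names by default.
x <- as_corpus_frame(c(a = "goodnight", b = "moon"))
class(x)                              # "corpus_frame" "data.frame"
inherits(x[["text"]], "corpus_text")  # TRUE
rownames(x)                           # "a" "b"

# Explicit row names, passed by name because row.names comes after ...
y <- as_corpus_frame(c("goodnight", "moon"), row.names = c("first", "second"))
rownames(y)                           # "first" "second"
```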

is_corpus_frame tests whether x is a data frame with a column named "text" of type "corpus_text".
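To illustrate the test, here is a sketch assuming the corpus package is loaded. A plain data frame whose "text" column is an ordinary character vector should not qualify, its converted counterpart should, and an object that is not a data frame at all should not.

```r
library(corpus)

plain <- data.frame(text = c("goodnight", "moon"), stringsAsFactors = FALSE)

is_corpus_frame(plain)                   # FALSE: "text" is a plain character column
is_corpus_frame(as_corpus_frame(plain))  # TRUE: "text" is now of type corpus_text
is_corpus_frame(c("goodnight", "moon"))  # FALSE: not a data frame
```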

as_corpus_frame is generic: you can write methods to handle specific classes of objects.
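A sketch of such a method, assuming the corpus package is loaded. The class "my_notes" and its fields body and author are hypothetical. The method builds an ordinary data frame with a "text" column and delegates to the existing data frame conversion. It forwards filter explicitly and passes any other arguments, such as row.names, along via the ... argument.

```r
library(corpus)

# Hypothetical class: a list holding note texts and their authors.
notes <- structure(list(body = c("goodnight", "moon"),
                        author = c("A", "B")),
                   class = "my_notes")

# Hypothetical S3 method for the "my_notes" class.
as_corpus_frame.my_notes <- function(x, filter = NULL, ...) {
    # Put the note texts in a column named "text"; keep authors as a column.
    df <- data.frame(text = x$body, author = x$author,
                     stringsAsFactors = FALSE)
    # Delegate to the data frame conversion, forwarding filter and ... as-is.
    as_corpus_frame(df, filter = filter, ...)
}

result <- as_corpus_frame(notes)
is_corpus_frame(result)
```

Delegating to the data frame conversion keeps the custom method short and leaves the handling of the "text" column to the package.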

Value

corpus_frame creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").

as_corpus_frame attempts to coerce its argument to a corpus data frame object, setting the row names from row.names and calling as_corpus_text on the "text" column with the filter and ... arguments.
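As a sketch of the filter argument, this example assumes the corpus package's text_filter() and text_tokens() functions, which are documented separately rather than on this page. It builds a filter with the drop_punct property set and attaches it to the "text" column. Tokenizing that column without an explicit filter should then use the attached one.

```r
library(corpus)

# text_filter() and its drop_punct property come from the corpus package.
f <- text_filter(drop_punct = TRUE)

# The filter is attached to the "text" column of the new corpus frame.
x <- corpus_frame(text = c("Goodnight, moon!", "Goodnight, stars!"),
                  filter = f)

# With no filter argument, text_tokens() uses the column's own filter.
toks <- text_tokens(x[["text"]])
toks
```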

is_corpus_frame returns TRUE if its argument is a valid corpus object and FALSE otherwise.

See Also

corpus-package, print.corpus_frame, corpus_text, read_ndjson.

Examples

# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)

# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))

# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names

