convert {quanteda} | R Documentation |
Convert quanteda objects to non-quanteda formats
Description
Convert a quanteda dfm or corpus object to a format usable by other
packages. The general function convert provides easy conversion from a dfm
to the document-term representations used by the other text analysis packages
for which conversions are defined. For corpus objects, convert provides
an easy way to turn a corpus and its document variables into a data.frame
or JSON.
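As a minimal sketch of the dfm conversions described above, the following builds a toy dfm and converts it to two of the base R targets listed under Usage. The corpus texts and the object names (corp, dfmat, mat, df) are illustrative assumptions, not part of the package.

```r
library(quanteda)

# Toy corpus; document names and texts are made up for illustration
corp <- corpus(c(d1 = "a b b c", d2 = "b c c d"))
dfmat <- dfm(tokens(corp))

# dfm -> dense base R matrix (documents in rows, features in columns)
mat <- convert(dfmat, to = "matrix")

# dfm -> data.frame with one row per document and a document-name column
df <- convert(dfmat, to = "data.frame")

dim(mat)
names(df)
```

Both targets keep documents as rows, so the results can be passed to functions that expect ordinary matrices or data frames.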
Usage
convert(x, to, ...)
## S3 method for class 'dfm'
convert(
  x,
  to = c("lda", "tm", "stm", "austin", "topicmodels", "lsa", "matrix", "data.frame",
    "tripletlist"),
  docvars = NULL,
  omit_empty = TRUE,
  docid_field = "doc_id",
  ...
)
## S3 method for class 'corpus'
convert(x, to = c("data.frame", "json"), pretty = FALSE, ...)
Arguments
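x |
a dfm or corpus object to be converted |
to |
target conversion format. For a dfm, one of: "lda", "tm", "stm", "austin", "topicmodels", "lsa", "matrix", "data.frame", "tripletlist". For a corpus, one of: "data.frame", "json" |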
... |
unused directly |
docvars |
optional data.frame of document variables used as the meta information in conversion to the stm package format |
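omit_empty |
logical; if TRUE, omit empty documents and features from the converted dfm, as some target formats (such as stm) do not accept empty documents |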
docid_field |
character; the name of the column containing document names used when to = "data.frame" |
pretty |
adds indentation whitespace to JSON output. Can be TRUE/FALSE or a number specifying the number of spaces to indent. See jsonlite::prettify() |
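The sketch below illustrates two of the arguments above, under the assumption that docid_field applies to the "data.frame" target of the dfm method and that pretty is passed through to the JSON writer. The column name "document" and the toy corpus are hypothetical choices made for this example.

```r
library(quanteda)

# Toy corpus with one document variable; all names are illustrative
corp <- corpus(c(d1 = "Text one.", d2 = "Text two."),
               docvars = data.frame(dvar1 = 1:2))
dfmat <- dfm(tokens(corp))

# Store document names in a column called "document" instead of "doc_id"
df <- convert(dfmat, to = "data.frame", docid_field = "document")
names(df)

# Corpus -> JSON with indentation whitespace for readability
js <- convert(corp, to = "json", pretty = TRUE)
cat(js)
```

Renaming the document-name column can help when the result is merged with other tables that already use a different key name.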
Value
A converted object determined by the value of to (see above).
See conversion target package documentation for more detailed descriptions
of the return formats.
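Because the return type depends on to, it can help to inspect a converted object before passing it on. The following is a small sketch using the "tripletlist" target, which is understood here to return a named list of document, feature, and frequency vectors; the toy corpus and object names are assumptions for illustration.

```r
library(quanteda)

# Toy dfm; texts are made up for illustration
dfmat <- dfm(tokens(corpus(c(d1 = "a b b c", d2 = "b c c d"))))

# Triplet representation: one entry per non-zero (document, feature) cell
trip <- convert(dfmat, to = "tripletlist")

names(trip)
lengths(trip)

# The frequencies should add up to the total token count of the dfm
sum(trip$frequency) == sum(dfmat)
```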
Examples
## convert a dfm
toks <- corpus_subset(data_corpus_inaugural, Year > 1970) |>
  tokens()
dfmat1 <- dfm(toks)
# austin's wfm format
identical(dim(dfmat1), dim(convert(dfmat1, to = "austin")))
# stm package format
stmmat <- convert(dfmat1, to = "stm")
str(stmmat)
# triplet
tripletmat <- convert(dfmat1, to = "tripletlist")
str(tripletmat)
## Not run:
# tm's DocumentTermMatrix format
tmdfm <- convert(dfmat1, to = "tm")
str(tmdfm)
# topicmodels package format
str(convert(dfmat1, to = "topicmodels"))
# lda package format
str(convert(dfmat1, to = "lda"))
## End(Not run)
## convert a corpus into a data.frame
corp <- corpus(c(d1 = "Text one.", d2 = "Text two."),
               docvars = data.frame(dvar1 = 1:2, dvar2 = c("one", "two"),
                                    stringsAsFactors = FALSE))
convert(corp, to = "data.frame")
convert(corp, to = "json")