TermDocumentMatrix {tm} | R Documentation |
Term-Document Matrix
Description
Constructs or coerces to a term-document matrix or a document-term matrix.
Usage
TermDocumentMatrix(x, control = list())
DocumentTermMatrix(x, control = list())
as.TermDocumentMatrix(x, ...)
as.DocumentTermMatrix(x, ...)
Arguments
x |
for the constructors, a corpus or an R object from which a
corpus can be generated via |
control |
a named list of control options. There are local
options which are evaluated for each document and global options
which are evaluated once for the constructed matrix. Available local
options are documented in This is different for a Available global options are:
|
... |
the additional argument |
Value
An object of class TermDocumentMatrix
or class
DocumentTermMatrix
(both inheriting from a
simple triplet matrix in package slam)
containing a sparse term-document matrix or document-term matrix. The
attribute weighting
contains the weighting applied to the
matrix.
See Also
termFreq
for available local control options.
Examples
data("crude")
tdm <- TermDocumentMatrix(crude,
control = list(removePunctuation = TRUE,
stopwords = TRUE))
dtm <- DocumentTermMatrix(crude,
control = list(weighting =
function(x)
weightTfIdf(x, normalize =
FALSE),
stopwords = TRUE))
inspect(tdm[202:205, 1:5])
inspect(tdm[c("price", "prices", "texas"), c("127", "144", "191", "194")])
inspect(dtm[1:5, 273:276])
s <- SimpleCorpus(VectorSource(unlist(lapply(crude, as.character))))
m <- TermDocumentMatrix(s,
control = list(removeNumbers = TRUE,
stopwords = TRUE,
stemming = TRUE))
inspect(m[c("price", "texa"), c("127", "144", "191", "194")])