dtm_resampler {text2map}    R Documentation
Resamples an input DTM to generate new DTMs
Description
Takes any DTM and randomly resamples from each row, creating a new DTM.
Usage
dtm_resampler(dtm, alpha = NULL, n = NULL)
Arguments
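dtm
Document-term matrix with terms as columns. Works with DTMs produced by any popular text analysis package, or you can use the
alpha
Number indicating the proportion of each document's length to resample, e.g., alpha = 0.50 returns documents with half their original number of tokens.
n
Integer indicating the length of documents to be returned, e.g., n = 2000L returns documents that are each 2000 tokens long.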
Details
Using the row counts as probabilities, each document's tokens are resampled with replacement up to a proportion of the row count (set by alpha) or to a fixed length (set by n). This function can be used with iteration to "bootstrap" a DTM without returning to the raw text. The function itself does not iterate, however, so each resampled DTM can be processed one at a time without storing multiple DTMs in memory.
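The resampling step described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the text2map source: `resample_rows()` is a hypothetical helper that works on a small dense matrix, and it uses `stats::rmultinom()`, which draws a given number of tokens with replacement in proportion to the row counts (equivalent to sampling tokens one at a time and tallying them). The loop at the end mimics the bootstrap use by holding only one resampled DTM at a time and keeping just a summary statistic.

```r
# Hypothetical helper, not text2map's implementation: resample every row
# of a dense count matrix, drawing sizes[i] tokens for row i with
# replacement, with probabilities proportional to that row's counts.
resample_rows <- function(dtm, sizes) {
  out <- dtm
  out[] <- 0
  for (i in seq_len(nrow(dtm))) {
    counts <- dtm[i, ]
    # rmultinom() errors on an all-zero probability vector, so skip empty rows
    if (sum(counts) > 0 && sizes[i] > 0) {
      out[i, ] <- rmultinom(1, size = sizes[i], prob = counts)[, 1]
    }
  }
  out
}

dtm <- matrix(c(3, 1, 0,
                0, 2, 2),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), c("apple", "pear", "plum")))

set.seed(1)
# Keep each document's original length (the alpha = 1 case)
boot_dtm <- resample_rows(dtm, sizes = rowSums(dtm))

# Bootstrap-style iteration: only one resampled DTM exists at a time,
# and only a summary statistic (the share of "pear" tokens) is kept.
pear_share <- numeric(100)
for (b in seq_along(pear_share)) {
  resampled <- resample_rows(dtm, sizes = rowSums(dtm))
  pear_share[b] <- sum(resampled[, "pear"]) / sum(resampled)
}
```

Terms with a zero count in a document have zero probability, so they can never appear in that document's resampled row.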
If alpha is less than 1, then a proportion of each document's length is returned. For example, alpha = 0.50 will return a resampled DTM where each row has half the tokens of the original DTM. If alpha = 2, then each row in the resampled DTM has twice the number of tokens of the original DTM.
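To make the effect of alpha concrete, the sketch below converts it into per-document target lengths and resamples to those lengths. Both helpers are hypothetical, and rounding `alpha * rowSums(dtm)` to the nearest whole token is an assumption; this page does not say how fractional lengths are handled.

```r
# Hypothetical sketch: turn alpha into per-document target lengths.
# Rounding to the nearest whole token is an assumption, not documented.
alpha_sizes <- function(dtm, alpha) {
  round(alpha * rowSums(dtm))
}

# Resample row i to sizes[i] tokens, with replacement, using the
# row counts as sampling probabilities (empty rows stay empty).
resample_rows <- function(dtm, sizes) {
  out <- dtm
  out[] <- 0
  for (i in seq_len(nrow(dtm))) {
    if (sum(dtm[i, ]) > 0 && sizes[i] > 0) {
      out[i, ] <- rmultinom(1, size = sizes[i], prob = dtm[i, ])[, 1]
    }
  }
  out
}

dtm <- matrix(c(6, 2, 0,
                0, 5, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), c("apple", "pear", "plum")))

set.seed(7)
half_dtm   <- resample_rows(dtm, alpha_sizes(dtm, alpha = 0.5))
double_dtm <- resample_rows(dtm, alpha_sizes(dtm, alpha = 2))

rowSums(half_dtm)    # doc1 = 4, doc2 = 5
rowSums(double_dtm)  # doc1 = 16, doc2 = 20
```

The row totals are fixed by alpha; only how those tokens are spread across terms varies from run to run.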
If an integer is provided to n, then all documents will be resampled to that length. For example, n = 2000L will resample each document until it is 2000 tokens long, meaning documents shorter than 2000 tokens will be increased in length, while those longer than 2000 will be decreased in length. alpha and n should not be specified at the same time.
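A hypothetical wrapper can illustrate how a fixed n differs from alpha and how the rule against supplying both might be enforced. `resample_dtm_sketch()` is not the text2map function; treating a call with neither argument as alpha = 1, and stopping with an error when both are given, are assumptions made for this example.

```r
# Hypothetical wrapper, not the text2map function: accepts either
# alpha or n, mirroring the documented rule that they should not
# both be given. Rounding alpha-based lengths is an assumption.
resample_dtm_sketch <- function(dtm, alpha = NULL, n = NULL) {
  if (!is.null(alpha) && !is.null(n)) {
    stop("specify either `alpha` or `n`, not both")
  }
  if (!is.null(n)) {
    # Fixed length: short documents grow, long documents shrink
    sizes <- rep(n, nrow(dtm))
  } else {
    if (is.null(alpha)) alpha <- 1  # assumption: default keeps lengths
    sizes <- round(alpha * rowSums(dtm))
  }
  out <- dtm
  out[] <- 0
  for (i in seq_len(nrow(dtm))) {
    # Draw sizes[i] tokens with replacement, weighted by the row counts;
    # empty rows are skipped because rmultinom() needs a positive prob
    if (sum(dtm[i, ]) > 0 && sizes[i] > 0) {
      out[i, ] <- rmultinom(1, size = sizes[i], prob = dtm[i, ])[, 1]
    }
  }
  out
}

dtm <- matrix(c(2, 1, 0,
                0, 4, 8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("short_doc", "long_doc"),
                              c("apple", "pear", "plum")))

set.seed(3)
fixed_dtm <- resample_dtm_sketch(dtm, n = 6L)
rowSums(fixed_dtm)  # short_doc = 6 (lengthened), long_doc = 6 (shortened)

# Supplying both arguments is rejected
both_err <- tryCatch(resample_dtm_sketch(dtm, alpha = 0.5, n = 6L),
                     error = function(e) conditionMessage(e))
```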
Value
Returns a document-term matrix of class "dgCMatrix".
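For readers less familiar with the return type: "dgCMatrix" is the Matrix package's class for numeric sparse matrices in compressed sparse column format, which stores only the non-zero cells. The sketch below builds one from a small dense count matrix with `Matrix::sparseMatrix()`; it only illustrates the class and does not reproduce how dtm_resampler constructs its output.

```r
library(Matrix)

# A small dense count matrix standing in for a resampled DTM
dense_dtm <- matrix(c(2L, 0L, 1L,
                      0L, 3L, 0L),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("doc1", "doc2"),
                                    c("apple", "pear", "plum")))

# Keep only the non-zero cells; sparseMatrix() with numeric values
# returns a compressed sparse column matrix of class "dgCMatrix"
nz <- which(dense_dtm != 0, arr.ind = TRUE)
sparse_dtm <- sparseMatrix(i = nz[, "row"], j = nz[, "col"],
                           x = as.numeric(dense_dtm[nz]),
                           dims = dim(dense_dtm),
                           dimnames = dimnames(dense_dtm))

inherits(sparse_dtm, "dgCMatrix")  # TRUE
```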