textmodel_ca {quanteda.textmodels}    R Documentation
Correspondence analysis of a document-feature matrix
Description
textmodel_ca implements correspondence analysis scaling on a dfm. The method is a fast, sparse-capable version of the function ca() from the ca package.
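For orientation, here is a minimal base-R sketch of textbook correspondence analysis. It forms the correspondence matrix, takes the SVD of the standardized residuals, and rescales the singular vectors into standard coordinates. The helper name ca_sketch() and the toy count matrix are hypothetical. This is the classical dense computation, not the sparse implementation used by textmodel_ca().

```r
# Hypothetical helper: classical (dense) correspondence analysis in base R.
# This is the textbook computation, not the internals of textmodel_ca().
ca_sketch <- function(N, nd = 2) {
  P      <- N / sum(N)          # correspondence matrix
  r_mass <- rowSums(P)          # row masses
  c_mass <- colSums(P)          # column masses
  E      <- outer(r_mass, c_mass)   # expected proportions under independence
  S      <- (P - E) / sqrt(E)       # standardized residuals
  dec    <- svd(S, nu = nd, nv = nd)
  list(
    sv            = dec$d[seq_len(nd)],   # leading singular values
    total_inertia = sum(dec$d^2),         # equals chi-square statistic / n
    rowcoord      = dec$u / sqrt(r_mass), # standard row coordinates
    colcoord      = dec$v / sqrt(c_mass)  # standard column coordinates
  )
}

# Toy 4 x 3 count matrix (documents in rows, features in columns)
N <- matrix(c(10, 2, 3,
               4, 8, 1,
               2, 3, 9,
               5, 5, 5), nrow = 4, byrow = TRUE)
fit <- ca_sketch(N, nd = 2)
fit$sv
```

The squared singular values decompose the total inertia, which is why only a few leading dimensions are usually needed.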
Usage
textmodel_ca(x, smooth = 0, nd = NA, sparse = FALSE, residual_floor = 0.1)
Arguments
x: the dfm on which the model will be fit
smooth: a smoothing parameter for word counts; defaults to zero.
nd: the number of dimensions to include in the output; if NA (the default), the maximum possible number of dimensions is included.
sparse: retains the sparsity of x if set to TRUE; useful when the dfm is too big to be converted to a dense matrix.
residual_floor: specifies the threshold for the residual matrix when calculating the truncated SVD. A larger value reduces memory and time costs but might reduce accuracy; only applicable when sparse = TRUE.
Details
The function svds() from the RSpectra package is used to compute a truncated SVD quickly.
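A truncated SVD computes only the k leading singular triplets instead of the full decomposition. The sketch below is hypothetical: truncated_svd() is not part of quanteda.textmodels. It calls RSpectra::svds(A, k) when RSpectra is installed. Otherwise it falls back to base svd() restricted to k vectors, which still computes every singular value internally.

```r
# Hypothetical helper: leading k singular triplets of a matrix.
truncated_svd <- function(A, k) {
  if (requireNamespace("RSpectra", quietly = TRUE)) {
    # svds() computes only the k largest singular values/vectors
    res <- RSpectra::svds(A, k = k)
    list(d = res$d, u = res$u, v = res$v)
  } else {
    # Fallback: full decomposition, keeping only k singular vectors
    dec <- svd(A, nu = k, nv = k)
    list(d = dec$d[seq_len(k)], u = dec$u, v = dec$v)
  }
}

set.seed(42)
A <- matrix(rnorm(200 * 50), nrow = 200)
tsvd <- truncated_svd(A, k = 3)
# Compare with the leading singular values of the full SVD
all.equal(tsvd$d, svd(A)$d[1:3])
```

The savings grow with matrix size when k is much smaller than the number of rows and columns, which is the typical case for scaling a dfm onto one or two dimensions.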
Value
textmodel_ca() returns a fitted CA textmodel that is a special class of ca object.
Note
You may need to set sparse = TRUE and increase the value of residual_floor to ignore less important information, and hence to reduce the memory cost, when you have a very big dfm.
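The memory pressure comes from the residual matrix being dense even when the dfm is sparse. A cell with a zero count still has a nonzero standardized residual, -sqrt(r_i c_j), whenever its row and column masses are positive. A rough estimate is rows × columns × 8 bytes, assuming 8 bytes per double and ignoring R's small per-object overhead. The helper name below is hypothetical.

```r
# Hypothetical helper: bytes needed for a dense double matrix with the
# same dimensions as a dfm (8 bytes per double, object overhead ignored).
dense_matrix_bytes <- function(n_docs, n_features) {
  n_docs * n_features * 8
}

dense_matrix_bytes(14, 5000)               # 560000 bytes
dense_matrix_bytes(10000, 50000)           # 4e+09 bytes
dense_matrix_bytes(10000, 50000) / 1024^3  # about 3.7 GiB
```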
If your attempt to fit the model fails because the matrix is too large, this is probably due to the memory demands of computing the residual matrix. To avoid this, consider increasing residual_floor in steps of 0.1 until the model can be fit.
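The retry strategy above can be automated. The following is a minimal sketch, assuming a hypothetical helper fit_with_floor_retry() that accepts any fitting function of one argument (the floor). With a real dfm it might be called as fit_with_floor_retry(function(f) textmodel_ca(dfmat, sparse = TRUE, residual_floor = f)). The toy fitting function only simulates failure and is not part of any package.

```r
# Hypothetical helper: retry a fit with an increasing residual_floor.
# Note: every error is caught, so non-memory errors are also retried.
fit_with_floor_retry <- function(fit_fun, from = 0.1, to = 0.9, by = 0.1) {
  for (f in seq(from, to, by = by)) {
    result <- tryCatch(fit_fun(f), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(model = result, residual_floor = f))
    }
    message("Fit failed with residual_floor = ", f, "; increasing.")
  }
  stop("Model could not be fit for any residual_floor up to ", to)
}

# Toy stand-in for a fit that fails while the floor is too low
toy_fit <- function(f) {
  if (f < 0.35) stop("cannot allocate vector")
  paste("fitted with floor", f)
}
res <- fit_with_floor_retry(toy_fit)
res$residual_floor
```

Stepping through a precomputed seq() avoids accumulating floating-point error from repeated addition of 0.1.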
Author(s)
Kenneth Benoit and Haiyan Wang
References
Nenadic, O. & Greenacre, M. (2007). Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca package. Journal of Statistical Software, 20(3). doi:10.18637/jss.v020.i03
See Also
Examples
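library("quanteda")
library("quanteda.textmodels")  # provides textmodel_ca() and the example corpus
# document-feature matrix of the 2010 Irish budget speeches
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
tmod <- textmodel_ca(dfmat)
summary(tmod)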