R: Fit a topic model using Latent Semantic Analysis

FitLsaModel {textmineR}

R Documentation

Fit a topic model using Latent Semantic Analysis

Description

A wrapper for RSpectra::svds that returns a nicely-formatted latent semantic analysis topic model.

Usage

FitLsaModel(dtm, k, calc_coherence = TRUE, return_all = FALSE, ...)

Arguments

`dtm`	A document term matrix of class `Matrix::dgCMatrix`
`k`	Number of topics
`calc_coherence`	Do you want to calculate probabilistic coherence of topics after the model is trained? Defaults to `TRUE`.
`return_all`	Should all objects returned from `RSpectra::svds` be returned here? Defaults to `FALSE`.
`...`	Other arguments to pass to `svds` through its `opts` parameter.

Details

Latent semantic analysis (LSA) uses singular value decomposition (SVD) to factor the document term matrix (DTM). In many LSA applications, TF-IDF weights are applied to the DTM before model fitting, although this is not strictly necessary.
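
The factorization described above can be written out as a rank-k truncated SVD. The symbols below (X, N, M, U_k, Sigma_k, W_k) are notation introduced here for illustration; they do not appear in the package documentation.

```latex
X \approx U_k \, \Sigma_k \, W_k^{\top},
\qquad
X \in \mathbb{R}^{N \times M},\quad
U_k \in \mathbb{R}^{N \times k},\quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
W_k \in \mathbb{R}^{M \times k}
```

Here X is the DTM with N documents and M tokens, and \sigma_1 \ge \dots \ge \sigma_k are the k largest singular values. Judging from the shapes given under Value, theta likely corresponds to U_k, phi to W_k^{\top}, and sv to the \sigma values; that correspondence is an inference from this page, not a statement taken from the package source.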

Value

Returns a list with a minimum of three objects: phi, theta, and sv. The rows of phi index topics and the columns index tokens. The rows of theta index documents and the columns index topics. sv is a vector of singular values.
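
To make the shapes described above concrete, here is a minimal sketch that uses base R's `svd` on a small made-up matrix. It is not the code inside `FitLsaModel`: the toy counts, the `t_1`, `t_2` topic labels, and the mapping of `u` to theta and `t(v)` to phi are all assumptions chosen to match the layout stated in this section.

```r
# Toy document-term matrix: 4 documents (rows) x 5 tokens (columns).
# The counts are invented for illustration only.
dtm <- matrix(
  c(2, 1, 0, 0, 0,
    1, 2, 0, 0, 1,
    0, 0, 3, 1, 0,
    0, 0, 1, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("tok", 1:5))
)

k <- 2  # number of topics

# Dense SVD from base R, keeping k left and right singular vectors.
# This stands in for RSpectra::svds purely to show the shapes.
s <- svd(dtm, nu = k, nv = k)

sv    <- s$d[seq_len(k)]  # the k largest singular values
theta <- s$u              # documents x topics (assumed mapping)
phi   <- t(s$v)           # topics x tokens (assumed mapping)

# Hypothetical topic labels, added so the matrices are easier to read
topic_names <- paste0("t_", seq_len(k))
dimnames(theta) <- list(rownames(dtm), topic_names)
dimnames(phi)   <- list(topic_names, colnames(dtm))

dim(theta)  # documents x topics
dim(phi)    # topics x tokens

# Rank-k approximation of the DTM rebuilt from the three pieces
dtm_hat <- theta %*% diag(sv, nrow = k) %*% phi
```

With a model returned by `FitLsaModel`, the same checks (`dim(model$theta)`, `dim(model$phi)`, `length(model$sv)`) should show the number of documents, the number of tokens, and `k` in the corresponding positions.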

Examples

# Load a pre-formatted dtm 
data(nih_sample_dtm) 

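# Convert raw word counts to TF-IDF weights
# IDF: log(number of documents / number of documents containing each term)
idf <- log(nrow(nih_sample_dtm) / Matrix::colSums(nih_sample_dtm > 0))

# Transpose so terms are rows, then scale each term's row by its IDF
dtm_tfidf <- Matrix::t(nih_sample_dtm) * idf

# Transpose back to documents x terms
dtm_tfidf <- Matrix::t(dtm_tfidf)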

# Fit an LSA model
model <- FitLsaModel(dtm = dtm_tfidf, k = 5)

str(model)

[Package textmineR version 3.0.5 Index]