stmJSON {stmCorrViz}    R Documentation

Generate JSON Representation of STM Model

Description

This function generates a nested JSON structure representing a fitted Structural Topic Model (STM). It is used internally by stmCorrViz, and most users will not need to call it directly.

Usage

stmJSON(mod, documents_raw=NULL, documents_matrix=NULL,
         title="STM Model", clustering_threshold=1.5,
         labels_number=7, verbose)

Arguments

mod

An STM fitted model from the stm package.

documents_raw

The raw documents used to generate the STM model. A character vector where each entry is the full text of a document.

documents_matrix

Document-term matrix representation of the raw documents, as generated by the prepDocuments function.

title

Root node label. This defaults to "STM Model".

clustering_threshold

A parameter specifying the level of aggregation in the hierarchical clustering of topics. Lower threshold values produce more binary splits and deeper trees, while higher threshold values produce more aggregation and broader, shallower trees. See Details below.

labels_number

The number of top words used to label each node (topic or topical cluster) in the visualization.

verbose

Logical. If set to TRUE, displays function progress in the console during execution.

Details

A nested JSON structure representing the hierarchical model is produced as follows. The function first retrieves the theta matrix (document-topic proportions) from the STM object, computes the correlations among topics, and converts these correlations into distances. It then performs hierarchical clustering on the topics by calling the hclust function.
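The first steps can be sketched in base R as follows. This is a minimal illustration, not the package's code: theta is simulated here instead of taken from mod$theta, and turning the correlation matrix into distances with dist() (Euclidean distance between its rows) is an assumption; stmJSON may use a different transformation.

```r
# Hypothetical stand-in for mod$theta: 50 documents x 5 topics
set.seed(1)
D <- 50; K <- 5
theta <- matrix(runif(D * K), nrow = D, ncol = K)
theta <- theta / rowSums(theta)   # each row sums to 1, like topic proportions

# Topic-by-topic correlation matrix (K x K)
topic_cor <- cor(theta)

# Assumption: Euclidean distance between rows of the correlation matrix
topic_dist <- dist(topic_cor)

# Agglomerative clustering of topics (complete linkage is the hclust default)
hc <- hclust(topic_dist)
hc$merge    # (K - 1) x 2 matrix of binary merges
hc$height   # merge heights, the values compared against clustering_threshold
```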

The function then finds all binary splits in the interior of the clustering tree whose clustering height is below the value of clustering_threshold, and marks them as aggregation points. It retrieves the merge matrix from the output of hclust and builds a new merge list by deleting every split performed at an aggregation point, so that the children of a deleted split attach directly to its parent. Whereas the hclust merge matrix contains only binary splits, the new merge list can therefore contain non-binary splits.
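A minimal sketch of this collapsing step, assuming a hypothetical helper collapse_merges() (not part of the package API). It walks the hclust merge matrix in order, which works because each row only refers to earlier rows, and splices the children of any below-threshold split into its parent. The root (last row) is always kept.

```r
# Hypothetical helper: collapse splits whose height is below `threshold`
collapse_merges <- function(hc, threshold) {
  n_merges <- nrow(hc$merge)
  merge_list <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    kids <- integer(0)
    for (j in hc$merge[i, ]) {
      if (j > 0 && hc$height[j] < threshold) {
        # Row j is an aggregation point: splice its (already expanded) children
        kids <- c(kids, merge_list[[j]])
      } else {
        # Negative j is a single topic; positive j is a kept cluster
        kids <- c(kids, j)
      }
    }
    merge_list[[i]] <- kids
  }
  # Rows spliced into a parent are deleted; the root (last row) is always kept
  kept <- hc$height >= threshold
  kept[n_merges] <- TRUE
  list(merge_list = merge_list, kept = kept)
}

# Four "topics" on a line give merge heights 1, 2 and 12 (complete linkage)
hc <- hclust(dist(c(0, 1, 10, 12)))
res <- collapse_merges(hc, threshold = 1.5)
res$merge_list[[3]]   # root now holds topics 1 and 2 plus cluster 2
```

Raising the threshold deletes more splits and yields a broader tree; with threshold = 0.5 nothing is collapsed and the root keeps its two binary children.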

The merge list is converted into a structure of nested lists by a recursive function. Each level of this nested structure corresponds to a node in the hierarchical representation of the STM model. The nested lists are then serialized to a JSON object with the jsonlite package.
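The recursive conversion can be sketched as below. The merge list is written by hand, and the helper to_nested() and the node fields name and children (with "Topic k" and "Cluster i" labels) are hypothetical; the JSON layout that stmJSON actually emits may differ. jsonlite::toJSON() is only called if jsonlite is installed.

```r
# Hypothetical collapsed merge list for 4 topics: row 1 (topics 1 and 2) was an
# aggregation point and has been spliced into the root (row 3)
merge_list <- list(c(-1L, -2L),       # deleted row, no longer referenced
                   c(-3L, -4L),       # cluster of topics 3 and 4
                   c(-1L, -2L, 2L))   # root: a non-binary split

# Recursively turn merge-list row i into a nested list node
to_nested <- function(i, merge_list) {
  children <- lapply(merge_list[[i]], function(j) {
    if (j < 0) {
      list(name = paste("Topic", -j))   # leaf: a single topic
    } else {
      to_nested(j, merge_list)          # internal node: recurse into row j
    }
  })
  list(name = paste("Cluster", i), children = children)
}

# Start from the root, which is the last row of the merge list
tree <- to_nested(length(merge_list), merge_list)

if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(tree, auto_unbox = TRUE))
}
```

Because deleted rows are never referenced by a kept row, the recursion from the root visits only the nodes that remain in the collapsed tree.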

For each topic cluster, a new beta matrix (word distribution) is computed from its member topics by marginalizing over content covariates, and the cluster's top words are derived from it. Each cluster is then labeled with its top labels_number words.
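One way to sketch this labeling step is shown below, using a hypothetical helper cluster_top_words(). Two choices are assumptions rather than the package's documented method: marginalizing over content-covariate levels with an unweighted mean of the word probabilities, and forming the cluster distribution as an unweighted mean of its member topics. The input is assumed to mirror a fitted stm object, where mod$beta$logbeta is a list of topic-by-word log-probability matrices and mod$vocab holds the vocabulary; toy values are used here.

```r
# Hypothetical helper: top words for a cluster of topics
cluster_top_words <- function(logbeta_list, members, vocab, labels_number = 7) {
  # Marginalize over content-covariate levels (assumption: unweighted mean)
  beta <- Reduce(`+`, lapply(logbeta_list, exp)) / length(logbeta_list)
  # Cluster word distribution (assumption: unweighted mean of member topics)
  cluster_beta <- colMeans(beta[members, , drop = FALSE])
  n <- min(labels_number, length(vocab))
  vocab[order(cluster_beta, decreasing = TRUE)[seq_len(n)]]
}

# Toy example: 3 topics over a 4-word vocabulary, two covariate levels
vocab <- c("tax", "budget", "school", "health")
probs <- rbind(c(0.6, 0.2, 0.1, 0.1),
               c(0.1, 0.6, 0.2, 0.1),
               c(0.1, 0.1, 0.1, 0.7))
logbeta_list <- list(log(probs), log(probs))  # identical levels for simplicity

cluster_top_words(logbeta_list, members = c(1, 2), vocab, labels_number = 2)
# "budget" "tax"
```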

Value

A JSON string representing the full STM model.

References

Margaret E. Roberts, Brandon M. Stewart and Dustin Tingley (2014). stm: R Package for Structural Topic Models.

See Also

stmCorrViz


[Package stmCorrViz version 1.3 Index]