plot.MultimodDiagnostic {stm} | R Documentation |
Plotting Method for Multimodality Diagnostic Objects
Description
The plotting method for objects of the S3 class 'MultimodDiagnostic', which
are returned by the function multiSTM(). That function performs a battery of
tests aimed at assessing the stability of the local modes of an STM model.
Usage
## S3 method for class 'MultimodDiagnostic'
plot(x, ind = NULL, topics = NULL, ...)
Arguments
x |
An object of S3 class 'MultimodDiagnostic'. See
|
ind |
An integer or list of integers specifying which plots to generate
(see details). If |
topics |
An integer or vector of integers specifying the topics for
which to plot the posterior distribution of covariate effect estimates. If
|
... |
Other arguments to be passed to the plotting functions. |
Details
This method generates a series of plots, which are indexed as follows. If a
subset of the plots is required, specify their indexes using the ind
argument. Please note that not all plot types are available for every object
of class 'MultimodDiagnostic':
1. Histogram of Expected Common Words: Generates a 10-bin histogram of the column means of obj$wmat, a K-by-N matrix reporting the number of "top words" shared by the reference model and the candidate model. The "top words" for a given topic are defined as the 10 highest-frequency words.
2. Histogram of Expected Common Documents: Generates a 10-bin histogram of the column means of obj$tmat, a K-by-N matrix reporting the number of "top documents" shared by the reference model and the candidate model. The "top documents" for a given topic are defined as the 10 documents in the reference corpus with highest topical frequency.
3. Distribution of .95 Confidence-Interval Coverage for Regression Estimates: Generates a histogram of obj$confidence.ratings, a vector whose entries specify the proportion of regression coefficient estimates in a candidate model that fall within the .95 confidence interval for the corresponding estimate in the reference model. This can only be generated if obj$confidence.ratings is non-NULL.
4. Posterior Distributions of Covariate Effect Estimates By Topic: Generates a square matrix of plots, each depicting the posterior distribution of the regression coefficients for the covariate specified in obj$reg.parameter.index for one topic. The topics for which the plots are to be generated are specified by the topics argument. If the length of topics is not a perfect square, the plots matrix will include white space. The plots have a dashed black vertical line at zero, and a continuous red vertical line indicating the coefficient estimate in the reference model. This can only be generated if obj$cov.effects is non-NULL.
5. Histogram of Expected L1-Distance From Reference Model: Generates a 10-bin histogram of the column means of obj$lmat, a K-by-N matrix reporting the L1-distance of each topic from the corresponding one in the reference model.
6. L1-distance vs. Top-10 Word Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$wmat, the metrics for L1-distance and shared top-words, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
7. L1-distance vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$lmat and obj$tmat, the metrics for L1-distance and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
8. Top-10 Words vs. Top-10 Docs Metric: Produces a smoothed color density representation of the scatterplot of obj$wmat and obj$tmat, the metrics for shared top-words and shared top-documents, obtained through a kernel density estimate. This can be used to validate the metrics under consideration.
9. Maximized Bound vs. Aggregate Top-10 Words Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-words metric aggregated by model (obj$wmod/1000).
10. Maximized Bound vs. Aggregate Top-10 Docs Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the top-docs metric aggregated by model (obj$tmod/1000).
11. Maximized Bound vs. Aggregate L1-Distance Metric: Generates a scatter plot with linear trendline for the maximized bound vector (obj$lb) and a linear transformation of the L1-distance metric aggregated by model (obj$lmod/1000).
12. Top-10 Docs Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$tmat.
13. L1-Distance Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$lmat.
14. Top-10 Words Metric vs. Semantic Coherence: Generates a scatter plot with linear trendline for the reference-model semantic coherence scores and the column means of obj$wmat.
15. Same as 5, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
16. Same as 11, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
17. Same as 7, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
18. Same as 13, but using the limited-mass L1-distance metric. Can only be generated if obj$mass.threshold != 1.
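As a rough illustration of what the histogram and density plots above summarise, the following base-R sketch builds simulated stand-ins for obj$wmat and obj$lmat and draws analogues of the first histogram, the L1-distance histogram, and the L1-distance vs. top-words density plot. The values are random, not output from multiSTM(), and the names K, N, wmat, lmat, w_means and l_means are assumptions chosen to mirror the descriptions. The actual method may choose breaks, labels and colours differently.

```r
# Simulated stand-ins for the K-by-N metric matrices described above.
# These are random numbers for illustration only, not multiSTM() output.
set.seed(1)
K <- 5    # number of topics (assumed)
N <- 20   # number of candidate models (assumed)

# Shared top words per topic and model: counts between 0 and 10.
wmat <- matrix(rbinom(K * N, size = 10, prob = 0.6), nrow = K, ncol = N)
# L1 distance between two probability vectors lies between 0 and 2.
lmat <- matrix(runif(K * N, min = 0, max = 2), nrow = K, ncol = N)

# Column means: one expected value per candidate model.
w_means <- colMeans(wmat)
l_means <- colMeans(lmat)

# Histograms of the column means. Note that `breaks = 10` is only a
# suggestion to hist(), so the number of bins may not be exactly 10.
hist(w_means, breaks = 10, main = "Expected Common Words",
     xlab = "Mean number of shared top words")
hist(l_means, breaks = 10, main = "Expected L1-Distance",
     xlab = "Mean L1 distance from reference model")

# Smoothed colour density of the scatterplot of the two metrics.
# graphics::smoothScatter() relies on the KernSmooth package.
if (requireNamespace("KernSmooth", quietly = TRUE)) {
  smoothScatter(as.vector(lmat), as.vector(wmat),
                xlab = "L1 distance", ylab = "Shared top words")
}
```

With real diagnostics one would expect the two metrics to be negatively related (larger L1-distance, fewer shared top words); the simulated matrices here are independent, so no such pattern appears.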
Author(s)
Brandon M. Stewart (Princeton University) and Antonio Coppola (Harvard University)
References
Roberts, M., Stewart, B., & Tingley, D. (Forthcoming). "Navigating the Local Modes of Big Data: The Case of Topic Models." In Data Analytics in Social Science, Government, and Industry. New York: Cambridge University Press.
See Also
Examples
## Not run:
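# Example using the Gadarian data
library(stm)
temp <- textProcessor(documents = gadarian$open.ended.response,
                      metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
# Prepare the documents, vocabulary and metadata for modelling
out <- prepDocuments(docs, vocab, meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
# Fit several candidate models
mod.out <- selectModel(docs, vocab, K = 3,
                       prevalence = ~ treatment + s(pid_rep),
                       data = meta, runs = 20)
# Run the multimodality diagnostics on the candidate models
out <- multiSTM(mod.out, mass.threshold = .75,
                reg.formula = ~ treatment,
                metadata = gadarian)
plot(out)     # all available plots
plot(out, 1)  # only the plot with index 1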
## End(Not run)
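The availability conditions listed in Details can be checked before choosing values for ind. The helper below is a hypothetical sketch, not part of stm: available_plots() and the mock object are illustrative names, and the code assumes the plots are numbered 1 to 18 in the order given in Details. It inspects only the three fields the documentation names as preconditions.

```r
# Hypothetical helper (not part of stm): returns the plot indexes whose
# documented preconditions are met by a MultimodDiagnostic-like list.
available_plots <- function(obj) {
  idx <- 1:18  # assumes 18 plot types, numbered as in Details
  # Confidence-interval coverage plot needs obj$confidence.ratings.
  if (is.null(obj$confidence.ratings)) idx <- setdiff(idx, 3)
  # Covariate-effect posterior plots need obj$cov.effects.
  if (is.null(obj$cov.effects)) idx <- setdiff(idx, 4)
  # Limited-mass plots need obj$mass.threshold != 1; a missing
  # threshold is treated here as "not available" (an assumption).
  if (is.null(obj$mass.threshold) || obj$mass.threshold == 1) {
    idx <- setdiff(idx, 15:18)
  }
  idx
}

# Mock object carrying only the fields the helper inspects.
mock <- list(confidence.ratings = NULL, cov.effects = NULL, mass.threshold = 1)
res <- available_plots(mock)
print(res)
```

For an object such as out from the example above, the returned vector could be subsetted and passed as the ind argument of plot().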