structure_plot {fastTopics} | R Documentation |
Structure Plot
Description
Create a “Structure plot” from a multinomial topic model fit or other model with “loadings” or “weights”. The Structure plot represents the estimated topic proportions of each sample in a stacked bar chart, with bars of different colors representing different topics. Consequently, samples that have similar topic proportions have similar amounts of each color.
Usage
structure_plot(
fit,
topics,
grouping,
loadings_order = "embed",
n = 2000,
colors,
gap = 1,
embed_method = structure_plot_default_embed_method,
ggplot_call = structure_plot_ggplot_call,
...
)
structure_plot_default_embed_method(fit, ...)
## S3 method for class 'poisson_nmf_fit'
plot(x, ...)
## S3 method for class 'multinom_topic_model_fit'
plot(x, ...)
structure_plot_ggplot_call(dat, colors, ticks = NULL, font.size = 9)
Arguments
fit |
An object of class “poisson_nmf_fit” or
“multinom_topic_model_fit”, or an n x k matrix of topic
proportions, where k is the number of topics. (The elements in each
row of this matrix should sum to 1.) If a Poisson NMF fit is
provided as input, the corresponding multinomial topic model fit is
automatically recovered using |
topics |
Top-to-bottom ordering of the topics in the Structure
plot; |
grouping |
Optional categorical variable (a factor) with one
entry for each row of the loadings matrix |
loadings_order |
Ordering of the rows of the loadings matrix
|
n |
The maximum number of samples (rows of the loadings matrix
|
colors |
Colors used to draw topics in Structure plot. |
gap |
The horizontal spacing between groups. Ignored if
|
embed_method |
The function used to compute a 1-d embedding
from a loadings matrix |
ggplot_call |
The function used to create the plot. Replace
|
... |
Additional arguments passed to |
x |
An object of class “poisson_nmf_fit” or
“multinom_topic_model_fit”. If a Poisson NMF fit is provided
as input, the corresponding multinomial topic model fit is
automatically recovered using |
dat |
A data frame passed as input to
|
ticks |
The placement of the group labels along the horizontal
axis, and their names. For data that are not grouped, use
|
font.size |
Font size used in plot. |
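As noted for the fit argument, a plain n x k matrix of topic proportions is accepted, with each row summing to 1. The following is a minimal sketch in base R of turning non-negative weights into such a matrix. The matrix W and its values are simulated for illustration, and this is not the package's own conversion from a Poisson NMF fit.

```r
# Simulated non-negative weights: 6 samples (rows) x 3 topics (columns).
# "W" and its values are made up for illustration only.
set.seed(1)
W <- matrix(rexp(18), nrow = 6, ncol = 3,
            dimnames = list(paste0("s", 1:6), paste0("k", 1:3)))

# Divide each row by its sum so that every row sums to 1.
L <- W / rowSums(W)

# Check the property described for the "fit" argument.
stopifnot(all(abs(rowSums(L) - 1) < 1e-12))
```

A matrix prepared this way has the same form as the fit$L matrix used in the Examples.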
Details
The name “Structure plot” comes from its widespread use in population genetics to visualize the results of the Structure software (Rosenberg et al., 2002).
In most uses of the Structure plot in population genetics, some grouping of the samples (e.g., assignment to pre-defined populations) guides the arrangement of the samples along the horizontal axis of the bar chart. In other applications, such as the analysis of gene expression data, a pre-defined grouping may not be available. Therefore, by default, a “smart” arrangement of the samples is generated automatically by performing a 1-d embedding of the samples.
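The idea of arranging samples by a 1-d embedding can be sketched in base R. This sketch substitutes PCA (prcomp) for the package's default embedding and uses simulated proportions, so it only illustrates the principle: samples with similar topic proportions receive nearby embedding values and therefore end up next to each other in the stacked bar chart. All object names here are hypothetical.

```r
set.seed(1)
n <- 60
k <- 3

# Simulate weights for two kinds of samples, one dominated by topic 1
# and the other by topic 2, then shuffle the rows.
W <- matrix(rexp(n * k), n, k)
W[1:30, 1]  <- W[1:30, 1] + 5
W[31:60, 2] <- W[31:60, 2] + 5
W <- W[sample(n), ]
L <- W / rowSums(W)   # rows now sum to 1

# 1-d embedding: scores on the first principal component (an assumption
# made for this sketch; not the package default).
y   <- drop(prcomp(L)$x[, 1])
ord <- order(y)

# Stacked bar chart with one bar per sample, ordered by the embedding.
barplot(t(L[ord, ]), col = c("skyblue", "gold", "darkorange"),
        border = NA, space = 0,
        xlab = "samples (ordered by 1-d embedding)",
        ylab = "topic proportion")
```

Plotting L without reordering (dropping [ord, ]) gives a visibly noisier chart, which is the motivation for ordering by an embedding.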
Value
A ggplot object.
References
Dey, K. K., Hsiao, C. J. and Stephens, M. (2017). Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genetics 13, e1006599.
Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A. and Feldman, M. W. (2002). Genetic structure of human populations. Science 298, 2381–2385.
Examples
set.seed(1)
data(pbmc_facs)
# Get the multinomial topic model fitted to the
# PBMC data.
fit <- pbmc_facs$fit
# Create a Structure plot without labels. The samples (rows of L) are
# automatically arranged along the x-axis using t-SNE to highlight the
# structure in the data.
p1a <- structure_plot(fit)
# The first argument to structure_plot may also be an "L" matrix.
# This call to structure_plot should produce the exact same plot as
# the previous call.
set.seed(1)
p1b <- structure_plot(fit$L)
# There is no requirement that the rows of L sum up to 1. To
# illustrate, in this next example we have removed topic 5 from the
# structure plot.
p2a <- structure_plot(fit$L[,-5])
# This is perhaps a more elegant way to remove topic 5 from the
# structure plot:
p2b <- structure_plot(fit,topics = c(1:4,6))
# Create a Structure plot with the FACS cell-type labels. Within each
# group (cell-type), the cells (rows of L) are automatically arranged
# using t-SNE.
subpop <- pbmc_facs$samples$subpop
p3 <- structure_plot(fit,grouping = subpop)
# Next, we apply some customizations to improve the plot: (1) use the
# "topics" argument to specify the order in which the topic
# proportions are stacked on top of each other; (2) use the "gap"
# argument to increase the whitespace between the groups; (3) use "n"
# to decrease the number of rows of L included in the Structure plot;
# and (4) use "colors" to change the colors used to draw the topic
# proportions.
topic_colors <- c("skyblue","forestgreen","darkmagenta",
"dodgerblue","gold","darkorange")
p4 <- structure_plot(fit,grouping = pbmc_facs$samples$subpop,gap = 20,
n = 1500,topics = c(5,6,1,4,2,3),colors = topic_colors)
# In this example, we use UMAP instead of t-SNE to arrange the
# cells in the Structure plot. Note that this can be accomplished in
# a different way by overriding the default setting of
# "embed_method".
y <- drop(umap_from_topics(fit,dims = 1))
p5 <- structure_plot(fit,loadings_order = order(y),grouping = subpop,
gap = 40,colors = topic_colors)
# We can also use PCA to arrange the cells.
y <- drop(pca_from_topics(fit,dims = 1))
p6 <- structure_plot(fit,loadings_order = order(y),grouping = subpop,
gap = 40,colors = topic_colors)
# In this final example, we plot a random subset of 400 cells, and
# arrange the cells randomly along the horizontal axis of the
# Structure plot.
p7 <- structure_plot(fit,loadings_order = sample(3744,400),gap = 10,
grouping = subpop,colors = topic_colors)
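The comments above mention that the embedding can also be changed by overriding "embed_method". The following is a hedged sketch of what such a function might look like. It assumes, based only on the signature structure_plot_default_embed_method(fit, ...), that the function receives an object with a loadings matrix fit$L and returns one numeric value per row. The name pca_embed and the mock object are hypothetical.

```r
# Hypothetical custom embedding: project the loadings onto their first
# principal component. Assumes "fit" is a list containing the loadings
# matrix as fit$L.
pca_embed <- function (fit, ...)
  drop(prcomp(fit$L)$x[, 1])

# Quick check on a mock object (fastTopics is not needed for this part).
set.seed(1)
mock <- list(L = matrix(runif(40), 10, 4))
y <- pca_embed(mock)
stopifnot(is.numeric(y), length(y) == 10)
```

If the assumption about the interface holds, the function could be supplied as structure_plot(fit, embed_method = pca_embed).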