R: PCA, t-SNE and UMAP Plots

embedding_plot_2d {fastTopics}

R Documentation

PCA, t-SNE and UMAP Plots

Description

Visualize the structure of the Poisson NMF loadings or the multinomial topic model topic proportions by projection onto a 2-d surface. plot_hexbin_plot is most useful for visualizing the PCs of a data set with thousands of samples or more.

Usage

embedding_plot_2d(
  fit,
  Y,
  fill = "loading",
  k,
  fill.label,
  ggplot_call = embedding_plot_2d_ggplot_call,
  plot_grid_call = function(plots) do.call(plot_grid, plots)
)

embedding_plot_2d_ggplot_call(
  Y,
  fill,
  fill.type = c("loading", "numeric", "factor", "none"),
  fill.label,
  font.size = 9
)

pca_plot(
  fit,
  Y,
  pcs = 1:2,
  n = 10000,
  fill = "loading",
  k,
  fill.label,
  ggplot_call = embedding_plot_2d_ggplot_call,
  plot_grid_call = function(plots) do.call(plot_grid, plots),
  ...
)

tsne_plot(
  fit,
  Y,
  n = 2000,
  fill = "loading",
  k,
  fill.label,
  ggplot_call = embedding_plot_2d_ggplot_call,
  plot_grid_call = function(plots) do.call(plot_grid, plots),
  ...
)

umap_plot(
  fit,
  Y,
  n = 2000,
  fill = "loading",
  k,
  fill.label,
  ggplot_call = embedding_plot_2d_ggplot_call,
  plot_grid_call = function(plots) do.call(plot_grid, plots),
  ...
)

pca_hexbin_plot(
  fit,
  Y,
  pcs = 1:2,
  bins = 40,
  breaks = c(0, 1, 10, 100, 1000, Inf),
  ggplot_call = pca_hexbin_plot_ggplot_call,
  ...
)

pca_hexbin_plot_ggplot_call(Y, bins, breaks, font.size = 9)

Arguments

`fit`	An object of class “poisson_nmf_fit” or “multinom_topic_model_fit”.
`Y`	The n x 2 matrix containing the 2-d embedding, where n is the number of rows in `fit$L`. If not provided, the embedding will be computed automatically.
`fill`	The quantity to map onto the fill colour of the points in the PCA plot. Set `fill = "loading"` to vary the fill colour according to the loadings (or topic proportions) of the select topiced or topics. Alternatively, `fill` may be set to a data vector with one entry per row of `fit$L`, in which case the data are mapped to the fill colour of the points. When `fill = "none"`, the fill colour is not varied.
`k`	The dimensions or topics selected by number or name. When `fill = "loading"`, one plot is created per selected dimension or topic; when `fill = "loading"` and `k` is not specified, all dimensions or topics are plotted.
`fill.label`	The label used for the fill colour legend.
`ggplot_call`	The function used to create the plot. Replace `embedding_plot_2d_ggplot_call` or `pca_hexbin_plot_ggplot_call` with your own function to customize the appearance of the plot.
`plot_grid_call`	When `fill = "loading"` and multiple topics (`k`) are selected, this is the function used to arrange the plots into a grid using `plot_grid`. It should be a function accepting a single argument, `plots`, a list of `ggplot` objects.
`fill.type`	The type of variable mapped to fill colour. The fill colour is not varied when `fill.type = "none"`.
`font.size`	Font size used in plot.
`pcs`	The two principal components (PCs) to be plotted, specified by name or number.
`n`	The maximum number of points to plot. If `n` is less than the number of rows of `fit$L`, the rows are subsampled at random. This argument is ignored if `Y` is provided.
`...`	Additional arguments passed to `pca_from_topics`, `tsne_from_topics` or `umap_from_topics`. These additional arguments are only used if `Y` is not provided.
`bins`	Number of bins used to create hexagonal 2-d histogram. Passed as the “bins” argument to `stat_bin_hex`.
`breaks`	To produce the hexagonal histogram, the counts are subdivided into intervals based on `breaks`. Passed as the “breaks” argument to `cut`.

Details

This is a lightweight interface primarily intended to expedite creation of plots for visualizing the loadings or topic proportions; most of the heavy lifting is done by ‘ggplot2’. The 2-d embedding itself is computed by invoking pca_from_topics, tsne_from_topics or umap_from_topics. For more control over the plot's appearance, the plot can be customized by modifying the ggplot_call and plot_grid_call arguments.

An effective 2-d visualization may also require some fine-tunning of the settings, such as the t-SNE “perplexity”, or the number of samples included in the plot. The PCA, UMAP, t-SNE settings can be controlled by the additional arguments (...). Alternatively, a 2-d embedding may be pre-computed, and passed as argument Y.

Value

A ggplot object.

Examples

set.seed(1)
data(pbmc_facs)

# Get the Poisson NMF and multinomial topic models fitted to the
# PBMC data.
fit1 <- multinom2poisson(pbmc_facs$fit)
fit2 <- pbmc_facs$fit

# Plot the first two PCs of the loadings matrix (for the
# multinomial topic model, "fit2", the loadings are the topic
# proportions).
subpop <- pbmc_facs$samples$subpop
p1 <- pca_plot(fit1,k = 1)
p2 <- pca_plot(fit2)
p3 <- pca_plot(fit2,fill = "none")
p4 <- pca_plot(fit2,pcs = 3:4,fill = "none")
p5 <- pca_plot(fit2,fill = fit2$L[,1])
p6 <- pca_plot(fit2,fill = subpop)
p7 <- pca_hexbin_plot(fit1)
p8 <- pca_hexbin_plot(fit2)


# Plot the loadings using t-SNE.
p1 <- tsne_plot(fit1,k = 1)
p2 <- tsne_plot(fit2)
p3 <- tsne_plot(fit2,fill = subpop)

# Plot the loadings using UMAP.
p1 <- umap_plot(fit1,k = 1)
p2 <- umap_plot(fit2)
p3 <- umap_plot(fit2,fill = subpop)

[Package fastTopics version 0.6-192 Index]