plot_wordvec_tSNE {PsychWordVec} | R Documentation |
Visualize word vectors with dimensionality reduced using t-SNE.
Description
Visualize word vectors with dimensionality reduced using the t-Distributed Stochastic Neighbor Embedding (t-SNE) method (i.e., projecting high-dimensional vectors into a low-dimensional vector space), implemented by Rtsne::Rtsne().
You should specify a random seed if you expect reproducible results.
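The effect of the seed can be illustrated by calling Rtsne::Rtsne() directly. The following is a minimal sketch, not the internal code of plot_wordvec_tSNE(): the matrix X is random stand-in data rather than real word vectors, and run_tsne() is a hypothetical helper introduced only for this illustration.

```r
# Sketch only: random stand-in data, not real word vectors.
library(Rtsne)

set.seed(1234)
X <- matrix(rnorm(8 * 50), nrow = 8,
            dimnames = list(paste0("word", 1:8), NULL))

# Hypothetical helper: fix the seed, then run t-SNE on the rows of X.
run_tsne <- function(X, seed) {
  set.seed(seed)  # t-SNE starts from a random layout drawn from R's RNG
  Rtsne::Rtsne(X, dims = 2, perplexity = 2, theta = 0.5,
               pca = FALSE, check_duplicates = FALSE)$Y
}

y1 <- run_tsne(X, seed = 1)
y2 <- run_tsne(X, seed = 1)

# Compare the two layouts obtained with the same seed
all.equal(y1, y2)
```

Without a fixed seed, repeated runs generally produce different layouts, which is why the seed argument matters when a plot needs to be reproduced.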
Usage
plot_wordvec_tSNE(
  x,
  dims = 2,
  perplexity,
  theta = 0.5,
  colors = NULL,
  seed = NULL,
  custom.Rtsne = NULL
)
Arguments
x |
Can be:
|
dims |
Output dimensionality: 2 (default) or 3. |
perplexity |
Perplexity parameter. It should not be larger than (number of words - 1) / 3.
Defaults to |
theta |
Speed/accuracy trade-off (increase for less accuracy); set to 0 for exact t-SNE. Defaults to 0.5. |
colors |
A character vector specifying (1) the categories of words (for 2-D plot only) or (2) the exact colors of words (for 2-D and 3-D plot). See examples for its usage. |
seed |
Random seed for reproducible results. Defaults to NULL. |
custom.Rtsne |
User-defined |
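The perplexity constraint stated above, no larger than (number of words - 1) / 3, can be checked before plotting. A minimal base-R sketch follows; max_perplexity() is a hypothetical helper written for this illustration and is not part of PsychWordVec.

```r
# Hypothetical helper (not part of PsychWordVec):
# upper bound on perplexity for a given number of words.
max_perplexity <- function(n_words) {
  stopifnot(is.numeric(n_words), length(n_words) == 1, n_words >= 1)
  (n_words - 1) / 3
}

# Eight words, as in the examples on this page: bound is 7/3
max_perplexity(8)
```

With only eight words the bound is 7/3, so only small perplexity values are valid for such a short word list.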
Value
2-D: A ggplot object. You may extract the data from this object using $data.
3-D: No plot object is returned; only the data are returned invisibly, because rgl::plot3d() is "called for the side effect of drawing the plot" and thus cannot return any 3-D plot object.
Download
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
References
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
See Also
Examples
library(PsychWordVec)
library(ggplot2)  # for the scale_*() functions used below

# Load the demo data with normalize=TRUE, then select eight words
d = as_embed(demodata, normalize=TRUE)

dt = get_wordvec(d, cc("
  man, woman,
  king, queen,
  China, Beijing,
  Japan, Tokyo"))

## 2-D (default):
plot_wordvec_tSNE(dt, seed=1234)

# Extract the plotted data from the ggplot object
plot_wordvec_tSNE(dt, seed=1234)$data

# Exact colors, one per word
colors = c(rep("#2B579A", 4), rep("#B7472A", 4))
plot_wordvec_tSNE(dt, colors=colors, seed=1234)

# Word categories (2-D only), with rescaled axis labels
category = c(rep("gender", 4), rep("country", 4))
plot_wordvec_tSNE(dt, colors=category, seed=1234) +
  scale_x_continuous(limits=c(-200, 200),
                     labels=function(x) x/100) +
  scale_y_continuous(limits=c(-200, 200),
                     labels=function(x) x/100) +
  scale_color_manual(values=c("#B7472A", "#2B579A"))

## 3-D:
colors = c(rep("#2B579A", 4), rep("#B7472A", 4))
plot_wordvec_tSNE(dt, dims=3, colors=colors, seed=1)