dsm.projection {wordspace}    R Documentation
Reduce Dimensionality of DSM by Subspace Projection (wordspace)
Description
Reduce the dimensionality of a DSM by linear projection of its row vectors into a lower-dimensional subspace. Several projection methods with different properties are available.
Usage
dsm.projection(model, n,
method = c("svd", "rsvd", "asvd", "ri", "ri+svd"),
oversampling = NA, q = 2, rate = .01, power=1,
with.basis = FALSE, verbose = FALSE)
Arguments
model
    either an object of class dsm, or a dense or sparse numeric matrix
method
    projection method to use for dimensionality reduction (see "Details" below)
n
    an integer specifying the number of target dimensions
oversampling
    oversampling factor for the stochastic dimensionality reduction algorithms (rsvd, asvd and ri+svd)
q
    number of power iterations in the randomized SVD algorithm (Halko et al. 2009 recommend one or two power iterations)
rate
    fill rate of random projection vectors; each random dimension has on average rate * ncol(model) nonzero entries
power
    apply power scaling after SVD-based projection, i.e. multiply each latent dimension with a suitable power of the corresponding singular value. The default power=1 corresponds to an ordinary orthogonal projection.
with.basis
    if TRUE, also return the orthogonal basis of the latent subspace as attribute "basis" of the result (see "Value" below)
verbose
    if TRUE, display progress messages
Details
The following dimensionality reduction algorithms can be selected with the method
argument:
- svd
  singular value decomposition (SVD), using the efficient SVDLIBC algorithm (Berry 1992) from package sparsesvd if the input is a sparse matrix. If the DSM has been scored with scale="center", this method is equivalent to principal component analysis (PCA).
- rsvd
  randomized SVD (Halko et al. 2009, p. 9) based on a factorization of rank oversampling * n with q power iterations.
- asvd
  approximate SVD, which determines latent dimensions from a random sample of matrix rows including oversampling * n data points. This heuristic algorithm is highly inaccurate and has been deprecated.
- ri
  random indexing (RI), i.e. a projection onto random basis vectors that are approximately orthogonal. Basis vectors are generated by setting a proportion of rate elements randomly to +1 or -1. Note that this does not correspond to a proper orthogonal projection, so the resulting coordinates in the reduced space should be used with caution.
- ri+svd
  RI to oversampling * n dimensions, followed by SVD of the pre-reduced matrix to the final n dimensions. This is not a proper orthogonal projection because the RI basis vectors in the first step are only approximately orthogonal.
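The randomized SVD scheme described above can be sketched in a few lines. The following NumPy version is an illustrative sketch with hypothetical names, not the wordspace implementation: it builds a rank oversampling * n sketch of the input matrix, sharpens it with q power iterations, and then computes a small exact SVD.

```python
import numpy as np

def rsvd(M, n, oversampling=2, q=2, seed=0):
    """Illustrative rank-n randomized SVD (Halko et al. 2009):
    rank (oversampling * n) sketch plus q power iterations."""
    rng = np.random.default_rng(seed)
    k = oversampling * n
    Omega = rng.standard_normal((M.shape[1], k))  # random test matrix
    Y = M @ Omega                    # Y spans an approximate range of M
    for _ in range(q):               # power iterations sharpen the spectrum
        Y = M @ (M.T @ Y)
    Q, _ = np.linalg.qr(Y)           # orthonormal basis of the sketch
    B = Q.T @ M                      # compress M into the k-dim subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :n], s[:n], Vt[:n, :]  # truncate to the n target dimensions

# toy matrix with a decaying spectrum, where the sketch is very accurate
rng = np.random.default_rng(42)
M = rng.standard_normal((100, 20)) * (0.7 ** np.arange(20))
U, s, Vt = rsvd(M, 5)
S = U * s   # latent coordinates analogous to the result of dsm.projection (power=1)
```

For numerical robustness a production implementation would re-orthonormalize Y between power iterations; the plain loop above is kept minimal for clarity.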
Value
A numeric matrix with n
columns (latent dimensions) and the same number of rows as the original DSM. Some SVD-based algorithms may discard poorly conditioned singular values, returning fewer than n
columns.
If with.basis=TRUE and an orthogonal projection is used, the corresponding orthogonal basis B of the latent subspace is returned as an attribute "basis". B is column-orthogonal, hence t(B) projects into latent coordinates and B %*% t(B) is an orthogonal subspace projection in the original coordinate system.
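These basis properties are easy to verify numerically. The following tiny NumPy check is illustrative only (the actual basis is computed and returned by dsm.projection): a column-orthogonal basis B satisfies t(B) %*% B = I, and B %*% t(B) is a symmetric, idempotent projector.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((8, 4))
_, _, Vt = np.linalg.svd(M, full_matrices=False)
B = Vt[:2].T            # 4 x 2 column-orthogonal basis of a 2-dim latent subspace

latent = M @ B          # t(B) applied to each row vector: latent coordinates
P = B @ B.T             # orthogonal projection in the original coordinate system
```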
For orthogonal projections, the attribute "R2" contains a numeric vector specifying the proportion of the squared Frobenius norm of the original matrix captured by each of the latent dimensions. If the original matrix has been centered (so that an SVD projection is equivalent to PCA), this corresponds to the proportion of variance "explained" by each dimension.
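For an SVD projection these proportions can be derived directly from the singular values, because the squared Frobenius norm of a matrix equals the sum of its squared singular values. A NumPy sketch (illustrative, not the wordspace code):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((50, 10))
sigma = np.linalg.svd(M, compute_uv=False)
R2 = sigma ** 2 / np.sum(M ** 2)   # share of squared Frobenius norm per latent dimension
```

The full set of dimensions captures the whole matrix, so the shares sum to 1 and decrease with the singular values.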
For SVD-based projections, the attribute "sigma"
contains the singular values corresponding to latent dimensions. It can be used to adjust the power scaling exponent at a later time.
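Such a later adjustment amounts to multiplying each latent dimension by the stored singular value raised to the difference of the exponents. A NumPy sketch with assumed names (taking U * sigma as the power=1 coordinates, mirroring the "sigma" attribute described above):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((30, 8))
U, sigma, Vt = np.linalg.svd(M, full_matrices=False)

S1 = U * sigma                        # latent coordinates with power=1
S_half = S1 * sigma ** (0.5 - 1.0)    # re-scaled afterwards to power=0.5
```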
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
References
Berry, Michael W. (1992). Large scale sparse singular value computations. International Journal of Supercomputer Applications, 6, 13–49.
Halko, N., Martinsson, P. G., and Tropp, J. A. (2009). Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. Technical Report 2009-05, ACM, California Institute of Technology.
See Also
rsvd for the implementation of randomized SVD, and sparsesvd for the SVDLIBC wrapper.
Examples
# 240 English nouns in space with correlated dimensions "own", "buy" and "sell"
M <- DSM_GoodsMatrix[, 1:3]
# SVD projection into 2 latent dimensions
S <- dsm.projection(M, 2, with.basis=TRUE)
100 * attr(S, "R2") # dim 1 captures 86.4% of distances
round(attr(S, "basis"), 3) # dim 1 = commodity, dim 2 = owning vs. buying/selling
S[c("time", "goods", "house"), ] # some latent coordinates
## Not run:
idx <- DSM_GoodsMatrix[, 4] > .85 # only show nouns on "fringe"
plot(S[idx, ], pch=20, col="red", xlab="commodity", ylab="own vs. buy/sell")
text(S[idx, ], rownames(S)[idx], pos=3)
## End(Not run)