fast_mds {bigmds}R Documentation

Fast MDS

Description

Fast MDS uses recursive programming in combination with a divide and conquer strategy in order to obtain an MDS configuration for a given large data set x.

Usage

fast_mds(x, l, s_points, r, n_cores = 1, dist_fn = stats::dist, ...)

Arguments

x

A matrix with n individuals (rows) and k variables (columns).

l

The size for which classical MDS can be computed efficiently (using cmdscale function). It means that if \bar{l} is the limit size for which classical MDS is applicable, then l\leq \bar{l}.

s_points

Number of points used to align the MDS solutions obtained by the division of x into p submatrices. Recommended value: 2·r.

r

Number of principal coordinates to be extracted.

n_cores

Number of cores wanted to use to run the algorithm.

dist_fn

Distance function used to compute the distance between the rows.

...

Further arguments passed to dist_fn function.

Details

Fast MDS randomly divides the whole sample data set, x, of size n into p=l/s_points data subsets, where l \leq \bar{l} being \bar{l} the limit size for which classical MDS is applicable. Each one of the p data subsets has size \tilde{n} = n/p. If \tilde{n} \leq \code{l} then classical MDS is applied to each data subset. Otherwise, fast MDS is recursively applied. In either case, a final MDS configuration is obtained for each data subset.

In order to align all the partial solutions, a small subset of size s_points is randomly selected from each data subset. They are joined to form an alignment set, over which classical MDS is performed giving rise to an alignment configuration. Every data subset shares s_points points with the alignment set. Therefore every MDS configuration can be aligned with the alignment configuration using a Procrustes transformation.

Value

Returns a list containing the following elements:

points

A matrix that consists of n individuals (rows) and r variables (columns) corresponding to the principal coordinates. Since we are performing a dimensionality reduction, r<<k

eigen

The first r largest eigenvalues: \bar{\lambda}_i, i \in \{1, \dots, r\} , where \bar{\lambda}_i = 1/p \sum_{j=1}^{p}\lambda_i^j/n_j, being \lambda_i^j the i-th eigenvalue from partition j and n_j the size of the partition j.

GOF

A numeric vector of length 2.

The first element corresponds to 1/n \sum_{j=1}^{p}n_jG_1^j, where G_1^j = \sum_{i = 1}^{r} \lambda_{i}^{j}/ \sum_{i = 1}^{n-1} |\lambda_{i}^{j}|.

The second element corresponds to 1/n \sum_{j=1}^{p}n_jG_2^j where G_2^j = \sum_{i = 1}^{r} \lambda_{i}^{j}/ \sum_{i = 1}^{n-1} max(\lambda_{i}^{j}, 0).

References

Delicado P. and C. Pachón-García (2021). Multidimensional Scaling for Big Data. https://arxiv.org/abs/2007.11919.

Yang, T., J. Liu, L. McMillan, and W.Wang (2006). A fast approximation to multidimensional scaling. In Proceedings of the ECCV Workshop on Computation Intensive Methods for Computer Vision (CIMCV).

Borg, I. and P. Groenen (2005). Modern Multidimensional Scaling: Theory and Applications. Springer.

Examples

set.seed(42)
x <- matrix(data = rnorm(4*10000), nrow = 10000) %*% diag(c(9, 4, 1, 1))
mds <- fast_mds(x = x, l = 200, s_points = 2*2, r = 2, n_cores = 1, dist_fn = stats::dist)
head(mds$points)
mds$eigen
mds$GOF
points <- mds$points
plot(x[1:10, 1],
     x[1:10, 2],
     xlim = range(c(x[1:10,1],points[1:10,1])),
     ylim = range(c(x[1:10,2], points[1:10,2])),
     pch = 19,
     col = "green")
text(x[1:10, 1], x[1:10, 2], labels=1:10)
points(points[1:10, 1], points[1:10, 2], pch = 19, col = "orange")
text(points[1:10, 1], points[1:10, 2], labels=1:10)
abline(v = 0, lwd=3, lty=2)
abline(h = 0, lwd=3, lty=2)


[Package bigmds version 2.0.1 Index]