divide_conquer_mds {bigmds} | R Documentation |

Roughly speaking, a large data set, `x`

, of size `n`

is divided into parts, then classical MDS is performed over every part and,
finally, the partial configurations are combined so that all the points lie
on the same coordinate system with the aim to obtain a global MDS configuration.

```
divide_conquer_mds(x, l, c_points, r, n_cores = 1, dist_fn = stats::dist, ...)
```

`x` |
A matrix with |

`l` |
The size for which classical MDS can be computed efficiently
(using |

`c_points` |
Number of points used to align the MDS solutions obtained by the
division of |

`r` |
Number of principal coordinates to be extracted. |

`n_cores` |
Number of cores wanted to use to run the algorithm. |

`dist_fn` |
Distance function used to compute the distance between the rows. |

`...` |
Further arguments passed to |

The divide-and-conquer MDS starts dividing the `n`

points into
`p`

partitions: the first partition contains `l`

points and the others
contain `l-c_points`

points. Therefore, `p = 1 + (n-`

`l)/(l-c_points)`

.
The partitions are created at random.

Once the partitions are created, `c_points`

different random
points are taken from the first partition and concatenated to the other
partitions After that, classical MDS is applied to each partition,
with target low dimensional configuration `r`

.

Since all the partitions share `c_points`

points with the first one, Procrustes can be applied in order to align all
the configurations. Finally, all the configurations are
concatenated in order to obtain a global MDS configuration.

Returns a list containing the following elements:

- points
A matrix that consists of

`n`

points (rows) and`r`

variables (columns) corresponding to the principal coordinates. Since a dimensionality reduction is performed,`r`

`<<k`

- eigen
The first

`r`

largest eigenvalues:`\bar{\lambda}_i, i \in \{1, \dots, r\}`

, where`\bar{\lambda}_i = 1/p \sum_{j=1}^{p}\lambda_i^j/n_j`

, being`\lambda_i^j`

the`i-th`

eigenvalue from partition`j`

and`n_j`

the size of the partition`j`

.- GOF
A numeric vector of length 2.

The first element corresponds to

`1/n \sum_{j=1}^{p}n_jG_1^j`

, where`G_1^j = \sum_{i = 1}^{r} \lambda_{i}^{j}/ \sum_{i = 1}^{n-1} |\lambda_{i}^{j}|`

.The second element corresponds to

`1/n \sum_{j=1}^{p}n_jG_2^j`

where`G_2^j = \sum_{i = 1}^{r} \lambda_{i}^{j}/ \sum_{i = 1}^{n-1} max(\lambda_{i}^{j}, 0).`

Delicado P. and C. Pachón-García (2021). *Multidimensional Scaling for Big Data*.
https://arxiv.org/abs/2007.11919.

Borg, I. and P. Groenen (2005). *Modern Multidimensional Scaling: Theory and Applications*. Springer.

```
set.seed(42)
x <- matrix(data = rnorm(4*10000), nrow = 10000) %*% diag(c(9, 4, 1, 1))
mds <- divide_conquer_mds(x = x, l = 200, c_points = 2*2, r = 2, n_cores = 1, dist_fn = stats::dist)
head(mds$points)
mds$eigen
mds$GOF
points <- mds$points
plot(x[1:10, 1],
x[1:10, 2],
xlim = range(c(x[1:10,1],points[1:10,1])),
ylim = range(c(x[1:10,2], points[1:10,2])),
pch = 19,
col = "green")
text(x[1:10, 1], x[1:10, 2], labels=1:10)
points(points[1:10, 1], points[1:10, 2], pch = 19, col = "orange")
text(points[1:10, 1], points[1:10, 2], labels=1:10)
abline(v = 0, lwd=3, lty=2)
abline(h = 0, lwd=3, lty=2)
```

[Package *bigmds* version 2.0.1 Index]