pmclust and pkmeans {pmclust} | R Documentation |
Parallel Model-Based Clustering and Parallel K-means Algorithm
Description
Parallel Model-Based Clustering and Parallel K-means Algorithm
Usage
pmclust(X = NULL, K = 2, MU = NULL,
algorithm = .PMC.CT$algorithm, RndEM.iter = .PMC.CT$RndEM.iter,
CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)
pkmeans(X = NULL, K = 2, MU = NULL,
algorithm = c("kmeans"),
CONTROL = .PMC.CT$CONTROL, method.own.X = .PMC.CT$method.own.X,
rank.own.X = .pbd_env$SPMD.CT$rank.source, comm = .pbd_env$SPMD.CT$comm)
Arguments
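X | a GBD row-major matrix. |
K | number of clusters. |
MU | pre-specified centers. |
algorithm | type of EM algorithm to use. |
RndEM.iter | number of Rand-EM iterations. |
CONTROL | a control list for the algorithms; defaults to .PMC.CT$CONTROL. |
method.own.X | how X is held across processors (see Details). |
rank.own.X | the rank that owns X when X exists on only one processor (see Details). |
comm | MPI communicator. |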
Details
These are high-level wrappers around several functions in pmclust.
They handle data distribution, setup of the global environment .pmclustEnv,
initialization, algorithm selection, etc.
The input X is in gbd format. It is converted to gbd row-major format
and copied into .pmclustEnv for computation. By default, pmclust uses the
GBD row-major format (gbdr). By contrast, common means that X is identical
on all processors, and single means that X exists only on one processor,
rank.own.X.
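The three ways of holding X described above can be pictured without MPI. The following is a serial sketch in base R, not pmclust code. The helper block_rows() is hypothetical, and its contiguous, near-even split is an assumption that may differ from how the package actually assigns rows to ranks.

```r
# Hypothetical helper (not part of pmclust): row indices held by `rank`
# (0-based) when n rows are split into contiguous, near-even blocks.
block_rows <- function(n, rank, n_ranks) {
  sizes <- rep(n %/% n_ranks, n_ranks)
  extra <- n %% n_ranks
  if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  end <- cumsum(sizes)
  start <- end - sizes + 1
  if (sizes[rank + 1] == 0) return(integer(0))
  seq(start[rank + 1], end[rank + 1])
}

X <- as.matrix(iris[, -5])
n_ranks <- 4

# gbd-style layout: each rank holds only its own block of rows.
gbd <- lapply(0:(n_ranks - 1), function(r) {
  X[block_rows(nrow(X), r, n_ranks), , drop = FALSE]
})

# common-style layout: every rank holds the identical full matrix.
common <- replicate(n_ranks, X, simplify = FALSE)

# single-style layout: one rank (here rank 0) holds X, the others hold nothing.
single <- c(list(X), rep(list(NULL), n_ranks - 1))

# Rows per simulated rank under the block split.
sapply(gbd, nrow)
```

In the common and single cases the data still has to reach every rank in row blocks before clustering, which is the conversion step the paragraph above refers to.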
Value
These functions return a list with class pmclust or pkmeans.
See the help page of PARAM or PARAM.org for details.
Author(s)
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
References
Programming with Big Data in R Website: https://pbdr.org/
See Also
Examples
## Not run:
# Save the code in a file "demo.r" and run it on 4 processors with
# > mpiexec -np 4 Rscript demo.r
### Set up the environment.
library(pmclust, quietly = TRUE)
### Load data
X <- as.matrix(iris[, -5])
### Distribute data
jid <- get.jid(nrow(X))
X.gbd <- X[jid,]
### Standardize
N <- allreduce(nrow(X.gbd))
p <- ncol(X.gbd)
mu <- allreduce(colSums(X.gbd / N))
X.std <- sweep(X.gbd, 2, mu, FUN = "-")
std <- sqrt(allreduce(colSums(X.std^2 / (N - 1))))
X.std <- sweep(X.std, 2, std, FUN = "/")
### Clustering
comm.set.seed(123, diff = TRUE)
ret.mb1 <- pmclust(X.std, K = 3)
comm.print(ret.mb1)
ret.kms <- pkmeans(X.std, K = 3)
comm.print(ret.kms)
### Finish
finalize()
## End(Not run)
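The standardization step in the example relies on allreduce() adding up per-processor contributions. As a rough serial check that needs no MPI, the sketch below splits the rows into four chunks and replaces allreduce() with a plain sum over chunks. The chunking and the helper sum_over_chunks() are assumptions made for illustration only and are not part of pmclust or pbdMPI.

```r
X <- as.matrix(iris[, -5])

# Stand-in for the per-processor blocks: split the rows into 4 chunks.
chunk_id <- rep(1:4, length.out = nrow(X))
chunks <- lapply(1:4, function(i) X[chunk_id == i, , drop = FALSE])

# Hypothetical stand-in for allreduce(): add the per-chunk contributions.
sum_over_chunks <- function(parts) Reduce(`+`, parts)

# Same arithmetic as the example, with the sum done serially.
N  <- sum_over_chunks(lapply(chunks, nrow))
mu <- sum_over_chunks(lapply(chunks, function(x) colSums(x / N)))

centered <- lapply(chunks, function(x) sweep(x, 2, mu, FUN = "-"))
std <- sqrt(sum_over_chunks(
  lapply(centered, function(x) colSums(x^2 / (N - 1)))
))

# Compare with the usual serial estimates (equal up to floating-point error).
all.equal(mu, colMeans(X))
all.equal(std, apply(X, 2, sd))
```

Because each chunk contributes only column sums, no processor ever needs the full matrix to obtain the global mean and standard deviation.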