generate.MixSim {pmclust}    R Documentation
Generate MixSim Examples for Testing
Description
This function uses MixSim to generate simulated data sets, distributed across S processors, for testing clustering algorithms.
Usage
generate.MixSim(N, p, K, MixSim.obj = NULL, MaxOmega = NULL,
BarOmega = NULL, PiLow = 1.0, sph = FALSE, hom = FALSE)
Arguments
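N | total sample size across all S processors |
p | dimension of data |
K | number of clusters |
MixSim.obj | an object returned from MixSim |
MaxOmega | maximum overlap as in MixSim |
BarOmega | averaged overlap as in MixSim |
PiLow | lower bound of mixture proportion as in MixSim |
sph | sph as in MixSim (logical; whether the clusters are spherical) |
hom | hom as in MixSim (logical; whether the clusters are homogeneous) |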
Details
If MixSim.obj is NULL, then MixSim is called with BarOmega and MaxOmega (along with K, p, PiLow, sph, and hom) to obtain a new MixSim.obj. Otherwise, the supplied MixSim.obj is used as the true model.
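The rule in Details can be sketched in plain R. The helper resolve_mixsim() below is hypothetical and not part of pmclust; it only decides whether a supplied MixSim.obj is reused or which arguments would be passed to MixSim. The check that at least one of BarOmega and MaxOmega is given is an assumption about MixSim, not something stated on this page.

```r
# Hypothetical helper (not part of pmclust) mirroring the rule in Details:
# reuse a supplied MixSim.obj, otherwise collect the arguments that would
# be handed to MixSim() to create a new one.
resolve_mixsim <- function(K, p, MixSim.obj = NULL, MaxOmega = NULL,
                           BarOmega = NULL, PiLow = 1.0,
                           sph = FALSE, hom = FALSE) {
  if (!is.null(MixSim.obj)) {
    # A supplied object is taken as the true model.
    return(list(source = "supplied", obj = MixSim.obj))
  }
  # Assumption: MixSim() needs at least one overlap target.
  if (is.null(BarOmega) && is.null(MaxOmega)) {
    stop("either BarOmega or MaxOmega must be given when MixSim.obj is NULL")
  }
  # list() keeps NULL-valued entries, so both overlap arguments are recorded.
  args <- list(BarOmega = BarOmega, MaxOmega = MaxOmega, K = K, p = p,
               PiLow = PiLow, sph = sph, hom = hom)
  list(source = "MixSim", args = args)
}

r <- resolve_mixsim(K = 2, p = 2, BarOmega = 0.01)
r$source   # "MixSim"
```

With the MixSim package installed, a new object could then be created with do.call(MixSim::MixSim, r$args); that call is left out so the sketch runs without MixSim.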
Value
A list containing the simulated data and related information, with the following components:
K | number of clusters, as the input |
p | dimension of data X.spmd, as the input |
N | total sample size, as the input |
N.allspmds | sample sizes on all S processors |
N.spmd | sample size on the local processor |
X.spmd | generated data set with dimension N.spmd * p |
CLASS.spmd | true cluster id of each observation, a vector of length N.spmd with values from 1 to K |
N.CLASS.spmd | true sample size of each cluster, a vector of length K |
MixSim.obj | the true model from which X.spmd was generated |
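How these components relate can be sketched in base R for a single processor. The even row split in split_rows() (a hypothetical name) and the use of rmultinom() to draw cluster sizes are assumptions, not necessarily what pmclust does internally; the sketch also assumes N.CLASS.spmd counts the rows held locally, in line with the .spmd suffix.

```r
# Hypothetical sketch, not pmclust's implementation: how the returned
# components could fit together on one processor.

# Split N rows over S processors as evenly as possible (assumed scheme).
split_rows <- function(N, S) {
  n <- rep(N %/% S, S)             # base share for every processor
  extra <- N %% S                  # rows left over after the even split
  if (extra > 0) n[seq_len(extra)] <- n[seq_len(extra)] + 1
  n
}

N <- 5000; S <- 4; K <- 2
N.allspmds <- split_rows(N, S)     # c(1250, 1250, 1250, 1250)
N.spmd <- N.allspmds[1]            # sample size on the first processor

# Assumed equal mixing proportions (what MixSim gives with PiLow = 1.0).
Pi <- rep(1 / K, K)
set.seed(123)
N.CLASS.spmd <- as.vector(rmultinom(1, size = N.spmd, prob = Pi))  # length K
CLASS.spmd <- rep(seq_len(K), times = N.CLASS.spmd)                # ids in 1..K

# The per-cluster counts are exactly the tabulated labels.
stopifnot(length(CLASS.spmd) == N.spmd,
          all(tabulate(CLASS.spmd, nbins = K) == N.CLASS.spmd))
```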
Author(s)
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
References
Melnykov, V., Chen, W.-C. and Maitra, R. (2012) “MixSim: Simulating Data to Study Performance of Clustering Algorithms”, Journal of Statistical Software (accepted).
Programming with Big Data in R Website: https://pbdr.org/
Examples
## Not run:
# Save the code in a file "demo.r" and run it on 4 processors with
# > mpiexec -np 4 Rscript demo.r
### Set up the environment.
library(pmclust, quietly = TRUE)
### Generate example data.
N <- 5000
p <- 2
K <- 2
data.spmd <- generate.MixSim(N, p, K, BarOmega = 0.01)
X.spmd <- data.spmd$X.spmd
### Run clustering.
PARAM.org <- set.global(K = K) # Set up global storage.
# PARAM.org <- initial.em(PARAM.org) # One initialization.
PARAM.org <- initial.RndEM(PARAM.org) # Ten random initializations by default.
PARAM.new <- apecma.step(PARAM.org) # Run APECMa.
em.update.class() # Get classification.
### Get results.
N.CLASS <- get.N.CLASS(K)
comm.cat("# of class:", N.CLASS, "\n")
comm.cat("# of class (true):", data.spmd$N.CLASS.spmd, "\n")
### Quit.
finalize()
## End(Not run)
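The example prints the estimated cluster sizes next to the true ones. Because cluster ids produced by a clustering method are arbitrary, the two count vectors can describe the same partition while listing the clusters in a different order. A small serial base-R illustration with made-up labels (not pmclust output):

```r
# Cluster ids are arbitrary: a clustering can recover the true partition
# while numbering the clusters differently. Toy labels for illustration only.
true.class <- c(1, 1, 1, 2, 2)
est.class  <- c(2, 2, 2, 1, 1)   # same partition, ids swapped

K <- 2
N.CLASS.true <- tabulate(true.class, nbins = K)  # 3 2
N.CLASS.est  <- tabulate(est.class, nbins = K)   # 2 3

# The counts differ in order but agree once sorted.
same.sizes <- all(sort(N.CLASS.true) == sort(N.CLASS.est))

# The cross-tabulation shows the one-to-one matching of ids.
print(table(true = true.class, estimated = est.class))
```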