assign.N.sample {pmclust} | R Documentation |
Obtain a Set of Random Samples for X.spmd
Description
This utility function randomly samples data from X.spmd
to form a relatively small subset of the original data. Running the EM algorithm on the
smaller subset is typically fast and captures the rough structure of the
entire dataset.
Usage
assign.N.sample(total.sample = 5000, N.org.spmd)
Arguments
total.sample |
the total number of samples to select from
the original data |
N.org.spmd |
the original data size,
i.e. |
Details
This utility function performs simple random sampling without replacement
from the original dataset X.spmd. Different random seeds should
be set before calling this function.
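The following base-R sketch illustrates simple random sampling without replacement and why the seed matters. It is not pmclust code: `N.org`, `n.sample`, and `draw.ids()` are hypothetical names chosen for illustration, and the "processors" are emulated in a single R session.

```r
# Base-R sketch; not pmclust code. All names below are hypothetical.
N.org <- 6000      # assumed local data size on one processor
n.sample <- 1000   # assumed local sample size

draw.ids <- function(seed) {
  set.seed(seed)
  # sample.int() draws without replacement by default.
  sort(sample.int(N.org, n.sample))
}

ids.a <- draw.ids(1)   # emulated processor A
ids.b <- draw.ids(1)   # emulated processor B, same seed as A
ids.c <- draw.ids(2)   # emulated processor C, different seed

anyDuplicated(ids.a) == 0   # check that no index is selected twice
identical(ids.a, ids.b)     # compare draws made under the same seed
identical(ids.a, ids.c)     # compare draws made under different seeds
```

When two processors share a seed and a local data size, they select the same row indices, which is presumably why distinct seeds are recommended.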
Value
A list is returned containing:
N | total sample size across all S processors |
N.spmd | sample size of the given processor |
N.allspmds | a collection of sample sizes for all S processors |
ID.spmd | indices of the selected samples, ranging from 1 to N.org.spmd |
Note that N and N.allspmds are the same across all S processors, but
N.spmd and ID.spmd are most likely distinct on each processor. The lengths
of these elements are 1 for N and N.spmd, S for N.allspmds, and N.spmd
for ID.spmd.
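As a rough illustration of the shape of the returned list, the sketch below emulates S = 4 processors in a single R session. It does not call pmclust: the local sizes in `N.org.all`, the proportional allocation of `total.sample`, and the stand-in matrix `X.local` are all assumptions made for this example, and the allocation rule used by `assign.N.sample` itself may differ.

```r
# Single-session emulation of the documented return value; not pmclust code.
S <- 4
N.org.all <- c(5200, 5400, 5600, 5800)   # assumed local data sizes
total.sample <- 5000

# Assumed rule: allocate samples in proportion to each local size.
# floor() can leave N slightly below total.sample.
N.allspmds <- floor(total.sample * N.org.all / sum(N.org.all))
N <- sum(N.allspmds)

set.seed(123)
ret <- lapply(seq_len(S), function(rank) {
  N.spmd <- N.allspmds[rank]
  list(N = N,
       N.spmd = N.spmd,
       N.allspmds = N.allspmds,
       ID.spmd = sort(sample.int(N.org.all[rank], N.spmd)))
})

# Element lengths on the first emulated processor: 1, 1, S, and N.spmd.
sapply(ret[[1]], length)

# ID.spmd indexes rows of the local data, so it can subset them directly.
X.local <- matrix(rnorm(N.org.all[1] * 2), ncol = 2)  # stand-in for X.spmd
X.sub <- X.local[ret[[1]]$ID.spmd, , drop = FALSE]
dim(X.sub)
```

In a real SPMD run each processor holds only its own list, so `N.spmd` and `ID.spmd` differ by processor while `N` and `N.allspmds` agree everywhere.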
Author(s)
Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.
References
Programming with Big Data in R Website: https://pbdr.org/
See Also
Examples
## Not run:
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r
### Set up the environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123, diff = TRUE)  # use a different random stream on each rank
### Generate example data.
N.org.spmd <- 5000 + sample(1:1000, 1)
ret.spmd <- assign.N.sample(total.sample = 5000, N.org.spmd)
cat("Rank: ", comm.rank(), " Size: ", ret.spmd$N.spmd,
    "\n", sep = "")
### Quit.
finalize()
## End(Not run)