assign.N.sample {pmclust}    R Documentation

Obtain a Set of Random Samples for X.spmd

Description

This utility function randomly samples data from X.spmd to form a relatively small subset of the original data. Running the EM algorithm on the smaller subset is typically fast and captures the rough structure of the entire dataset.

Usage

  assign.N.sample(total.sample = 5000, N.org.spmd)

Arguments

total.sample

the total number of samples to select from the original data X.spmd.

N.org.spmd

the original data size, i.e. nrow(X.spmd).

Details

This utility function performs simple random sampling without replacement from the original dataset X.spmd. Different random seeds should be set before calling this function.
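
The sampling described above can be illustrated with a serial sketch in base R. This is not the pmclust implementation: the function name sketch.sample, the N.org.allspmds argument (local data sizes of all simulated processors), and the proportional allocation rule are assumptions made only for illustration, and no MPI is involved.

```r
# Hypothetical serial stand-in for illustration; NOT the pmclust code.
sketch.sample <- function(total.sample, N.org.allspmds) {
  # Assumed rule: split total.sample in proportion to each local data size.
  N.org <- sum(N.org.allspmds)
  N.allspmds <- floor(total.sample * N.org.allspmds / N.org)
  # Never request more rows than a simulated processor holds.
  N.allspmds <- pmin(N.allspmds, N.org.allspmds)
  # Simple random sampling without replacement on each simulated processor.
  ID <- lapply(seq_along(N.org.allspmds), function(i) {
    sort(sample.int(N.org.allspmds[i], N.allspmds[i]))
  })
  list(N = sum(N.allspmds), N.allspmds = N.allspmds, ID = ID)
}

set.seed(123)
N.org.allspmds <- c(5200, 5400, 5600, 5800)  # made-up local sizes
ret <- sketch.sample(5000, N.org.allspmds)
```

Because the shares are rounded down, the total in this sketch can fall slightly below total.sample; the package may handle the remainder differently.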

Value

A list is returned containing:

N            total sample size across all S processors
N.spmd       sample size on the given processor
N.allspmds   a collection of sample sizes for all S processors
ID.spmd      indices of the selected samples, ranging from 1 to N.org.spmd

Note that N and N.allspmds are the same across all S processors, but N.spmd and ID.spmd most likely differ between processors. The lengths of these elements are 1 for N and N.spmd, S for N.allspmds, and N.spmd for ID.spmd.
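
As a rough illustration of these lengths, the sketch below builds a mock return value for one processor in base R and uses ID.spmd to subset the local data. All values are made up, X.sub.spmd is a hypothetical name, and treating N as the sum of N.allspmds is an assumption based on the descriptions above; a real ret.spmd comes from assign.N.sample under MPI.

```r
# Mock of one processor's return value; the numbers are illustrative only.
set.seed(1)
S <- 4                                       # assumed number of processors
N.org.spmd <- 5300                           # local data size, nrow(X.spmd)
X.spmd <- matrix(rnorm(N.org.spmd * 2), ncol = 2)
N.allspmds <- c(1250, 1250, 1250, 1250)      # made-up sample sizes per processor

ret.spmd <- list(
  N = sum(N.allspmds),                       # assumed: total over all processors
  N.spmd = N.allspmds[1],                    # this processor's share
  N.allspmds = N.allspmds,
  ID.spmd = sort(sample.int(N.org.spmd, N.allspmds[1]))
)

# Lengths as described: 1, 1, S, and N.spmd.
stopifnot(length(ret.spmd$N) == 1, length(ret.spmd$N.spmd) == 1)
stopifnot(length(ret.spmd$N.allspmds) == S)
stopifnot(length(ret.spmd$ID.spmd) == ret.spmd$N.spmd)

# Use the indices to extract the local subsample (hypothetical name).
X.sub.spmd <- X.spmd[ret.spmd$ID.spmd, , drop = FALSE]
```

Using drop = FALSE keeps X.sub.spmd a matrix even if a processor is assigned a single sample.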

Author(s)

Wei-Chen Chen wccsnow@gmail.com and George Ostrouchov.

References

Programming with Big Data in R Website: https://pbdr.org/

See Also

set.global

Examples

## Not run: 
# Save code in a file "demo.r" and run in 4 processors by
# > mpiexec -np 4 Rscript demo.r

### Setup environment.
library(pmclust, quietly = TRUE)
comm.set.seed(123)

### Generate example data.
N.org.spmd <- 5000 + sample(1:1000, 1)
ret.spmd <- assign.N.sample(total.sample = 5000, N.org.spmd)
cat("Rank:", comm.rank(), " Size:", ret.spmd$N.spmd,
    "\n", sep = "")

### Quit.
finalize()

## End(Not run)

[Package pmclust version 0.2-1 Index]