distcomp {distcomp} | R Documentation

## Distributed Computing with R

### Description

`distcomp` is a collection of methods for fitting models to data that may be
distributed across various sites. The package arose as a way of addressing the
issues raised by data aggregation: by allowing sites to retain control over
their local data and to transmit only summaries, some privacy controls can be
maintained. Even when participants have no objection in principle to data
aggregation, it may still be useful to keep data local and expose just the
computations. For further details, please see the Journal of Statistical
Software reference cited below.
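As a minimal illustration of "transmitting only summaries" (a generic sketch, not code from `distcomp`), each site below shares just its count, sum, and sum of squares for one numeric variable, and the master pools these into a global mean and sample variance. The function names `site_summary()` and `pool_summaries()` and the simulated site data are hypothetical.

```r
# Hypothetical helpers: a site shares only (n, sum, sum of squares).
site_summary <- function(x) {
  c(n = length(x), s = sum(x), ss = sum(x^2))
}

# The master adds the site summaries element-wise and derives pooled
# statistics without ever seeing individual observations.
pool_summaries <- function(summaries) {
  tot <- Reduce(`+`, summaries)
  n <- tot[["n"]]
  m <- tot[["s"]] / n
  v <- (tot[["ss"]] - n * m^2) / (n - 1)  # sample variance
  c(mean = m, var = v)
}

set.seed(1)
site_a <- rnorm(50, mean = 10)  # simulated data held at site A
site_b <- rnorm(80, mean = 12)  # simulated data held at site B
pooled <- pool_summaries(list(site_summary(site_a), site_summary(site_b)))
pooled
```

The sum-of-squares formula is used here for brevity; it can lose precision when the mean is large relative to the spread, so a real implementation might prefer a more stable pooling rule.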

### Details

The initial implementation consists of a stratified Cox model fit to distributed survival data and a singular value decomposition (SVD) of a distributed matrix. General Linear Models will soon be added. Although some sanity checks are present, many more are needed to make the package truly robust. We also hope that users will contribute other methods.
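To show why an SVD can be computed when rows are spread over sites, here is one simple approach. It is a sketch under our own assumptions and is not necessarily the algorithm `distcomp` uses. For a stacked matrix `X = rbind(X1, X2, ...)`, `crossprod(X)` equals the sum of the local `crossprod(Xk)`, so each site sends only a p x p matrix. The master then recovers the singular values and right singular vectors from its eigendecomposition. The site matrices are simulated.

```r
set.seed(42)
p <- 4
# Simulated row blocks, one per site (hypothetical data)
sites <- list(matrix(rnorm(30 * p), ncol = p),
              matrix(rnorm(45 * p), ncol = p),
              matrix(rnorm(25 * p), ncol = p))

local_crossprods <- lapply(sites, crossprod)  # each site: t(Xk) %*% Xk
XtX <- Reduce(`+`, local_crossprods)          # master: sum of site summaries

eig <- eigen(XtX, symmetric = TRUE)
d_dist <- sqrt(eig$values)  # singular values of the stacked matrix
v_dist <- eig$vectors       # right singular vectors (up to sign)

# Comparison with a direct SVD of the pooled matrix (not possible in practice)
full <- do.call(rbind, sites)
sv <- svd(full)
```

Each site could then compute its own rows of the left singular vectors as `Xk %*% v_dist %*% diag(1 / d_dist)` without sharing `Xk`. Forming `crossprod(X)` squares the condition number, so this route suits well-conditioned problems.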

We make the following assumptions in the implementation:
(a) the aggregate data is logically a stacking of the data at each site, i.e.,
the full data is row-partitioned into sites where the rows are observations;
(b) each site has the package `distcomp` installed and a workspace set up
for (writeable) use by the `opencpu` server (see `distcompSetup()`); and
(c) each site exposes `distcomp` via an `opencpu` server.
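Assumption (a) can be pictured with a toy data frame. This hypothetical sketch splits one data set into three made-up sites and checks that stacking the site pieces row-wise recovers the full data. In an actual `distcomp` deployment the stacked data is never formed; it is only the logical object the distributed computation targets.

```r
# Toy "full" data and hypothetical site labels
full <- data.frame(id = 1:9,
                   x = c(2.1, 3.4, 1.8, 5.0, 4.2, 3.3, 2.9, 4.8, 3.7))
site <- factor(rep(c("siteA", "siteB", "siteC"), each = 3))

by_site <- split(full, site)          # what each site would hold locally
restacked <- do.call(rbind, by_site)  # the logical aggregate
rownames(restacked) <- NULL           # drop the "siteA.1"-style row names

all.equal(restacked, full)  # TRUE
```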

The main computation is driven by a master process, a script of R code,
that calls `distcomp` functions at worker sites via `opencpu`.
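The master/worker pattern can be sketched without any network layer. In this hypothetical example, each "worker" is a local closure that stands in for a remote `opencpu` call and returns only the summary matrices `X'X` and `X'y`. The master aggregates them and solves the normal equations for a linear regression. `make_worker()` and the simulated site data are illustrative assumptions, not part of the `distcomp` API.

```r
# Hypothetical stand-in for a remote worker: returns only summaries
make_worker <- function(X, y) {
  force(X); force(y)
  function() list(XtX = crossprod(X), Xty = crossprod(X, y))
}

set.seed(7)
beta_true <- c(1, -2, 0.5)
make_site_data <- function(n) {
  X <- cbind(1, matrix(rnorm(n * 2), ncol = 2))  # intercept + 2 covariates
  y <- drop(X %*% beta_true + rnorm(n, sd = 0.1))
  list(X = X, y = y)
}
site_data <- lapply(c(40, 60, 35), make_site_data)
workers <- lapply(site_data, function(d) make_worker(d$X, d$y))

# Master: query each worker, aggregate the summaries, solve X'X b = X'y
replies <- lapply(workers, function(w) w())
XtX <- Reduce(`+`, lapply(replies, `[[`, "XtX"))
Xty <- Reduce(`+`, lapply(replies, `[[`, "Xty"))
beta_hat <- drop(solve(XtX, Xty))
beta_hat
```

Keeping the worker behind a zero-argument function makes it straightforward to swap the closure for a real remote call later, while the master logic stays unchanged.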
The use of `opencpu` allows developers to prototype their distributed
implementations on a local machine using the `opencpu` package, which runs
such a server locally on `localhost` ports.

Note that `distcomp` computations are not intended for speed or efficiency;
indeed, they are orders of magnitude slower than the corresponding computations
on pooled data. However, the models being fit are not meant to be recomputed
often. These and other details are discussed in the paper mentioned above.

The current implementation, particularly the stratified Cox model, makes direct
use of code from `survival::coxph()`; that is, the underlying Cox model code is
derived from that in the R `survival` package.
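The stratified Cox model suits row-partitioned data because the stratified partial likelihood is a product over strata. This sketch assumes each site is treated as its own stratum, which is our reading of the design and should be checked against the reference. It uses `survival::coxph()` on the `lung` data split into three made-up sites. It checks that the pooled stratified log partial likelihood equals the sum of site-level log partial likelihoods evaluated at the same coefficients. Setting `iter.max = 0` makes `coxph()` evaluate the likelihood at `init` without iterating.

```r
library(survival)

# Simulate three sites by splitting the lung data (hypothetical partition)
dat <- lung[, c("time", "status", "age")]
dat$site <- rep(1:3, length.out = nrow(dat))

# Pooled fit with sites as strata (shown only for comparison)
pooled <- coxph(Surv(time, status) ~ age + strata(site), data = dat)
b <- coef(pooled)

# Each site evaluates its own log partial likelihood at b
site_loglik <- sapply(split(dat, dat$site), function(d) {
  fit <- coxph(Surv(time, status) ~ age, data = d, init = b,
               control = coxph.control(iter.max = 0))
  fit$loglik[1]  # log partial likelihood at the initial values b
})

c(sum_of_sites = sum(site_loglik), pooled = pooled$loglik[2])
```

Because the site-level terms add up, a master process can maximize the total by collecting per-site likelihood contributions (and, in practice, their derivatives) without ever pooling the survival records.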

For an understanding of how this package is meant to be used, please see the documented examples and the reference.

### References

Software for Distributed Computation on Medical Databases: A Demonstration Project. Journal of Statistical Software, 77(13), 1-22. doi:10.18637/jss.v077.i13

Terry M. Therneau and Patricia Grambsch. Modeling Survival Data: Extending the Cox Model, Appendix E. Springer-Verlag, 2000.

### See Also

The examples in `system.file("doc", "examples.html", package = "distcomp")`.

The source for the examples: `system.file("doc_src", "examples.Rmd", package = "distcomp")`.

[Package *distcomp* version 1.3-3]