bigmemory-package {bigmemory}R Documentation

Manage massive matrices with shared memory and memory-mapped files.

Description

Create, store, access, and manipulate massive matrices. Matrices are, by default, allocated to shared memory and may use memory-mapped files. Packages biganalytics, synchronicity, bigalgebra, and bigtabulate provide advanced functionality. Access to and manipulation of a big.matrix object is exposed in an S4 class whose interface is similar to that of a matrix. Use of these packages in parallel environments can provide substantial speed and memory efficiencies. bigmemory also provides a C++ framework for the development of new tools that can work both with big.matrix and native matrix objects.

Details

Index of functions/methods (grouped in a friendly way):

big.matrix, filebacked.big.matrix, as.big.matrix

is.big.matrix, is.separated, is.filebacked

describe, attach.big.matrix, attach.resource

sub.big.matrix, is.sub.big.matrix

dim, dimnames, nrow, ncol, print, head, tail, typeof, length

read.big.matrix, write.big.matrix

mwhich

morder, mpermute

deepcopy

flush 

Multi-gigabyte data sets challenge and frustrate users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of 's rich statistical programming environment. The package bigmemory and associated packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.

This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.

Memory considerations

For obvious reasons memory that the big.matrix uses is managed outside the R memory pool available to the garbage collector and the memory occupied by the big.matrix is not visible to the R. This has subtle implications:

Note

Various options are available. options(bigmemory.typecast.warning) can be set to avoid annoying warnings that might occur if, for example, you assign objects (typically type double) to char, short, or integer big.matrix objects. options(bigmemory.print.warning) protects against extracting and printing a massive matrix (which would involve the creation of a second massive copy of the matrix). options(bigmemory.allow.dimnames) by default prevents the setting of dimnames attributes, because they aren't allocated to shared memory and changes will not be visible across processes. options(bigmemory.default.type) is "double" be default (a change in default behavior as of 4.1.1) but may be changed by the user.

Note that you can't simply use a big.matrix with many (most) existing functions (e.g. lm, kmeans). One nice exception is split, because this function only accesses subsets of the matrix.

Author(s)

Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.

Maintainers: Michael J. Kane bigmemoryauthors@gmail.com

See Also

For example, big.matrix, mwhich, read.big.matrix

Examples



# Our examples are all trivial in size, rather than burning huge amounts
# of memory.

x <- big.matrix(5, 2, type="integer", init=0,
                dimnames=list(NULL, c("alpha", "beta")))
x
x[1:2,]
x[,1] <- 1:5
x[,"alpha"]
colnames(x)
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- NULL
x[,]



[Package bigmemory version 4.6.4 Index]