big.matrix {bigmemory} | R Documentation |
The core "big.matrix" operations.
Description
Create a big.matrix (or check to see if an object is a big.matrix, or
create a big.matrix from a matrix, and so on). The big.matrix may be
file-backed.
Usage
big.matrix(
nrow,
ncol,
type = options()$bigmemory.default.type,
init = NULL,
dimnames = NULL,
separated = FALSE,
backingfile = NULL,
backingpath = NULL,
descriptorfile = NULL,
binarydescriptor = FALSE,
shared = options()$bigmemory.default.shared
)
filebacked.big.matrix(
nrow,
ncol,
type = options()$bigmemory.default.type,
init = NULL,
dimnames = NULL,
separated = FALSE,
backingfile = NULL,
backingpath = NULL,
descriptorfile = NULL,
binarydescriptor = FALSE
)
as.big.matrix(
x,
type = NULL,
separated = FALSE,
backingfile = NULL,
backingpath = NULL,
descriptorfile = NULL,
binarydescriptor = FALSE,
shared = options()$bigmemory.default.shared
)
is.big.matrix(x)
## S4 method for signature 'big.matrix'
is.big.matrix(x)
## S4 method for signature 'ANY'
is.big.matrix(x)
is.separated(x)
## S4 method for signature 'big.matrix'
is.separated(x)
is.filebacked(x)
## S4 method for signature 'big.matrix'
is.filebacked(x)
shared.name(x)
## S4 method for signature 'big.matrix'
shared.name(x)
file.name(x)
## S4 method for signature 'big.matrix'
file.name(x)
dir.name(x)
## S4 method for signature 'big.matrix'
dir.name(x)
is.shared(x)
## S4 method for signature 'big.matrix'
is.shared(x)
is.readonly(x)
## S4 method for signature 'big.matrix'
is.readonly(x)
is.nil(address)
Arguments
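nrow | number of rows. |
ncol | number of columns. |
type | the type of the atomic element ("double", "integer", "short", or "char"; see Details); defaults to options()$bigmemory.default.type. |
init | a scalar value for initializing the matrix (NULL by default). |
dimnames | a list of the row and column names; use with caution for large objects. |
separated | use separated column organization of the data; see Details. |
backingfile | the root name for the file(s) for the cache of the big.matrix. |
backingpath | the path to the directory containing the file backing cache. |
descriptorfile | the name of the file to hold the backingfile description, for subsequent use with attach.big.matrix. |
binarydescriptor | the flag to specify if the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix. |
shared | whether the big.matrix uses shared memory; defaults to options()$bigmemory.default.shared. |
x | a matrix or data frame to convert (for as.big.matrix); for the other functions, the object to test or query, typically a big.matrix. |
address | an external pointer (see Details), as tested by is.nil. |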
Details
A big.matrix consists of an object in R that does nothing more than
point to the data structure implemented in C++. The object acts much
like a traditional R matrix, but helps protect the user from many
inadvertent memory-consuming pitfalls of traditional R matrices and
data frames.
There are two big.matrix types which manage data in different ways. A
standard, shared big.matrix is constrained to available RAM, and may be
shared across separate R processes. A file-backed big.matrix may exceed
available RAM by using hard drive space, and may also be shared across
processes. The atomic types of these matrices may be double, integer,
short, or char (8, 4, 2, and 1 bytes, respectively).
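As a rough illustration of what the element sizes above imply, the following sketch estimates the data footprint of a matrix for each type. It uses base R only; approx_bytes is a hypothetical helper introduced here for illustration, not part of bigmemory, and it ignores any per-object or per-column overhead.

```r
# Bytes per element for each big.matrix type, as listed above.
bytes_per_element <- c(double = 8, integer = 4, short = 2, char = 1)

# Hypothetical helper (not part of bigmemory): approximate data size in bytes.
approx_bytes <- function(nrow, ncol, type = "double") {
  nrow * ncol * bytes_per_element[[type]]
}

# Approximate footprint, in MiB, of a 1,000,000 x 100 matrix for each type.
sizes <- sapply(names(bytes_per_element),
                function(type) approx_bytes(1e6, 100, type))
round(sizes / 2^20, 1)
```

Choosing the smallest type that can represent the data reduces the footprint proportionally.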
If x is a big.matrix, then x[1:5,] is returned as an R matrix containing
the first five rows of x. If x is of type double, then the result will
be numeric; otherwise, the result will be an integer R matrix. The
expression x alone will display information about the R object (e.g. the
external pointer) rather than evaluating the matrix itself (the user
should try x[,] with extreme caution, recognizing that a huge R matrix
will be created).
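The subsetting behavior described above can be checked with a small sketch. The matrices here are tiny and exist only for illustration; the variable names are arbitrary.

```r
library(bigmemory)

# A small double big.matrix; extracting rows yields an ordinary R matrix.
x <- big.matrix(10, 3, type = "double", init = 0)
first_rows <- x[1:5, ]
is.matrix(first_rows)
dim(first_rows)
typeof(first_rows)

# For a non-double type, the extracted R matrix is integer.
xi <- big.matrix(10, 3, type = "integer", init = 1L)
typeof(xi[1:5, ])

# Printing the object itself shows information about the R object,
# not the matrix contents.
x
```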
If x has a huge number of rows and/or columns, then the use of rownames
and/or colnames will be extremely memory-intensive and should be
avoided. If x has a huge number of columns and separated=TRUE is used
(this isn't typically recommended), the user might want to store the
transpose as there is overhead of a pointer for each column in the
matrix. If separated is TRUE, then the memory is allocated into separate
vectors for each column. Use this option with caution if you have a
large number of columns, as shared-memory segments are limited by OS and
hardware combinations. If separated is FALSE, the matrix is stored in
traditional column-major format. The function is.separated() returns the
separation type of the big.matrix.
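A minimal sketch of the two storage layouts, using deliberately small matrices so that the per-column overhead of separated=TRUE is negligible:

```r
library(bigmemory)

# Default layout: a single column-major block.
a <- big.matrix(5, 2, type = "integer", init = 0L)

# Separated layout: one allocation per column.
b <- big.matrix(5, 2, type = "integer", init = 0L, separated = TRUE)

is.separated(a)
is.separated(b)
```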
When a big.matrix, x, is passed as an argument to a function, it is
essentially providing call-by-reference rather than call-by-value
behavior. If the function modifies any of the values of x, the changes
are not limited in scope to a local copy within the function. This
introduces the possibility of side-effects, in contrast to standard R
behavior.
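The contrast with ordinary R matrices can be seen in a short sketch. set_first is a hypothetical helper written for this illustration; it is not part of bigmemory.

```r
library(bigmemory)

# Hypothetical helper: overwrites the first element of its argument.
set_first <- function(m, value) {
  m[1, 1] <- value
  invisible(NULL)
}

# An ordinary R matrix is copied on modification, so the caller's
# object is unchanged.
r <- matrix(0, 2, 2)
set_first(r, 99)
r[1, 1]

# A big.matrix points at the same underlying data, so the caller
# sees the change.
b <- big.matrix(2, 2, type = "double", init = 0)
set_first(b, 99)
b[1, 1]
```

Functions that should not alter their big.matrix argument therefore need to avoid assignment into it, or work on an explicit copy.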
A file-backed big.matrix may exceed available RAM in size by using a
file cache (or possibly multiple file caches, if separated=TRUE). This
can incur a substantial performance penalty for such large matrices, but
less of a penalty than most other approaches for handling such large
objects. A side-effect of creating a file-backed object is not only the
file-backing(s), but a descriptor file (in the same directory) that is
needed for subsequent attachments (see attach.big.matrix).
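The following sketch creates a small file-backed big.matrix in a scratch directory and then re-attaches it through its descriptor file. The directory and the file names demo.bin and demo.desc are assumptions chosen for illustration, and attaching by the descriptor's full path is assumed to behave as in recent bigmemory releases.

```r
library(bigmemory)

# Scratch directory for the backing and descriptor files (illustrative).
backing_dir <- tempfile("bm_demo_")
dir.create(backing_dir)

fb <- filebacked.big.matrix(
  4, 2, type = "integer", init = 0L,
  backingfile = "demo.bin",
  backingpath = backing_dir,
  descriptorfile = "demo.desc"
)
fb[1, 1] <- 7L
is.filebacked(fb)

# Both the backing file and the descriptor file are created.
list.files(backing_dir)

# Re-attach via the descriptor file, as a second R process might.
fb2 <- attach.big.matrix(file.path(backing_dir, "demo.desc"))
fb2[1, 1]
```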
Note that we do not allow setting or changing the dimnames attributes by
default; such changes would not be reflected in the descriptor objects
or in shared memory. To override this, set
options(bigmemory.allow.dimnames=TRUE).
It should also be noted that a user can create an “anonymous”
file-backed big.matrix by specifying "" as the backingfile argument. In
this case, the backing resides in the temporary directory and a
descriptor file is not created. These should be used with caution since
even anonymous backings use disk space which could eventually fill the
hard drive. Anonymous backings are removed either manually, by a user,
or automatically, when the operating system deems it appropriate.
Finally, note that as.big.matrix can coerce data frames. It does this by
making any character columns into factors, and then making all factors
numeric before forming the big.matrix. Level labels are not preserved
and must be managed by the user if desired.
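A small sketch of the data frame coercion described above, including one way to keep the level labels on the R side. The data frame and its column names are made up for illustration; the coercion may emit a warning, which is suppressed here.

```r
library(bigmemory)

df <- data.frame(group = c("b", "a", "b"), value = c(1.5, 2.5, 3.5))

# Character columns become factors, and factors become numeric codes.
bm <- suppressWarnings(as.big.matrix(df, type = "double"))
bm[, ]

# The labels are not stored in the big.matrix, so keep them separately
# and map the codes back when needed.
group_levels <- levels(factor(df$group))
group_levels[bm[, 1]]
```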
Value
A big.matrix is returned (for big.matrix and filebacked.big.matrix, and
as.big.matrix), and TRUE or FALSE for is.big.matrix and the other
functions.
Author(s)
John W. Emerson and Michael J. Kane <bigmemoryauthors@gmail.com>
References
The Bigmemory Project: http://www.bigmemory.org/.
See Also
bigmemory, and perhaps the class documentation of big.matrix;
attach.big.matrix and describe. Sister packages biganalytics,
bigtabulate, synchronicity, and bigalgebra provide advanced
functionality.
Examples
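library(bigmemory)

# A 10 x 2 integer big.matrix with every element initialized to -5.
x <- big.matrix(10, 2, type='integer', init=-5)
# Setting dimnames after creation must be enabled explicitly.
options(bigmemory.allow.dimnames=TRUE)
colnames(x) <- c("alpha", "beta")
is.big.matrix(x)
dim(x)
colnames(x)
rownames(x)
x[,]
x[1:8,1] <- 11:18
colnames(x) <- NULL
x[,]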
# The following shared memory example is quite silly, as you wouldn't
# likely do this in a single R session. But if zdescription were
# passed to another R session via SNOW, foreach, or even by a
# simple file read/write, then the attach.big.matrix() within the
# second R process would give access to the same object in memory.
# Please see the package vignette for real examples.
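z <- big.matrix(3, 3, type='integer', init=3)
z[,]
dim(z)
z[1,1] <- 2
z[,]
# describe() returns the descriptor needed to attach to the same data.
zdescription <- describe(z)
zdescription
y <- attach.big.matrix(zdescription)
y[,]
y
z
# y and z refer to the same data, so this change is visible through both.
y[1,1] <- -100
y[,]
z[,]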