AbstractFileArray {R.huge} | R Documentation |
Class representing a persistent array stored in a file
Description
Package: R.huge
Class AbstractFileArray
Object
~~|
~~+--AbstractFileArray
Directly known subclasses:
FileByteMatrix, FileByteVector, FileDoubleMatrix, FileDoubleVector, FileFloatMatrix, FileFloatVector, FileIntegerMatrix, FileIntegerVector, FileMatrix, FileShortMatrix, FileShortVector, FileVector
public static class AbstractFileArray
extends Object
Note that this is an abstract class, i.e. it is not possible to create
an object of this class but only from one of its subclasses.
For a vector data type, see FileVector.
For a matrix data type, see FileMatrix.
Usage
AbstractFileArray(filename=NULL, path=NULL, storageMode=c("integer", "double"),
bytesPerCell=1, dim=NULL, dimnames=NULL, dimOrder=NULL, comments=NULL,
nbrOfFreeBytes=4096)
Arguments
filename |
The name of the file storing the data. |
path |
An optional path where data should be stored. |
storageMode |
The storage mode of the data elements. |
bytesPerCell |
The number of bytes each element (cell) takes up
on the file system. If |
dim |
|
dimnames |
An optional |
dimOrder |
The order of the dimensions. |
comments |
An optional |
nbrOfFreeBytes |
The number of "spare" bytes after the comments before the data section begins. |
Details
The purpose of this class is to be able to work with large arrays in R without being limited by the amount of memory available. Data is kept on the file system and elements are read and written whenever queried.
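The idea of reading and writing single elements directly on disk can be sketched in base R with a binary file connection and seek(). This is only an illustration of the principle, not the R.huge implementation: the helpers writeCell() and readCell(), the 4-byte cell size, and the headerless file layout are all assumptions made for this example.

```r
# Hypothetical helpers illustrating on-disk element access; not part of R.huge.
bytesPerCell <- 4L                      # assumed size of one integer cell
pathname <- tempfile(fileext = ".bin")

# Open for reading and writing, and pre-allocate 1000 zero-valued cells.
con <- file(pathname, open = "w+b")
writeBin(integer(1000), con, size = bytesPerCell)

# Write one value at the 1-based index 'idx' without touching other cells.
writeCell <- function(con, idx, value) {
  seek(con, where = (idx - 1) * bytesPerCell, rw = "write")
  writeBin(as.integer(value), con, size = bytesPerCell)
  invisible(NULL)
}

# Read the single value stored at the 1-based index 'idx'.
readCell <- function(con, idx) {
  seek(con, where = (idx - 1) * bytesPerCell, rw = "read")
  readBin(con, what = "integer", n = 1L, size = bytesPerCell)
}

writeCell(con, 500, 42L)
x500 <- readCell(con, 500)   # the cell that was just written
x499 <- readCell(con, 499)   # an untouched, pre-allocated cell

close(con)
unlink(pathname)
```

Only the requested cell is transferred between disk and memory, so the memory used does not depend on the length of the array.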
Fields and Methods
Methods:
as.character | Returns a short string describing the file array. | |
as.vector | Returns the elements of a file array as an R vector. | |
clone | Clones a file array. | |
close | Closes a connection to the data file of the file array. | |
delete | Deletes the file array from the file system. | |
dim | Gets the dimension of the file array. | |
dimnames | Gets the dimension names of a file array. | |
finalize | Internal: Clean up when file array is deallocated from memory. | |
flush | Internal: Flushes the write buffer. | |
getBasename | Gets the basename (filename) of the data file. | |
getBytesPerCell | Gets the number of bytes per element in a file array. | |
getCloneNumber | Gets the clone number of the file array. | |
getComments | Gets the comments for this file array. | |
getDataOffset | Gets file position of the data section in a file array. | |
getDimensionOrder | Gets the order of dimension. | |
getExtension | Gets the filename extension of the file array. | |
getFileSize | Gets the size of the file array. | |
getName | Gets the name of the file array. | |
getPath | Gets the path (directory) where the data file lives. | |
getPathname | Gets the full pathname to the data file. | |
getSizeOfComments | Gets the number of bytes the comments occupies. | |
getSizeOfData | Gets the size of the data section in bytes. | |
getStorageMode | Gets the storage mode of the file array. | |
isOpen | Checks whether the data file of the file array is open or not. | |
length | Gets the number of elements in a file array. | |
open | Opens a connection to the data file of the file array. | |
readAllValues | Reads all values in the file array. | |
readContiguousValues | Reads sets of contiguous values in the file array. | |
readHeader | Read the header of a file array data file. | |
readValues | Reads individual values in the file array. | |
setComments | Sets the comments for this file array. | |
writeAllValues | Writes all values to a file array. | |
writeEmptyData | Writes an empty data section to the data file of a file array. | |
writeHeader | Writes the header of a file array to file. | |
writeHeaderComments | - | |
writeValues | Writes values to a file array. | |
Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, names, objectSize, print, save
Maximum number of elements
Only the header is kept in memory, not the data. The maximum length of an array that can be allocated is therefore limited by the amount of available space on the file system. Since the (optional) element names are stored in the header, these may also be a limiting factor.
Element names
The element names are stored in the header and are currently read and written to file one by one. This may slow down the performance substantially if the dimensions are large. For optimal opening performance, avoid names.
For now, do not change names after the file has been allocated.
File format
The file format consists of a header section and a data section.
The header contains information about the file format, the length
and element names of the array, as well as the data type
(storage mode) and the size of each element.
The data section, which follows immediately after the header section,
consists of all data elements with non-assigned elements being
pre-allocated with zeros.
For more details, see the source code.
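The header-plus-data layout can be illustrated with a small base R sketch. The layout used here (a 4-byte tag, a 1-byte cell size, and a 4-byte length) is a hypothetical one invented for this example; it is not the actual R.huge file format, for which the source code remains the reference.

```r
# Hypothetical header layout for illustration only; not the R.huge format.
pathname <- tempfile(fileext = ".bin")
n <- 10L                 # number of elements in the array
bytesPerCell <- 2L       # size of each element in bytes

con <- file(pathname, open = "wb")
# Header section
writeBin(charToRaw("DEMO"), con)          # 4-byte format tag (assumed)
writeBin(bytesPerCell, con, size = 1L)    # 1 byte: size of each element
writeBin(n, con, size = 4L)               # 4 bytes: number of elements
# Data section: all elements pre-allocated with zeros
writeBin(integer(n), con, size = bytesPerCell)
close(con)

# Read the header back, then the data section that follows it directly.
con <- file(pathname, open = "rb")
tag <- rawToChar(readBin(con, what = "raw", n = 4L))
cellSize <- readBin(con, what = "integer", n = 1L, size = 1L)
len <- readBin(con, what = "integer", n = 1L, size = 4L)
dataOffset <- 4L + 1L + 4L                # header size = start of data section
values <- readBin(con, what = "integer", n = len, size = cellSize)
close(con)

fileSize <- file.info(pathname)$size      # header bytes + len * cellSize
unlink(pathname)
```

Because the header has a known size, the position of any element is the data offset plus its zero-based index times the cell size.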
Limitations
The size of the array in bytes is limited by the maximum file size of the file system. For instance, the maximum file size on a Windows FAT32 system is just under 4GB (4 GiB minus one byte). On Windows NTFS the limit is in practice ~16TB.
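Whether an array of given dimensions fits under a file-size limit can be estimated with simple arithmetic. The sketch below uses a hypothetical helper, dataSizeInBytes(), which is not part of R.huge. It counts the data section only and ignores the header, and it takes the FAT32 limit to be 2^32 - 1 bytes.

```r
# Hypothetical helper; not part of R.huge. Counts the data section only.
dataSizeInBytes <- function(dim, bytesPerCell) {
  # Use doubles so that large products do not overflow R's 32-bit integers.
  prod(as.numeric(dim)) * bytesPerCell
}

fat32Max <- 2^32 - 1     # assumed FAT32 maximum file size in bytes

# A 50000-by-20000 matrix of 8-byte doubles needs 8e9 bytes of data.
sizeDouble <- dataSizeInBytes(c(50000, 20000), bytesPerCell = 8)
fitsDouble <- sizeDouble <= fat32Max

# The same matrix stored with 1 byte per cell needs 1e9 bytes.
sizeByte <- dataSizeInBytes(c(50000, 20000), bytesPerCell = 1)
fitsByte <- sizeByte <= fat32Max
```

Under these assumptions the double matrix exceeds the FAT32 limit, while the byte matrix fits.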
Author(s)
Henrik Bengtsson
References
[1] New Technology File System (NTFS), Wikipedia, 2006 https://en.wikipedia.org/wiki/NTFS.