AbstractFileArray {R.huge}    R Documentation

Class representing a persistent array stored in a file

Description

Package: R.huge
Class AbstractFileArray

Object
  |
  +--AbstractFileArray

Directly known subclasses:
FileByteMatrix, FileByteVector, FileDoubleMatrix, FileDoubleVector, FileFloatMatrix, FileFloatVector, FileIntegerMatrix, FileIntegerVector, FileMatrix, FileShortMatrix, FileShortVector, FileVector

public static class AbstractFileArray
extends Object

Note that this is an abstract class, i.e. objects cannot be created from this class directly, only from one of its subclasses. For a vector data type, see FileVector. For a matrix data type, see FileMatrix.

Usage

AbstractFileArray(filename=NULL, path=NULL, storageMode=c("integer", "double"),
  bytesPerCell=1, dim=NULL, dimnames=NULL, dimOrder=NULL, comments=NULL,
  nbrOfFreeBytes=4096)

Arguments

filename

The name of the file storing the data.

path

An optional path where data should be stored.

storageMode

The storage mode (see mode()) of the data elements.

bytesPerCell

The number of bytes each element (cell) takes up on the file system. If NULL, it is inferred from the storageMode argument.

dim

A numeric vector specifying the dimensions of the array.

dimnames

An optional list of dimension names.

dimOrder

The order of the dimensions.

comments

An optional character string of arbitrary length.

nbrOfFreeBytes

The number of "spare" bytes after the comments before the data section begins.

Details

The purpose of this class is to be able to work with large arrays in R without being limited by the amount of memory available. Data is kept on the file system and elements are read and written whenever queried.
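
The read-on-demand idea can be illustrated with base R connections alone. The following is a minimal sketch under stated assumptions, not the implementation used by this package: the helpers readCell() and writeCell() are hypothetical names, the file has no header, and cells are assumed to be 4-byte integers stored contiguously.

```r
# Hypothetical helpers, not part of R.huge: access single cells of a
# headerless binary file of 4-byte integers without loading the whole file.
bytesPerCell <- 4L

readCell <- function(con, i) {
  # Move the read position to the start of cell i (1-based index)
  seek(con, where = (i - 1) * bytesPerCell, origin = "start", rw = "read")
  readBin(con, what = integer(), n = 1L, size = bytesPerCell)
}

writeCell <- function(con, i, value) {
  # Move the write position to the start of cell i, then overwrite it
  seek(con, where = (i - 1) * bytesPerCell, origin = "start", rw = "write")
  writeBin(as.integer(value), con, size = bytesPerCell)
  flush(con)
  invisible(con)
}

pathname <- tempfile(fileext = ".bin")

# Pre-allocate 1000 cells filled with zeros
con <- file(pathname, open = "wb")
writeBin(integer(1000), con, size = bytesPerCell)
close(con)

# Reopen for both reading and writing
con <- file(pathname, open = "r+b")
writeCell(con, 500, 42L)
x <- readCell(con, 500)   # only this one cell is read from disk
y <- readCell(con, 501)   # a cell that was never assigned
close(con)
file.remove(pathname)
```

Memory use in this sketch depends on the number of cells requested, not on the size of the file, which is the property the class relies on.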

Fields and Methods

Methods:

as.character Returns a short string describing the file array.
as.vector Returns the elements of a file array as an R vector.
clone Clones a file array.
close Closes a connection to the data file of the file array.
delete Deletes the file array from the file system.
dim Gets the dimension of the file array.
dimnames Gets the dimension names of a file array.
finalize Internal: Clean up when file array is deallocated from memory.
flush Internal: Flushes the write buffer.
getBasename Gets the basename (filename) of the data file.
getBytesPerCell Gets the number of bytes per element in a file array.
getCloneNumber Gets the clone number of the file array.
getComments Gets the comments for this file array.
getDataOffset Gets file position of the data section in a file array.
getDimensionOrder Gets the order of the dimensions.
getExtension Gets the filename extension of the file array.
getFileSize Gets the size of the file array.
getName Gets the name of the file array.
getPath Gets the path (directory) where the data file lives.
getPathname Gets the full pathname to the data file.
getSizeOfComments Gets the number of bytes the comments occupy.
getSizeOfData Gets the size of the data section in bytes.
getStorageMode Gets the storage mode of the file array.
isOpen Checks whether the data file of the file array is open or not.
length Gets the number of elements in a file array.
open Opens a connection to the data file of the file array.
readAllValues Reads all values in the file array.
readContiguousValues Reads sets of contiguous values in the file array.
readHeader Reads the header of a file array data file.
readValues Reads individual values in the file array.
setComments Sets the comments for this file array.
writeAllValues Writes all values to a file array.
writeEmptyData Writes an empty data section to the data file of a file array.
writeHeader Writes the header of a file array to file.
writeHeaderComments -
writeValues Writes values to a file array.

Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, names, objectSize, print, save

Maximum number of elements

Only the header is kept in memory, not the data. The maximum length of an array that can be allocated is therefore limited by the amount of available space on the file system. Since the (optional) element names are stored in the header, these may also be a limiting factor.

Element names

The element names are stored in the header and are currently read from and written to file one by one. This may slow performance substantially when the dimensions are large. For optimal opening performance, avoid names.

For now, do not change the names after the file has been allocated.

File format

The file format consists of a header section and a data section. The header contains information about the file format, the length and element names of the array, the data type (storage mode()), and the size of each element. The data section, which follows immediately after the header section, consists of all data elements, with unassigned elements pre-allocated as zeros.

For more details, see the source code.
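
The general shape of such a file, a header followed by a zero-filled data section, can be sketched in base R. The layout below is a toy assumption made for illustration only and is not the actual format written by this package; the variable names are likewise hypothetical.

```r
# Toy layout (an assumption for illustration, NOT the R.huge format):
#   header: bytes per cell (4-byte int), number of cells (4-byte int)
#   data:   all cells, pre-allocated with zeros
pathname <- tempfile(fileext = ".bin")
bytesPerCell <- 8L   # cells are doubles
nbrOfCells <- 10L

con <- file(pathname, open = "wb")
writeBin(c(bytesPerCell, nbrOfCells), con, size = 4L)    # header section
writeBin(double(nbrOfCells), con, size = bytesPerCell)   # zero-filled data
close(con)

con <- file(pathname, open = "rb")
header <- readBin(con, what = integer(), n = 2L, size = 4L)
dataOffset <- seek(con, rw = "read")   # file position right after the header
values <- readBin(con, what = double(), n = header[2], size = header[1])
close(con)

fileSize <- file.info(pathname)$size   # header bytes plus data bytes
file.remove(pathname)
```

In this toy layout the position of cell i is dataOffset + (i - 1) * bytesPerCell, so a reader only needs the header to locate any element.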

Limitations

The size of the array in bytes is limited by the maximum file size of the file system. For instance, the maximum file size on a Windows FAT32 system is 4 GiB minus one byte. On Windows NTFS the limit is in practice about 16 TB [1].
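
Whether a planned array fits under such a limit can be estimated before allocating it. The sketch below uses a hypothetical helper, dataSizeInBytes(), which is not part of this package. It ignores the header and assumes a FAT32 per-file limit of 2^32 - 1 bytes.

```r
# Hypothetical helper, not part of R.huge: size of the data section in bytes.
# Dimensions are converted to doubles so that large products do not overflow.
dataSizeInBytes <- function(dim, bytesPerCell) {
  prod(as.numeric(dim)) * bytesPerCell
}

fat32Limit <- 2^32 - 1   # assumed largest file size on FAT32, in bytes

# A 50000-by-50000 matrix of 8-byte doubles needs 2e10 bytes
sizeDouble <- dataSizeInBytes(c(50000, 50000), bytesPerCell = 8)
fitsDouble <- sizeDouble <= fat32Limit

# The same matrix stored with one byte per cell needs 2.5e9 bytes
sizeByte <- dataSizeInBytes(c(50000, 50000), bytesPerCell = 1)
fitsByte <- sizeByte <= fat32Limit
```

Under these assumptions the double matrix exceeds the FAT32 limit while the byte matrix fits, which shows how the choice of bytesPerCell affects what can be stored.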

Author(s)

Henrik Bengtsson

References

[1] New Technology File System (NTFS), Wikipedia, 2006. https://en.wikipedia.org/wiki/NTFS


[Package R.huge version 0.10.1 Index]