create_lazyarray {lazyarray}    R Documentation

Create a lazy-array with given format and dimension

Description

Creates a directory to store a lazy-array. The path must not already exist. See load_lazyarray for more details.

Usage

create_lazyarray(
  path,
  storage_format,
  dim,
  dimnames = NULL,
  compress_level = 50L,
  prefix = "",
  multipart = TRUE,
  multipart_mode = 1,
  file_names = NULL,
  meta_name = "lazyarray.meta"
)

Arguments

path

path on a local drive in which to store the array data

storage_format

data type, choices are "double", "integer", "character", and "complex"

dim

integer vector, dimension of array, see dim

dimnames

list of vectors, names of each dimension, see dimnames

compress_level

0 to 100, level of compression. 0 means no compression; 100 means maximum compression. For persistent data, setting 100 is recommended. Default is 50.

prefix

character prefix of the array partition file names

multipart

whether to split the array into multiple partitions; default is TRUE

multipart_mode

1 or 2, the partition mode; see Details.

file_names

data file names without prefix or extension; see Details.

meta_name

header file name, default is "lazyarray.meta"

Details

A lazy array stores array data on the hard drive and loads it on demand. It differs from packages such as "bigmemory" in that its internal reading is multi-threaded, which gives a significant speed boost on solid-state drives.

One lazy array contains two parts: data file(s) and a meta file. The data files can be stored in two ways: non-partitioned and partitioned.

For a non-partitioned array, the dimension is set when the array is created and cannot be changed afterwards.

For a partitioned array, there are two partition modes, selected by `multipart_mode`. In mode 1, each partition has the same number of dimensions as the array, with the last dimension equal to 1. For example, an array with dimension c(2,3,5) partitioned with mode 1 stores each partition with dimension c(2,3,1). In mode 2, the last dimension is dropped when storing each partition, so the same array is stored as partitions with dimension c(2,3).
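The difference between the two modes can be illustrated with an ordinary in-memory array in base R. This is a minimal sketch only: `partition_dim()` is a hypothetical helper written for this illustration and is not part of lazyarray, and the slicing mimics the partition shapes described above rather than how lazyarray actually writes files.

```r
# Hypothetical helper (not part of lazyarray): dimension of each partition
# for an array of dimension `dim` under the given multipart_mode.
partition_dim <- function(dim, multipart_mode = 1) {
  stopifnot(length(dim) >= 2, multipart_mode %in% c(1, 2))
  inner <- dim[-length(dim)]
  # Mode 1 keeps the last dimension with extent 1; mode 2 drops it
  if (multipart_mode == 1) c(inner, 1L) else inner
}

# An in-memory array with dimension c(2, 3, 5), i.e. 5 partitions
x <- array(seq_len(30), dim = c(2, 3, 5))

# Mode 1 shape: keep the last dimension with extent 1
part1 <- x[, , 2, drop = FALSE]
dim(part1)                       # 2 3 1

# Mode 2 shape: default drop = TRUE removes the extent-1 last dimension here
part2 <- x[, , 2]
dim(part2)                       # 2 3

partition_dim(c(2, 3, 5), 1)     # 2 3 1
partition_dim(c(2, 3, 5), 2)     # 2 3
```

Both shapes hold the same values; only the stored dimension differs.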

file_names is used when the data files should have custom (irregular) names. If multipart=FALSE, the whole array is stored in a single file under path, named <prefix><file_name>.fst. For example, with the defaults prefix="" and file_name="", the array data is stored in path/.fst.

If multipart=TRUE, file_names should be a character vector whose length equals the array's last dimension. For example, a 3x4x5 array has 5 partitions, each named following the <prefix><file_name>.fst convention. Use arr$get_partition_fpath() to find the location of the partition files. For examples, see lazyarray.
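The naming convention can be sketched in base R as follows. `expected_data_files()` is a hypothetical helper for illustration only, not part of lazyarray. It assumes that file_names must be supplied explicitly when multipart=TRUE, because this page does not say which names are used when file_names is NULL. For an actual array, arr$get_partition_fpath() remains the reliable way to locate the files.

```r
# Hypothetical helper (not part of lazyarray): expected data-file paths under
# the <prefix><file_name>.fst convention.
expected_data_files <- function(path, dim, prefix = "", multipart = TRUE,
                                file_names = NULL) {
  if (!multipart) {
    # Single data file; an empty file name yields "<path>/<prefix>.fst"
    if (is.null(file_names)) file_names <- ""
    stopifnot(length(file_names) == 1)
  } else {
    # Assumption: one name per slice along the last dimension, given explicitly
    stopifnot(!is.null(file_names), length(file_names) == dim[length(dim)])
  }
  file.path(path, paste0(prefix, file_names, ".fst"))
}

# A 3x4x5 array stored as 5 partitions with custom names
expected_data_files("my_array", c(3, 4, 5), prefix = "part_",
                    file_names = c("a", "b", "c", "d", "e"))
# "my_array/part_a.fst" "my_array/part_b.fst" ... "my_array/part_e.fst"

# Non-partitioned array with the default prefix and an empty file name
expected_data_files("my_array", c(3, 4, 5), multipart = FALSE)
# "my_array/.fst"
```

A file_names vector whose length differs from the last dimension is rejected, matching the length requirement stated above.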

Value

A ClassLazyArray instance

Author(s)

Zhengjia Wang


[Package lazyarray version 1.1.0 Index]