lazyarray {lazyarray} | R Documentation |
Create or load 'lazyarray' instance
Description
If path is missing, create a new array. If path exists and meta file is complete, load existing file, otherwise create new meta file and import from existing data.
Usage
lazyarray(
path,
storage_format,
dim,
dimnames = NULL,
multipart = TRUE,
prefix = "",
multipart_mode = 1,
compress_level = 50L,
file_names = list("", seq_len(dim[[length(dim)]]))[[multipart + 1]],
meta_name = "lazyarray.meta",
read_only = FALSE,
quiet = FALSE,
...
)
Arguments
path |
path to a local drive where array data is stored |
storage_format |
data type, choices are |
dim |
integer vector, dimension of array, see |
dimnames |
list of vectors, names of each dimension, see |
multipart |
whether to split array into multiple partitions, default is true |
prefix |
character prefix of array partition |
multipart_mode |
1, or 2, mode of partition, see |
compress_level |
0 to 100, level of compression. 0 means no compression, 100 means maximum compression. For persistent data, it's recommended to set 100. Default is 50. |
file_names |
partition names without prefix or extension; see details |
meta_name |
header file name, default is "lazyarray.meta" |
read_only |
whether created array is read-only |
quiet |
whether to suppress messages, default is false |
... |
ignored |
Details
There are three cases, and lazyarray behaves differently in each.
Case 1: if path is missing, the function calls create_lazyarray to create a blank array instance.
Case 2: if path exists and contains meta_name, the existing instance is loaded with the given read/write access. In this case, parameters other than read_only, path, and meta_name are ignored.
Case 3: if meta_name is missing and path exists, lazyarray tries to create an array from existing data files.
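The three cases can be told apart by checking the directory and the meta file. The following is a minimal sketch in base R, not the package's implementation: which_case is a hypothetical helper, and it only tests whether the meta file exists, whereas the package may also check that the meta file is complete.

```r
# Hypothetical helper (not part of 'lazyarray'): reports which case applies.
which_case <- function(path, meta_name = "lazyarray.meta") {
  if (!dir.exists(path)) {
    return(1L)  # path is missing: a new array would be created
  }
  if (file.exists(file.path(path, meta_name))) {
    return(2L)  # meta file found: the existing array would be loaded
  }
  3L            # path exists without meta file: import from data files
}

path <- tempfile()
case_before <- which_case(path)      # 1: directory does not exist yet

dir.create(path)
case_empty_dir <- which_case(path)   # 3: directory exists, no meta file

invisible(file.create(file.path(path, "lazyarray.meta")))
case_with_meta <- which_case(path)   # 2: meta file is present
```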
If lazyarray enters case 3, file_names is used to locate the partition files. Under multi-part mode (multipart=TRUE), file_names defaults to 1, 2, ..., dim[length(dim)]. These correspond to '1.fst', '2.fst', etc. under the path folder.
You may specify your own file_names if irregular names are used. The file name of each partition is <prefix><file_name>.fst. For example, file_names=c('A', 'B') and prefix="file-" mean that the partitions are stored as "file-A.fst" and "file-B.fst". It is fine if some files are missing; the corresponding partitions are filled with NA when values are read from them. However, the length of file_names must equal the last dimension when multipart=TRUE. If multipart=FALSE, file_names should have length 1, and the corresponding file is the data file.
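The naming rule can be reproduced in base R to preview which files an import would look for. This is a sketch under assumptions: default_file_names wraps the default expression shown in Usage, while partition_files is a hypothetical helper that applies the <prefix><file_name>.fst pattern; neither is a package function.

```r
# Default file_names expression, copied from the Usage section.
default_file_names <- function(dim, multipart = TRUE) {
  list("", seq_len(dim[[length(dim)]]))[[multipart + 1]]
}
default_file_names(c(2, 3, 4))                     # 1 2 3 4
default_file_names(c(2, 3, 4), multipart = FALSE)  # ""

# Hypothetical helper: build partition file names from prefix and file_names.
partition_files <- function(file_names, prefix = "") {
  paste0(prefix, file_names, ".fst")
}
expected <- partition_files(c("A", "B"), prefix = "file-")
expected  # "file-A.fst" "file-B.fst"

# List the partitions that have no file on disk; per the text above,
# those partitions would read as NA. The empty file below only stands in
# for a real partition so that the existence check has something to find.
path <- tempfile()
dir.create(path)
invisible(file.create(file.path(path, "file-A.fst")))
missing_parts <- expected[!file.exists(file.path(path, expected))]
missing_parts  # "file-B.fst"
```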
Note that when importing existing partition files generated by other packages such as 'fst', the partition files must be homogeneous, meaning that the stored data length, dimension, and storage type must be the same. Because the 'fstcore' package stores data in data frames internally, the column names must be 'V1', 'V2', etc. for non-complex elements, or 'V1R', 'V1I', ... for complex numbers (real and imaginary parts are stored in different columns).
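As an illustration of the column layout, the sketch below splits an in-memory complex array into one data frame per partition along the last dimension and checks that the partitions are homogeneous. It uses base R only; the array x and the list parts are made up for this example, and each data frame would still have to be written to disk (for instance with fst::write_fst, as in the Examples) before it could be imported.

```r
# Example data: a 2x3x4 complex array (values chosen only for illustration).
x <- array(complex(real = 1:24, imaginary = 24:1), dim = c(2, 3, 4))

n_part <- dim(x)[[length(dim(x))]]   # number of partitions (last dimension)
part_len <- length(x) / n_part       # elements per partition

# One data frame per partition: real parts in V1R, imaginary parts in V1I.
parts <- lapply(seq_len(n_part), function(i) {
  slice <- x[, , i]
  data.frame(V1R = as.vector(Re(slice)), V1I = as.vector(Im(slice)))
})

# Homogeneity checks: same length, same column names, same storage type.
same_length <- all(vapply(parts, nrow, integer(1)) == part_len)
same_names <- all(vapply(parts, function(p) {
  identical(names(p), c("V1R", "V1I"))
}, logical(1)))
same_type <- all(vapply(parts, function(p) {
  is.double(p$V1R) && is.double(p$V1I)
}, logical(1)))
stopifnot(same_length, same_names, same_type)
```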
Author(s)
Zhengjia Wang
See Also
create_lazyarray
, load_lazyarray
Examples
path <- tempfile()
# ---------------- Case 1: Create new array ------------------
arr <- lazyarray(path, storage_format = 'double', dim = c(2,3,4),
                 meta_name = 'lazyarray.meta')
arr[] <- 1:24
# Subset and get the first partition
arr[,,1]
# Partition file path (total 4 partitions)
arr$get_partition_fpath()
# Removing array doesn't clear the data
rm(arr); gc()
# ---------------- Case 2: Load from existing directory ----------------
## Important!!! Run case 1 first
# Load from existing path, no need to specify other params
arr <- lazyarray(path, meta_name = 'lazyarray.meta', read_only = TRUE)
arr[,,1]
# ---------------- Case 3: Import from existing data ----------------
## Important!!! Run case 1 first
# path exists, but meta is missing, all other params are required
# Notice the partition count increased from 4 to 5, and storage type converts
# from double to character
arr <- lazyarray(path = path, meta_name = 'lazyarray-character.meta',
                 file_names = c(1,2,3,4,'additional'),
                 storage_format = 'character', dim = c(2,3,5),
                 quiet = TRUE, read_only = FALSE)
# partition names
arr$get_partition_fpath(1:4, full_path = FALSE)
arr$get_partition_fpath(5, full_path = FALSE)
# The first partition still exists and is valid
arr[,,1]
# The additional partition is all NA
arr[,,5]
# Set data to 5th partition
arr[,,5] <- rep(0, 6)
# -------- Advanced usage: create fst data and import manually --------
# Clear existing files
path <- tempfile()
unlink(path, recursive = TRUE)
dir.create(path, recursive = TRUE)
# Create array of dimension 2x3x4, but 3rd partition is missing
# without using lazyarray package
# Column names must be V1 or V1R, V1I (complex)
fst::write_fst(data.frame(V1 = 1:6), path = file.path(path, 'part-1.fst'))
fst::write_fst(data.frame(V1 = 7:12), path = file.path(path, 'part-B.fst'))
fst::write_fst(data.frame(V1 = 19:24), path = file.path(path, 'part-d.fst'))
# Import via lazyarray
arr <- lazyarray(path, meta_name = 'test-int.meta',
                 storage_format = 'integer',
                 dim = c(2,3,4), prefix = 'part-',
                 file_names = c('1', 'B', 'C', 'd'),
                 quiet = TRUE)
arr[]
# Complex case
fst::write_fst(data.frame(V1R = 1:6, V1I = 1:6),
               path = file.path(path, 'cplx-1.fst'))
fst::write_fst(data.frame(V1R = 7:12, V1I = 100:105),
               path = file.path(path, 'cplx-2.fst'))
fst::write_fst(data.frame(V1R = 19:24, V1I = rep(0,6)),
               path = file.path(path, 'cplx-4.fst'))
arr <- lazyarray(path, meta_name = 'test-cplx.meta',
                 storage_format = 'complex',
                 dim = c(2,3,4), prefix = 'cplx-',
                 file_names = 1:4, quiet = TRUE)
arr[]