mvl_write_extent_index {RMVL} | R Documentation |
Compute and write extent index
Description
This function computes a hash-based index that makes it possible to find the indices of rows whose hashes match query values. While it can be applied to arbitrary data, it is optimized for the common case in which vectors contain stretches of repeated values describing row groups to be processed. This is particularly relevant for R because vectorized processing of row batches is the only practical way to scan very large tables using pure-R code.
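To make "stretches of repeated values" concrete, the following base-R sketch reduces a vector to one (value, start, length) record per stretch using rle(). It does not use RMVL and is not the package's implementation; the names y, extents and rows are illustrative only.

```r
# Illustration only: base R, not the RMVL implementation.
# A column whose values come in runs, as in a table grouped by key.
y <- c(3, 3, 3, 7, 7, 1, 1, 1, 1)

# rle() collapses each run into one (value, length) pair.
r <- rle(y)

# Describe each run by its value, first row, and number of rows.
extents <- data.frame(
  value  = r$values,
  start  = cumsum(c(1, head(r$lengths, -1))),
  length = r$lengths
)
print(extents)

# Row indices covered by the stretch whose value is 7.
e <- extents[extents$value == 7, ]
rows <- seq(e$start, length.out = e$length)
print(rows)
```

A whole stretch is described by a single record, so a batch of rows can be handed to vectorized code at once instead of being visited row by row.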
Usage
mvl_write_extent_index(MVLHANDLE, L, name = NULL)
Arguments
MVLHANDLE | a handle to an MVL file, as produced by mvl_open() |
L | a list of vector-like MVL_OBJECTs |
name | if specified, add a named entry to the MVL file directory |
Details
mvl_write_extent_index() creates the index in memory and then writes it out. The memory usage is proportional to the number of repeat stretches. Sorting tables improves performance, but is not a requirement.
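The effect of sorting on the number of repeat stretches can be seen without RMVL. The sketch below counts stretches with base-R rle() for the same y column used in the Examples, before and after sorting. The helper count_stretches is hypothetical, and the sketch makes no claim about the internal layout of the index.

```r
# Illustration only: count repeat stretches in base R.
# count_stretches is a hypothetical helper, not part of RMVL.
count_stretches <- function(v) length(rle(v)$values)

# Same column as in the Examples: the values 1..9, 0 cycling over 100 rows.
y <- (1:100) %% 10

n_unsorted <- count_stretches(y)        # adjacent rows always differ
n_sorted   <- count_stretches(sort(y))  # one stretch per distinct value

print(c(unsorted = n_unsorted, sorted = n_sorted))
```

Unsorted, every row starts a new stretch (100 stretches); sorted, there is one stretch per distinct value (10).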
Value
an object of class MVL_OFFSET that describes an offset into this MVL file. MVL offsets are vectors and can be concatenated. They can be written to an MVL file directly, or as part of another object such as a list.
See Also
mvl_order_vectors, mvl_index_lapply, mvl_find_matches, mvl_group, mvl_indexed_copy, mvl_merge, mvl_hash_vectors, mvl_get_groups
Examples
## Not run:
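library(RMVL)

# Create (or append to) an MVL file and write a data frame to it
Mtmp <- mvl_open("tmp_a.mvl", append=TRUE, create=TRUE)
mvl_write_object(Mtmp, data.frame(x=runif(100), y=(1:100) %% 10), "df1")
# Remap so that the newly written data is accessible through the handle
Mtmp <- mvl_remap(Mtmp)

# Build an extent index over column y and add it to the file directory
mvl_write_extent_index(Mtmp, list(Mtmp$df1[,"y",ref=TRUE]), "df1_extent_index_y")
Mtmp <- mvl_remap(Mtmp)

# Apply a function to the rows matching the query values 2 and 3
mvl_index_lapply(Mtmp["df1_extent_index_y", ref=TRUE], list(c(2, 3)),
    function(i, idx) { return(list(i, idx)) })

# Example of full scan (the query argument is left empty)
mvl_index_lapply(Mtmp["df1_extent_index_y", ref=TRUE], ,
    function(i, idx) { return(list(i, idx)) })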
## End(Not run)