Larger-than-RAM Disk-Based Data Manipulation Framework



Documentation for package ‘disk.frame’ version 0.8.3

Help Pages

A B C D E F G H I L M N O P Q R S T V W Z misc

-- A --

add_chunk Add a chunk to the disk.frame
all_df.chunk_agg.disk.frame One Stage function
all_df.collected_agg.disk.frame One Stage function
anti_join.disk.frame Performs join/merge for disk.frames
any_df.chunk_agg.disk.frame One Stage function
any_df.collected_agg.disk.frame One Stage function
arrange.disk.frame The dplyr verbs implemented for disk.frame
as.data.frame.disk.frame Convert disk.frame to data.frame by collecting all chunks
as.data.table.disk.frame Convert disk.frame to data.table by collecting all chunks
as.disk.frame Make a data.frame into a disk.frame

-- B --

bind_rows.disk.frame Bind rows

-- C --

ceremony_text Show the code to setup disk.frame
chunk_arrange The dplyr verbs implemented for disk.frame
chunk_distinct The dplyr verbs implemented for disk.frame
chunk_group_by Group by within each chunk of the disk.frame
chunk_summarise Group by within each chunk of the disk.frame
chunk_summarize Group by within each chunk of the disk.frame
chunk_ungroup Group by within each chunk of the disk.frame
cimap Apply the same function to all chunks
cimap.disk.frame Apply the same function to all chunks
cimap_dfr Apply the same function to all chunks
cimap_dfr.disk.frame Apply the same function to all chunks
clapply Apply the same function to all chunks
cmap Apply the same function to all chunks
cmap.disk.frame Apply the same function to all chunks
cmap2 'cmap2' applies a function to two disk.frames
cmap_dfr Apply the same function to all chunks
cmap_dfr.disk.frame Apply the same function to all chunks
collect.disk.frame Bring the disk.frame into R
collect.summarized_disk.frame Bring the disk.frame into R
collect_list Bring the disk.frame into R
colnames Return the column names of the disk.frame
colnames.default Return the column names of the disk.frame
colnames.disk.frame Return the column names of the disk.frame
compute.disk.frame Force computations. The results are stored in a folder.
copy_df_to Move or copy a disk.frame to another location
create_chunk_mapper Create a function that applies to each chunk of a disk.frame
csv_to_disk.frame Convert CSV file(s) to disk.frame format

-- D --

delayed Apply the same function to all chunks
delete Delete a disk.frame
df_ram_size Get the size of RAM in gigabytes
disk.frame Create a disk.frame from a folder
disk.frame_to_parquet A function to convert a disk.frame to parquet format
distinct.disk.frame The dplyr verbs implemented for disk.frame
distribute Shard a data.frame/data.table or disk.frame into chunks and save them into a disk.frame

-- E --

evalparseglue Helper function to parse and evaluate a 'glue::glue' string

-- F --

filter.disk.frame The dplyr verbs implemented for disk.frame
find_globals_recursively Find globals in an expression by searching through the chain
foverlaps.disk.frame Apply data.table's foverlaps to the disk.frame
full_join.disk.frame Performs join/merge for disk.frames

-- G --

gen_datatable_synthetic Generate synthetic dataset for testing
get_chunk Obtain one chunk by chunk id
get_chunk.disk.frame Obtain one chunk by chunk id
get_chunk_ids Get the chunk IDs and file names
get_partition_paths Get the partitioning structure of a folder
glimpse.disk.frame The dplyr verbs implemented for disk.frame
groups.disk.frame The shard keys of the disk.frame
group_by.disk.frame A function to parse the summarize function
group_vars.disk.frame Column names for RStudio auto-complete

-- H --

head.disk.frame Head and tail of the disk.frame

-- I --

inner_join.disk.frame Performs join/merge for disk.frames
insert_ceremony Show the code to setup disk.frame
IQR_df.chunk_agg.disk.frame One Stage function
IQR_df.collected_agg.disk.frame One Stage function
is_disk.frame Checks if a folder is a disk.frame

-- L --

lazy Apply the same function to all chunks
lazy.disk.frame Apply the same function to all chunks
left_join.disk.frame Performs join/merge for disk.frames
length_df.chunk_agg.disk.frame One Stage function
length_df.collected_agg.disk.frame One Stage function

-- M --

map_by_chunk_id 'cmap2' applies a function to two disk.frames
max_df.chunk_agg.disk.frame One Stage function
max_df.collected_agg.disk.frame One Stage function
mean_df.chunk_agg.disk.frame One Stage function
mean_df.collected_agg.disk.frame One Stage function
median_df.chunk_agg.disk.frame One Stage function
median_df.collected_agg.disk.frame One Stage function
merge.disk.frame Merge function for disk.frames
min_df.chunk_agg.disk.frame One Stage function
min_df.collected_agg.disk.frame One Stage function
move_to Move or copy a disk.frame to another location
mutate.disk.frame The dplyr verbs implemented for disk.frame

-- N --

names.disk.frame Return the column names of the disk.frame
nchunk Returns the number of chunks in a disk.frame
nchunk.disk.frame Returns the number of chunks in a disk.frame
nchunks Returns the number of chunks in a disk.frame
nchunks.disk.frame Returns the number of chunks in a disk.frame
ncol Number of rows or columns
ncol.disk.frame Number of rows or columns
nrow Number of rows or columns
nrow.disk.frame Number of rows or columns
n_df.chunk_agg.disk.frame One Stage function
n_df.collected_agg.disk.frame One Stage function
n_distinct_df.chunk_agg.disk.frame One Stage function
n_distinct_df.collected_agg.disk.frame One Stage function

-- O --

output_disk.frame Write disk.frame to disk
overwrite_check Check if the outdir exists or not

-- P --

partition_filter Filter the dataset based on folder partitions
play Play the recorded lazy operations
print.disk.frame Print disk.frame
pull.disk.frame Pull a column from table similar to 'dplyr::pull'.
purrr_as_mapper Used to convert a function to purrr syntax if needed

-- Q --

quantile_df.chunk_agg.disk.frame One Stage function
quantile_df.collected_agg.disk.frame One Stage function

-- R --

rbindlist.disk.frame rbindlist disk.frames together
rechunk Increase or decrease the number of chunks in the disk.frame
recommend_nchunks Recommend number of chunks based on input size
remove_chunk Removes a chunk from the disk.frame
rename.disk.frame The dplyr verbs implemented for disk.frame

-- S --

sample_frac.disk.frame Sample n rows from a disk.frame
sd_df.chunk_agg.disk.frame One Stage function
sd_df.collected_agg.disk.frame One Stage function
select.disk.frame The dplyr verbs implemented for disk.frame
semi_join.disk.frame Performs join/merge for disk.frames
setup_disk.frame Set up disk.frame environment
shard Shard a data.frame/data.table or disk.frame into chunks and save them into a disk.frame
shardkey Returns the shardkey (not implemented yet)
shardkey_equal Compare two disk.frame shardkeys
show_boilerplate Show the code to setup disk.frame
show_ceremony Show the code to setup disk.frame
split_string_into_df Turn a string of the form /partition1=val/partition2=val2 into a data.frame
srckeep Keep only the variables from the input listed in selections
summarise.disk.frame A function to parse the summarize function
summarise.grouped_disk.frame A function to parse the summarize function
summarize.disk.frame A function to parse the summarize function
summarize.grouped_disk.frame A function to parse the summarize function
sum_df.chunk_agg.disk.frame One Stage function
sum_df.collected_agg.disk.frame One Stage function

-- T --

tail.disk.frame Head and tail of the disk.frame
tbl_vars.disk.frame Column names for RStudio auto-complete
transmute.disk.frame The dplyr verbs implemented for disk.frame

-- V --

var_df.chunk_agg.disk.frame One Stage function
var_df.collected_agg.disk.frame One Stage function

-- W --

write_disk.frame Write disk.frame to disk

-- Z --

zip_to_disk.frame 'zip_to_disk.frame' is used to read and convert every CSV file within the zip file to disk.frame format

-- misc --

[[.disk.frame [[ interface for disk.frame using fst backend