| ffply {fplyr} | R Documentation | 
Read, process each block and write the result
Description
Suppose you want to process each block of a file, and the result for each
block is again a data.table that you want to write to an output file. One
possible approach is to use l <- flply(...) followed by do.call(rbind, l)
and fwrite, but this would be slow, since every result is first collected
in the list l and only then bound together and written. ffply offers a
faster solution to this problem.
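To make the "collect, bind, then write" pattern concrete, here is a minimal base R sketch of it. It does not use fplyr and is not how ffply is implemented. The helper name process_blocks_naive is hypothetical, and the sketch assumes that a block is a run of consecutive lines sharing the same first field and that FUN receives the block without that field.

```r
# Hypothetical helper, not part of fplyr: a base R emulation of the
# "collect every result, bind them, write once" approach.
process_blocks_naive <- function(input, output, FUN, sep = "\t") {
  df <- read.table(input, header = TRUE, sep = sep, stringsAsFactors = FALSE)
  key <- df[[1]]
  # Assumption: a block is a run of consecutive rows with the same first field.
  block_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))
  blocks <- split(df[-1], block_id)   # each block, without the first field
  keys <- key[!duplicated(block_id)]  # the first-field value of each block
  # Every per-block result is kept in memory until the final rbind.
  l <- Map(function(d, by) {
    out <- cbind(by, FUN(d, by))
    names(out)[1] <- names(df)[1]
    out
  }, blocks, keys)
  res <- do.call(rbind, unname(l))
  write.table(res, output, sep = sep, quote = FALSE, row.names = FALSE)
  invisible(NULL)
}

# Usage on a tiny tab-separated file with two blocks ("a" and "b").
input <- tempfile()
output <- tempfile()
writeLines(c("Species\tLen\tWid", "a\t1\t2", "a\t3\t4", "b\t5\t6"), input)
process_blocks_naive(input, output,
                     function(d, by) as.data.frame(lapply(d, mean)))
readLines(output)
```

The intermediate list l grows with the number of blocks, which is the cost that a file-to-file function such as ffply is meant to avoid.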
Usage
ffply(
  input,
  output = "",
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)
Arguments
input | 
 Path of the input file.  | 
output | 
 String containing the path to the output file.  | 
FUN | 
 Function to be applied to each block. It must take at least two arguments,
the first of which is a data.table containing the current block.  | 
... | 
 Additional arguments to be passed to FUN.  | 
key.sep | 
 The character that delimits the first field from the rest.  | 
sep | 
 The field delimiter (often equal to key.sep).  | 
skip | 
 Number of lines to skip at the beginning of the file.  | 
header | 
 Whether the file has a header.  | 
nblocks | 
 The number of blocks to read.  | 
stringsAsFactors | 
 Whether to convert strings into factors.  | 
colClasses | 
 Vector or list specifying the class of each field.  | 
select | 
 The columns (names or numbers) to be read.  | 
drop | 
 The columns (names or numbers) not to be read.  | 
col.names | 
 Names of the columns.  | 
parallel | 
 Number of cores to use.  | 
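The difference between key.sep and sep is easiest to see on a single line where the two characters differ. The following base R sketch only illustrates a literal reading of the two descriptions above. The helper split_record is hypothetical and says nothing about how fplyr parses its input internally.

```r
# Hypothetical helper: split one line into its first field (the key) and
# the remaining fields, using different delimiters for the two steps.
split_record <- function(line, key.sep = "\t", sep = "\t") {
  # Split only at the first occurrence of key.sep.
  parts <- regmatches(line, regexpr(key.sep, line, fixed = TRUE),
                      invert = TRUE)[[1]]
  list(key = parts[1],
       fields = strsplit(parts[2], sep, fixed = TRUE)[[1]])
}

rec <- split_record("setosa|5.1,3.5,1.4", key.sep = "|", sep = ",")
rec$key
rec$fields
```

When both delimiters are the same character, as with the defaults, the first field is simply the first column of an ordinary delimited file.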
Value
Returns NULL invisibly. As a side effect,
writes the processed data.table to the output file.
Slogan
ffply: from file to file
Examples
f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
f2 <- tempfile()
# Copy the first two blocks from f1 into f2 to obtain a shorter but
# consistent version of the original input file.
ffply(f1, f2, function(d, by) {return(d)}, nblocks = 2)
# Compute the mean of the columns for each species
ffply(f1, f2, function(d, by) d[, lapply(.SD, mean)])
# Reshape the file, block by block
ffply(f1, f2, function(d, by) {
    val <- do.call(c, d)
    var <- rep(names(d), each = nrow(d))
    # Namespace-qualified so the call works even if data.table is not attached
    data.table::data.table(Var = var, Val = val)
})