ffply {fplyr}    R Documentation
Read, process each block and write the result
Description
Suppose you want to process each block of a file, and the result for each
block is again a data.table that you want to write to an output file. One
possible approach is to call l <- flply(...), combine the results with
do.call(rbind, l), and write them with fwrite, but this is slow, since every
per-block result has to be kept in memory and bound together before anything
is written. ffply offers a faster solution to this problem.
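To make the file-to-file idea concrete, here is a minimal base-R sketch. It is not fplyr's implementation: the helper process_blocks_to_file() is hypothetical, and it assumes (as fplyr's package description states) that a block is a run of consecutive rows sharing the same value in the first field, and that the key and the other fields use the same delimiter. Mirroring the function(d, by) signature used in the Examples, FUN here receives the block without its key column, plus the key. What the sketch illustrates is the output side: each block's result is appended to the output file as soon as it is computed, so no list of results has to be bound together at the end.

```r
# Hypothetical helper, NOT part of fplyr: process a delimited file block by
# block and append each block's result to `output` as soon as it is ready.
# Assumptions: a block is a run of consecutive rows with the same first
# field, and the key uses the same delimiter as the other fields (`sep`).
process_blocks_to_file <- function(input, output, FUN, ..., sep = "\t") {
  con <- file(input, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1), sep, fixed = TRUE)[[1]]
  first_write <- TRUE
  current_key <- NULL
  buffer <- character(0)

  # Parse the buffered rows of one block, apply FUN, append the result
  flush_block <- function() {
    if (length(buffer) == 0) return(invisible(NULL))
    d <- read.table(text = buffer, sep = sep, col.names = header,
                    stringsAsFactors = FALSE, comment.char = "")
    res <- FUN(d[, -1, drop = FALSE], current_key, ...)
    # Put the key back in front of FUN's result
    out <- data.frame(rep(current_key, nrow(res)), res,
                      check.names = FALSE, stringsAsFactors = FALSE)
    names(out)[1] <- header[1]
    # The first block overwrites the file and writes the header;
    # later blocks are appended without a header
    write.table(out, output, sep = sep, quote = FALSE, row.names = FALSE,
                col.names = first_write, append = !first_write)
    first_write <<- FALSE
    buffer <<- character(0)
  }

  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break  # end of file
    key <- strsplit(line, sep, fixed = TRUE)[[1]][1]
    # A new key closes the previous block
    if (!is.null(current_key) && key != current_key) flush_block()
    current_key <- key
    buffer <- c(buffer, line)
  }
  flush_block()  # the last block
  invisible(NULL)
}

# Usage: per-block column means, similar to the species example in Examples
input <- tempfile(fileext = ".tsv")
output <- tempfile(fileext = ".tsv")
writeLines(c("species\tx\ty",
             "a\t1\t2",
             "a\t3\t4",
             "b\t5\t6"), input)
process_blocks_to_file(input, output,
                       function(d, by) as.data.frame(lapply(d, mean)))
result <- read.delim(output)
```

Only one block is held in memory at a time, which is the property that makes a file-to-file design cheaper than collecting every result first.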
Usage
ffply(
  input,
  output = "",
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)
Arguments
| input | Path of the input file. | 
| output | String containing the path to the output file. | 
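| FUN | Function to be applied to each block. It must take at least two arguments: the first is a data.table containing the current block (without the first field), and the second is the value of the first field for the current block. |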
| ... | Additional arguments to be passed to FUN. | 
| key.sep | The character that delimits the first field from the rest. | 
| sep | The field delimiter (often equal to key.sep). |
| skip | Number of lines to skip at the beginning of the file. |
| header | Whether the file has a header. | 
| nblocks | The number of blocks to read. | 
| stringsAsFactors | Whether to convert strings into factors. | 
| colClasses | Vector or list specifying the class of each field. | 
| select | The columns (names or numbers) to be read. | 
| drop | The columns (names or numbers) not to be read. | 
| col.names | Names of the columns. | 
| parallel | Number of cores to use. | 
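The distinction between key.sep and sep only matters when the two differ. As a hedged illustration, the base-R helper split_key() below (hypothetical, not part of fplyr, and not necessarily how fplyr parses lines internally) splits one line the way the two argument descriptions suggest: the first occurrence of key.sep separates the key, and sep splits the remaining fields.

```r
# Hypothetical helper, NOT part of fplyr: split one line into its key
# (delimited by key.sep) and the remaining fields (delimited by sep).
split_key <- function(line, key.sep = "\t", sep = "\t") {
  pos <- regexpr(key.sep, line, fixed = TRUE)
  if (pos == -1) {
    # No key.sep: the whole line is the key and there are no other fields
    return(list(key = line, fields = character(0)))
  }
  key <- substr(line, 1, pos - 1)
  rest <- substr(line, pos + nchar(key.sep), nchar(line))
  list(key = key, fields = strsplit(rest, sep, fixed = TRUE)[[1]])
}

# A line whose key is separated by "|" while the other fields use ","
parts <- split_key("setosa|5.1,3.5,1.4,0.2", key.sep = "|", sep = ",")
parts$key     # "setosa"
parts$fields  # "5.1" "3.5" "1.4" "0.2"
```

With the defaults (both "\t"), the same helper splits an ordinary tab-separated line.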
Value
Returns NULL invisibly. As a side effect,
writes the processed data.table to the output file.
Slogan
ffply: from file to file
Examples
library(fplyr)
library(data.table)  # provides data.table(), used in the last example
f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
f2 <- tempfile()
# Copy the first two blocks from f1 into f2 to obtain a shorter but
# consistent version of the original input file.
ffply(f1, f2, function(d, by) d, nblocks = 2)
# Compute the mean of each column for each species
ffply(f1, f2, function(d, by) d[, lapply(.SD, mean)])
# Reshape each block from wide to long format
ffply(f1, f2, function(d, by) {
    val <- do.call(c, d)
    var <- rep(names(d), each = nrow(d))
    data.table(Var = var, Val = val)
})