ffply {fplyr} | R Documentation |
Read, process each block and write the result
Description
Suppose you want to process each block of a file and the result is again
a data.table that you want to print to some output file. One possible
approach is to use l <- flply(...) followed by do.call(rbind, l) and
fwrite, but this would be slow. ffply offers a faster solution to this
problem.
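For comparison, the slower route described above might look like the following minimal sketch. It assumes that flply accepts a function of a single data.table argument and returns a list with one element per block, which this page does not state. The output path out and the choice of a per-block column mean are illustrative only.

```r
library(fplyr)
library(data.table)

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
out <- tempfile()  # illustrative output path

# Step 1: process every block and keep all results in memory as a list.
# Only numeric columns are averaged, so the sketch does not depend on
# whether the block still contains the first (key) field.
l <- flply(f1, function(d) {
  d[, lapply(.SD, mean), .SDcols = sapply(d, is.numeric)]
})

# Step 2: bind the per-block results into a single data.table.
res <- do.call(rbind, l)

# Step 3: write the combined table in one go.
fwrite(res, out, sep = "\t")
```

In this sketch every intermediate result stays in memory until the final fwrite call, which is the overhead that ffply is meant to avoid.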
Usage
ffply(
  input,
  output = "",
  FUN,
  ...,
  key.sep = "\t",
  sep = "\t",
  skip = 0,
  header = TRUE,
  nblocks = Inf,
  stringsAsFactors = FALSE,
  colClasses = NULL,
  select = NULL,
  drop = NULL,
  col.names = NULL,
  parallel = 1
)
Arguments
input |
Path of the input file. |
output |
String containing the path to the output file. |
FUN |
Function to be applied to each block. It must take at least two arguments,
the first of which is a data.table containing the current block. |
... |
Additional arguments to be passed to FUN. |
key.sep |
The character that delimits the first field from the rest. |
sep |
The field delimiter (often equal to key.sep). |
skip |
Number of lines to skip at the beginning of the file. |
header |
Whether the file has a header. |
nblocks |
The number of blocks to read. |
stringsAsFactors |
Whether to convert strings into factors. |
colClasses |
Vector or list specifying the class of each field. |
select |
The columns (names or numbers) to be read. |
drop |
The columns (names or numbers) not to be read. |
col.names |
Names of the columns. |
parallel |
Number of cores to use. |
Value
Returns NULL invisibly. As a side effect, writes the processed data.table
to the output file.
Slogan
ffply: from file to file
Examples
f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
f2 <- tempfile()
# Copy the first two blocks from f1 into f2 to obtain a shorter but
# consistent version of the original input file.
ffply(f1, f2, function(d, by) {return(d)}, nblocks = 2)
# Compute the mean of the columns for each species
ffply(f1, f2, function(d, by) d[, lapply(.SD, mean)])
# Reshape the file, block by block
ffply(f1, f2, function(d, by) {
  val <- do.call(c, d)
  var <- rep(names(d), each = nrow(d))
  data.table(Var = var, Val = val)
})
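
As a further sketch, extra arguments placed after FUN are forwarded to it through ..., as described under Arguments. The helper scale_block and its argument factor are hypothetical names introduced for illustration. Reading the result back with fread from data.table assumes that the output is a delimited text file that fread can parse.

```r
library(fplyr)
library(data.table)

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
f2 <- tempfile()

# Hypothetical helper: multiply every column of the block by `factor`.
# This assumes, like the mean example above, that all columns of d are numeric.
scale_block <- function(d, by, factor) {
  d[, lapply(.SD, function(x) x * factor)]
}

# `factor = 10` is not an argument of ffply, so it is passed on to scale_block.
ffply(f1, f2, scale_block, factor = 10)

# Read the result back to inspect it.
res <- fread(f2)
```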