ftply {fplyr} | R Documentation |
Read, process each block and return a data.table
Description
ftply takes as input the path to a file and a function, and returns a data.table.
It is a faster equivalent to using l <- flply(...) followed by do.call(rbind, l).
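The equivalence stated above can be sketched as follows. This is only an illustration: it uses the dt_iris.csv file from the Examples section and assumes that flply accepts the same kind of FUN as ftply. The two results may differ in details such as a key column, so the sketch prints both for comparison and does not claim they are identical.

```r
library(fplyr)
library(data.table)

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Same per-block function as in the Examples section.
block_fun <- function(d, by) d[, lapply(.SD, mean)]

# Two-step route: one list element per block, then bind the pieces.
l <- flply(f1, block_fun)
two_step <- do.call(rbind, l)

# One-step route: ftply binds the per-block results itself.
one_step <- ftply(f1, block_fun)

print(two_step)
print(one_step)
```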
Usage
ftply(
input,
FUN = function(d, by) d,
...,
key.sep = "\t",
sep = "\t",
skip = 0,
header = TRUE,
nblocks = Inf,
stringsAsFactors = FALSE,
colClasses = NULL,
select = NULL,
drop = NULL,
col.names = NULL,
parallel = 1
)
Arguments
input |
Path of the input file. |
FUN |
Function to be applied to each block. It must take at least two arguments,
the first of which is a |
... |
Additional arguments to be passed to FUN. |
key.sep |
The character that delimits the first field from the rest. |
sep |
The field delimiter (often equal to |
skip |
Number of lines to skip at the beginning of the file. |
header |
Whether the file has a header. |
nblocks |
The number of blocks to read. |
stringsAsFactors |
Whether to convert strings into factors. |
colClasses |
Vector or list specifying the class of each field. |
select |
The columns (names or numbers) to be read. |
drop |
The columns (names or numbers) not to be read. |
col.names |
Names of the columns. |
parallel |
Number of cores to use. |
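A minimal sketch of how some of these arguments might be combined. The helper block_means and its digits parameter are hypothetical names introduced for illustration; digits is forwarded to FUN through the ... argument, and nblocks limits how many blocks are read. The helper restricts itself to numeric columns, so it does not depend on whether the first field is included in the block.

```r
library(fplyr)
library(data.table)

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Hypothetical helper: rounded mean of every numeric column in a block.
# `digits` is not an ftply argument; it reaches this function via `...`.
block_means <- function(d, by, digits) {
  num <- names(d)[vapply(d, is.numeric, logical(1))]
  d[, lapply(.SD, function(x) round(mean(x), digits)), .SDcols = num]
}

# Process only the first two blocks, rounding the means to one decimal.
res <- ftply(f1, block_means, digits = 1, nblocks = 2)
print(res)
```

Arguments listed after ... in the Usage section have to be given by their full names, which is why nblocks is spelled out in the call.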
Details
ftply is similar to ffply, but while the latter writes the result of the
processing to disk after each block, the former keeps the results in memory
until the whole file has been processed, and then returns the complete data.table.
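The contrast can be sketched as follows. The call to ffply is an assumption: this page does not show its signature, and the sketch assumes it takes the output path as its second argument, before FUN. The helper numeric_means and the temporary file out are hypothetical names used for illustration.

```r
library(fplyr)
library(data.table)

f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")

# Hypothetical helper: mean of every numeric column in a block.
numeric_means <- function(d, by) {
  num <- names(d)[vapply(d, is.numeric, logical(1))]
  d[, lapply(.SD, mean), .SDcols = num]
}

# ffply (assumed signature: input, output, FUN): results go to a file
# block by block, so they never have to fit in memory all at once.
out <- tempfile(fileext = ".tsv")
ffply(f1, out, numeric_means)

# ftply: results are accumulated in memory and returned as one data.table.
in_memory <- ftply(f1, numeric_means)
print(in_memory)
```

When the combined result is small, as with per-block summaries, keeping it in memory with ftply is usually the more convenient choice; writing to disk is more attractive when the processed output is itself large.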
Value
Returns a data.table with the results of the processing.
Slogan
ftply: from file to data.table
Examples
f1 <- system.file("extdata", "dt_iris.csv", package = "fplyr")
# Compute the mean of the columns for each species
ftply(f1, function(d, by) d[, lapply(.SD, mean)])
# Read only the first two blocks
ftply(f1, nblocks = 2)