fmply {fplyr}    R Documentation
Read, process and write to multiple output files
Description
Sometimes a file should be processed in many different ways. fmply() applies a function to each block of the file, where a block is a set of consecutive lines sharing the same value in the first field. The function should return a list of m data.tables, each of which is written to a different output file. Optionally, the function can return a list of m + 1 elements, where the first m elements are data.tables and are written to the output files, while the last element is returned as in flply().
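The behavior described above can be approximated in base R. The sketch below is a hypothetical, in-memory illustration: the name fmply_sketch() is invented here and is not part of fplyr. It reads the whole file at once, treats each run of consecutive rows sharing the same first-field value as a block, appends the first m results of FUN to the m output files, and collects the optional (m + 1)-th elements. fplyr itself reads the file block by block instead of loading it whole, and passes data.tables rather than data.frames to FUN.

```r
# Hypothetical sketch (not part of fplyr): an in-memory imitation of fmply().
fmply_sketch <- function(input, outputs, FUN, ..., sep = "\t") {
  # Unlike fplyr, read the whole file at once (fplyr reads it block by block)
  d <- read.table(input, header = TRUE, sep = sep, stringsAsFactors = FALSE)
  # A block is a run of consecutive rows sharing the value of the first field
  keys <- d[[1]]
  run.id <- cumsum(c(TRUE, keys[-1] != keys[-length(keys)]))
  blocks <- split(d, run.id)
  m <- length(outputs)
  extra <- list()
  for (b in seq_along(blocks)) {
    res <- FUN(blocks[[b]], ...)
    stopifnot(is.list(res), length(res) %in% c(m, m + 1))
    for (i in seq_len(m)) {
      # The first block writes a header; later blocks are appended without one
      write.table(res[[i]], outputs[i], sep = sep, quote = FALSE,
                  row.names = FALSE, col.names = (b == 1), append = (b > 1))
    }
    # Keep the optional (m + 1)-th element of each block
    if (length(res) == m + 1) extra[[length(extra) + 1]] <- res[[m + 1]]
  }
  if (length(extra) == 0) invisible(NULL) else extra
}

# Usage: a tiny tab-separated file with two blocks ("a" and "b")
fin <- tempfile(fileext = ".tsv")
write.table(data.frame(key = c("a", "a", "b"), x = 1:3), fin,
            sep = "\t", quote = FALSE, row.names = FALSE)
out1 <- tempfile()
out2 <- tempfile()
res <- fmply_sketch(fin, c(out1, out2), function(d) {
  # m = 2 data frames to write, plus an extra (m + 1)-th element
  list(d, data.frame(key = d$key[1], total = sum(d$x)), nrow(d))
})
print(res)              # list(2L, 1L): the size of each block
print(read.delim(out2)) # one summary row per block
```

Appending with col.names = (b == 1) mirrors the idea that every block contributes rows to the same m files, so only the first block should emit the header line.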
Usage
fmply(
input,
outputs,
FUN,
...,
key.sep = "\t",
sep = "\t",
skip = 0,
header = TRUE,
nblocks = Inf,
stringsAsFactors = FALSE,
colClasses = NULL,
select = NULL,
drop = NULL,
col.names = NULL,
parallel = 1
)
Arguments
input             Path of the input file.
outputs           Vector of m paths for the output files.
FUN               A function to apply to each block. Takes as input a data.table (the block) and returns a list of m data.tables, optionally followed by one extra element (see Description).
...               Additional arguments to be passed to FUN.
key.sep           The character that delimits the first field from the rest.
sep               The field delimiter (often equal to key.sep).
skip              Number of lines to skip at the beginning of the file.
header            Whether the file has a header.
nblocks           The number of blocks to read.
stringsAsFactors  Whether to convert strings into factors.
colClasses        Vector or list specifying the class of each field.
select            The columns (names or numbers) to be read.
drop              The columns (names or numbers) not to be read.
col.names         Names of the columns.
parallel          Number of cores to use.
Value
If FUN returns m elements, fmply() returns NULL invisibly. If FUN returns m + 1 elements, fmply() returns a list collecting, for each block, the last element returned by FUN. As a side effect, it writes the first m elements returned by FUN to the files listed in outputs.
Slogan
fmply: from file to multiple files
Examples
library(fplyr)
library(data.table)
fin <- system.file("extdata", "dt_iris.csv", package = "fplyr")
fout1 <- tempfile()
fout2 <- ""
# Copy the input file to fout1 unchanged and, at the same time, print a
# summary of each block to the console (the output path "" is the console)
fmply(fin, c(fout1, fout2), function(d) {
list(d, data.table(unclass(summary(d))))
})
fout3 <- tempfile()
fout4 <- tempfile()
# Fit a linear and a cubic polynomial regression to each block and write
# the coefficients of each fit to a separate file
fmply(fin, c(fout3, fout4), function(d) {
lr.fit <- lm(Sepal.Length ~ ., data = d[, !"Species"])
lr.summ <- data.table(Species = d$Species[1], t(coefficients(lr.fit)))
pr.fit <- lm(Sepal.Length ~ poly(as.matrix(d[, 3:5]), degree = 3),
data = d[, !"Species"])
pr.summ <- data.table(Species = d$Species[1], t(coefficients(pr.fit)))
list(lr.summ, pr.summ)
})
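The examples above only illustrate FUN returning m elements. A further example of the m + 1 form could look like the following. It is a sketch that relies only on fmply() as documented on this page and on data.table's rbindlist(), and it assumes, as the examples above suggest, that dt_iris.csv has a Species column identifying the blocks; the variable names fout5, sizes and size.table are illustrative.

```r
library(fplyr)
library(data.table)

fin <- system.file("extdata", "dt_iris.csv", package = "fplyr")
fout5 <- tempfile()
# m = 1 output file; FUN returns m + 1 = 2 elements per block:
# the first two rows of the block (written to fout5) and a one-row
# data.table with the block size (returned by fmply())
sizes <- fmply(fin, fout5, function(d) {
  list(head(d, 2), data.table(Species = d$Species[1], n = nrow(d)))
})
# fmply() returns one last element per block; stack them into one table
size.table <- rbindlist(sizes)
print(size.table)
```

Because each last element is a one-row data.table with the same columns, rbindlist() turns the returned list into a single per-block summary.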