split {data.table} | R Documentation |
Split data.table into chunks in a list
Description
Split method for data.table. Faster and more flexible. Be aware that processing a list of data.tables will generally be much slower than manipulating a single data.table by group using the by argument; read more on data.table.
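The note above can be illustrated with a minimal sketch, assuming a small toy table (the column names g and v and the object names are made up for illustration). It computes the same per-group sum once with by inside a single data.table and once by splitting, processing each chunk, and binding the pieces back together.

```r
library(data.table)

# Toy data; `g` and `v` are hypothetical column names
DT = data.table(g = c("b", "a", "b", "a"), v = 1:4)

# Grouped aggregation inside a single data.table using `by`
res_by = DT[, .(total = sum(v)), by = g]

# Same computation via split + lapply + rbindlist
chunks = split(DT, by = "g")
res_split = rbindlist(lapply(chunks, function(d) d[, .(g = g[1L], total = sum(v))]))

all.equal(res_by, res_split)
```

On a table this small the difference is negligible; the by form avoids materialising one data.table per group, which is where the cost of the list approach tends to come from.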
Usage
## S3 method for class 'data.table'
split(x, f, drop = FALSE,
      by, sorted = FALSE, keep.by = TRUE, flatten = TRUE,
      ..., verbose = getOption("datatable.verbose"))
Arguments
x |
data.table |
f |
factor or list of factors. Same as in the data.frame method. |
drop |
logical. Default FALSE keeps empty list elements that arise from unused factor levels. |
by |
character vector. Column names on which split should be made. For length(by) > 1L with flatten FALSE the result is a nested list with data.tables as leaves. |
sorted |
logical. When default FALSE, groups keep the order in which they appear in the data. |
keep.by |
logical, default TRUE. Keep the columns provided to by in the resulting data.tables. |
flatten |
logical, default TRUE. When FALSE and length(by) > 1L, a nested list is returned. |
... |
passed to data.frame way of processing when using the f argument. |
verbose |
logical, default getOption("datatable.verbose"). |
Details
Argument f is provided only for consistency in usage with the data.frame method. Using the by argument instead is recommended: it is faster, more flexible, and by default preserves the order in which groups appear in the data.
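As a rough illustration of the ordering difference described above, the sketch below compares the names of the resulting list for by, for by with sorted = TRUE, and for f. The toy table and the object names are assumptions made for this example.

```r
library(data.table)

# Toy data; groups first appear in the order "b", then "a"
DT = data.table(g = c("b", "a", "b", "a"), v = 1:4)

# `by` keeps groups in order of first appearance by default
by_default = names(split(DT, by = "g"))

# `sorted = TRUE` returns the groups in sorted order
by_sorted = names(split(DT, by = "g", sorted = TRUE))

# `f` follows the data.frame method, which orders by factor levels
via_f = names(split(DT, f = DT$g))

print(by_default)
print(by_sorted)
print(via_f)
```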
Value
List of data.tables. If flatten is FALSE and length(by) > 1L, then recursively nested lists with data.tables as leaves, grouped according to the by argument.
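The two return shapes can be compared with a short sketch, assuming a three-row toy table (column and object names are illustrative only). With the default flatten = TRUE the result is a single-level list; with flatten = FALSE each by column adds one level of nesting, so a leaf is reached by indexing once per column.

```r
library(data.table)

# Toy data with two grouping columns
DT = data.table(x1 = c("a", "a", "b"), x2 = c("c", "d", "c"), y = 1:3)

# Flat list: one element per (x1, x2) combination
flat = split(DT, by = c("x1", "x2"))
print(names(flat))

# Nested list: first level keyed by x1, second level by x2
nested = split(DT, by = c("x1", "x2"), flatten = FALSE)
leaf = nested[["a"]][["d"]]
print(leaf)
```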
See Also
Examples
set.seed(123)
DT = data.table(x1 = rep(letters[1:2], 6),
                x2 = rep(letters[3:5], 4),
                x3 = rep(letters[5:8], 3),
                y = rnorm(12))
DT = DT[sample(.N)]
DF = as.data.frame(DT)
# split consistency with data.frame: `x, f, drop`
all.equal(
  split(DT, list(DT$x1, DT$x2)),
  lapply(split(DF, list(DF$x1, DF$x2)), setDT)
)
# flat vs. nested list using the `flatten` argument
split(DT, by=c("x1", "x2"))
split(DT, by=c("x1", "x2"), flatten=FALSE)
# dealing with factors
fdt = DT[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)
# factors having unused levels, drop FALSE, TRUE
fdt = DT[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L],
              x2 = as.factor(c("a", as.character(x2)))[-1L],
              x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
              y = y)]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)
sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE)
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)
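
The examples above do not exercise keep.by. A minimal sketch, assuming a toy table with made-up column names, of how the columns of each chunk differ between the default keep.by = TRUE and keep.by = FALSE:

```r
library(data.table)

# Toy data; `g` is the split column, `v` the payload
DT = data.table(g = c("a", "b", "a"), v = 1:3)

# Default: the split column stays in every chunk
cols_kept = names(split(DT, by = "g")[["a"]])

# keep.by = FALSE: the split column is removed from the chunks
cols_dropped = names(split(DT, by = "g", keep.by = FALSE)[["a"]])

print(cols_kept)
print(cols_dropped)
```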