R: Fast (Recursive) Splitting

rsplit {collapse}

R Documentation

Fast (Recursive) Splitting

Description

rsplit (recursively) splits a vector, matrix or data frame into subsets according to combinations of (multiple) vectors / factors and returns a (nested) list. If flatten = TRUE, the list is flattened, yielding the same result as split. rsplit is implemented as a wrapper around gsplit and is significantly faster than split.
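The difference between recursive and flattened splitting can be sketched as follows. This is a minimal illustration, assuming the collapse package is installed and using the built-in mtcars data; the object names (x, fl, nested, flat) are chosen here for illustration only.

```r
library(collapse)  # assumes the collapse package is installed

x  <- mtcars$mpg
fl <- list(cyl = mtcars$cyl, am = mtcars$am)

# Default: recursive splitting, first by cyl, then by am within each cyl group
nested <- rsplit(x, fl)

# flatten = TRUE: a single grouping over all observed cyl/am combinations
flat <- rsplit(x, fl, flatten = TRUE)

# Both hold the same observations, only arranged differently
length(nested)      # number of cyl groups
length(flat)        # number of observed cyl/am combinations
sum(lengths(flat))  # equals length(x)
```

The nested form is convenient when each level is processed separately; the flattened form is the closer match to what split returns for a list of factors.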

Usage

rsplit(x, ...)

## Default S3 method:
rsplit(x, fl, drop = TRUE, flatten = FALSE, use.names = TRUE, ...)

## S3 method for class 'matrix'
rsplit(x, fl, drop = TRUE, flatten = FALSE, use.names = TRUE,
       drop.dim = FALSE, ...)

## S3 method for class 'data.frame'
rsplit(x, by, drop = TRUE, flatten = FALSE, cols = NULL,
       keep.by = FALSE, simplify = TRUE, use.names = TRUE, ...)

Arguments

`x`	a vector, matrix, data.frame or list-like object.
`fl`	a `GRP` object, or a (list of) vector(s) / factor(s) (internally converted to `GRP` object(s)) used to split `x`.
`by`	data.frame method: Same as `fl`, but also allows one- or two-sided formulas, i.e. `~ group1` or `var1 + var2 ~ group1 + group2`. See Examples.
`drop`	logical. `TRUE` removes unused levels or combinations of levels from factors before splitting; `FALSE` retains those combinations yielding empty list elements in the output.
`flatten`	logical. If `fl` is a list of vectors / factors, `TRUE` calls `GRP` on the list, creating a single grouping used for splitting; `FALSE` yields recursive splitting.
`use.names`	logical. `TRUE` returns a named list (like `split`); `FALSE` returns a plain list.
`drop.dim`	logical. `TRUE` returns atomic vectors for matrix-splits consisting of one row.
`cols`	data.frame method: Select columns to split using a function, column names, indices or a logical vector. Note: `cols` is ignored if a two-sided formula is passed to `by`.
`keep.by`	logical. If a formula is passed to `by`, then `TRUE` preserves the splitting (right-hand-side) variables in the data frame.
`simplify`	data.frame method: Logical. `TRUE` calls `rsplit.default` if a single column is split, e.g. `rsplit(data, col1 ~ group1)` becomes the same as `rsplit(data$col1, data$group1)`.
`...`	further arguments passed to `GRP`. Sensible choices would be `sort = FALSE`, `decreasing = TRUE` or `na.last = FALSE`. Note that these options only apply if `fl` is not already a (list of) factor(s).
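The effect of drop, use.names and the arguments passed on to GRP can be illustrated with a small sketch. It assumes the collapse package is installed; the toy vector x, the factor f and the result names are hypothetical and only serve the illustration.

```r
library(collapse)  # assumes the collapse package is installed

x <- 1:3
f <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))  # level "c" is unused

kept    <- rsplit(x, f)                     # drop = TRUE (default): unused level removed
unused  <- rsplit(x, f, drop = FALSE)       # unused level retained as an empty element
unnamed <- rsplit(x, f, use.names = FALSE)  # plain list without names

length(kept)
length(unused)
names(unnamed)

# Arguments in ... go to GRP(); cyl is numeric (not a factor), so sort = FALSE applies
first_seen <- rsplit(mtcars$mpg, mtcars$cyl, sort = FALSE)
names(first_seen)  # groups in order of first appearance rather than sorted
```

Passing sort = FALSE is mainly useful when the order of groups does not matter and grouping speed does.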

Value

a (nested) list containing the subsets of x.
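The shape of the returned list for the data.frame method can be inspected as in the sketch below. It assumes the collapse package is installed and uses the built-in mtcars data; the names nl and leaf are illustrative.

```r
library(collapse)  # assumes the collapse package is installed

nl <- rsplit(mtcars, ~ cyl + am)

# One list level per grouping variable; the leaves are data frames
names(nl)         # cyl groups
names(nl[["4"]])  # am groups within cyl == 4

leaf <- nl[["4"]][["1"]]
class(leaf)
dim(leaf)         # rows with cyl == 4 and am == 1; cyl and am are dropped (keep.by = FALSE)
```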

Examples

rsplit(mtcars$mpg, mtcars$cyl)
rsplit(mtcars, mtcars$cyl)

rsplit(mtcars, mtcars[.c(cyl, vs, am)])
rsplit(mtcars, ~ cyl + vs + am, keep.by = TRUE)  # Same thing
rsplit(mtcars, ~ cyl + vs + am)

rsplit(mtcars, ~ cyl + vs + am, flatten = TRUE)

rsplit(mtcars, mpg ~ cyl)
rsplit(mtcars, mpg ~ cyl, simplify = FALSE)
rsplit(mtcars, mpg + hp ~ cyl + vs + am)
rsplit(mtcars, mpg + hp ~ cyl + vs + am, keep.by = TRUE)

# Split this sectoral data, first by Variable (Employment and Value Added), then by Country
GGDCspl <- rsplit(GGDC10S, ~ Variable + Country, cols = 6:16)
str(GGDCspl)

# The nested list can be reassembled using unlist2d()
head(unlist2d(GGDCspl, idcols = .c(Variable, Country)))
rm(GGDCspl)

# Another example with mtcars (not as clean because of row.names)
nl <- rsplit(mtcars, mpg + hp ~ cyl + vs + am)
str(nl)
unlist2d(nl, idcols = .c(cyl, vs, am), row.names = "car")
rm(nl)