sum_by {gustave} | R Documentation
Efficient by-group (weighted) summation
Description
sum_by performs an efficient and optionally weighted by-group summation using linear algebra and the capabilities of the Matrix package. The by-group summation is performed through the matrix cross-product of the y parameter (coerced to a matrix if needed) with a (very) sparse matrix built from the by and the (optional) w parameters.
Compared to base R, dplyr or data.table alternatives, this implementation aims to be easier to use in a matrix-oriented context and can yield efficiency gains when the number of columns is large.
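The cross-product described above can be sketched in a few lines. This is only an illustration of the technique, not the package's actual implementation: the helper name sum_by_sketch is hypothetical, NA handling and sparse outputs are left out, and by is assumed to contain no missing values.

```r
library(Matrix)

# Hypothetical helper (not part of gustave): by-group weighted sums
# computed as the cross-product of a sparse indicator matrix with y.
sum_by_sketch <- function(y, by, w = NULL) {
  by <- as.factor(by)
  if (is.null(w)) w <- rep(1, length(by))
  # n x H sparse matrix: row i has one non-zero entry, w[i],
  # located in the column of the group that row i belongs to
  x <- sparseMatrix(
    i = seq_along(by), j = as.integer(by), x = w,
    dims = c(length(by), nlevels(by)),
    dimnames = list(NULL, levels(by))
  )
  # H x p result, i.e. t(x) %*% y, converted back to a base matrix
  as.matrix(crossprod(x, y))
}

# Small check against base R's rowsum() on weighted data
set.seed(1)
y <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("var1", "var2")))
by <- rep(c("a", "b"), each = 5)
w <- rep(c(2, 1), each = 5)
res <- sum_by_sketch(y, by, w)
all.equal(unname(res), unname(rowsum(y * w, by)))
```

Because each row of the indicator matrix holds a single non-zero entry, its storage grows with the number of rows rather than with rows times groups.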
Usage
sum_by(y, by, w = NULL, na_rm = TRUE, keep_sparse = FALSE)
Arguments
y | A (sparse) vector, a (sparse) matrix or a data.frame. The object to perform by-group summation on.
by | The factor variable defining the by-groups. Character variables are coerced to factors.
w | The optional row weights to be used in the summation.
na_rm | Should
keep_sparse | When
Value
A vector, a matrix or a data.frame depending on the type of y. If y is sparse and keep_sparse = TRUE, then the result is also sparse (without names when it is a sparse vector, see keep_sparse argument for details).
Author(s)
Martin Chevalier
Examples
# Data generation
set.seed(1)
n <- 100
p <- 10
H <- 3
y <- matrix(rnorm(n * p), ncol = p, dimnames = list(NULL, paste0("var", seq_len(p))))
y[1, 1] <- NA
by <- letters[sample.int(H, n, replace = TRUE)]
w <- rep(1, n)
w[by == "a"] <- 2
# Standard use
sum_by(y, by)
# Keeping the NAs
sum_by(y, by, na_rm = FALSE)
# With a weight
sum_by(y, by, w = w)