| across {collapse} | R Documentation |
Apply Functions Across Multiple Columns
Description
across() can be used inside fmutate and fsummarise to apply one or more functions to a selection of columns. It is broadly similar to dplyr::across, but does not support some rlang features, offers some additional arguments, and is optimized to work with collapse's .FAST_FUN, yielding much faster computations.
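The optimization for .FAST_FUN can be illustrated with a minimal sketch. It assumes the built-in iris data (not used elsewhere on this page) and compares a fast function passed by name with the same function wrapped in an anonymous function, which is instead evaluated group by group. Both calls are expected to return the same values; only the execution path differs.

```r
library(collapse)

# Illustrative data: iris ships with base R and is not part of this help page
g <- fgroup_by(iris, Species)

# fmean is passed by name, so across() can recognise it as a fast function
fast <- fsummarise(g, across(Sepal.Length:Petal.Width, fmean))

# Wrapping it in an anonymous function hides the name; the function is then
# executed separately for each group
slow <- fsummarise(g, across(Sepal.Length:Petal.Width, function(x) fmean(x)))

# Compare the results of the two approaches for one column
all.equal(fast$Sepal.Length, slow$Sepal.Length)
```

On small data the timing difference is negligible; the named form mainly matters with many groups.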
Usage
across(.cols = NULL, .fns, ..., .names = NULL,
       .apply = "auto", .transpose = "auto")
# acr(...) can be used to abbreviate across(...)
Arguments
.cols |
select columns using column names and expressions (e.g. PCGDP:GINI or c(PCGDP, LIFEEX, ODA)), or functions such as is.numeric. |
.fns |
A function, character vector of functions or list of functions. Vectors / lists can be named to yield alternative names in the result (see .names). |
... |
further arguments to .fns. |
.names |
controls the naming of computed columns. |
.apply |
controls whether functions are applied column-by-column (TRUE) or to multiple columns at once (FALSE). |
.transpose |
with multiple .fns, controls whether the result is ordered by column first or by function first. |
Note
across does not support purrr-style lambdas or dplyr-style predicate functions such as across(where(is.numeric), sum); simply use across(is.numeric, sum) instead. In contrast to dplyr, you can also compute on grouping columns.
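As a rough sketch of the note above, the following shows how a dplyr-style call might be rewritten. It assumes the built-in iris data, which is purely illustrative: the predicate is passed directly instead of through where(), and an ordinary anonymous function stands in for a purrr-style lambda.

```r
library(collapse)

# dplyr style, not supported by collapse::across:
#   across(where(is.numeric), ~ mean(.x))

# collapse style: predicate passed directly, ordinary function instead of a lambda
res <- iris |>
  fgroup_by(Species) |>
  fsummarise(across(is.numeric, function(x) mean(x)))

res
```

Because the anonymous function is not a named fast function, it is applied group by group; passing fmean by name would be the faster choice for this particular summary.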
See Also
fsummarise, fmutate, Fast Data Manipulation, Collapse Overview
Examples
# Basic (Weighted) Summaries
fsummarise(wlddev, across(PCGDP:GINI, fmean, w = POP))
wlddev |> fgroup_by(region, income) |>
  fsummarise(across(PCGDP:GINI, fmean, w = POP))
# Note that for these we don't actually need across...
fselect(wlddev, PCGDP:GINI) |> fmean(w = wlddev$POP, drop = FALSE)
wlddev |> fgroup_by(region, income) |>
  fselect(PCGDP:GINI, POP) |> fmean(POP, keep.w = FALSE)
collap(wlddev, PCGDP + LIFEEX + GINI ~ region + income, w = ~ POP, keep.w = FALSE)
# But if we want to use some base R function that requires argument splitting...
wlddev |> na_omit(cols = "POP") |> fgroup_by(region, income) |>
  fsummarise(across(PCGDP:GINI, weighted.mean, w = POP, na.rm = TRUE))
# Or if we want to apply different functions...
wlddev |> fgroup_by(region, income) |>
  fsummarise(across(PCGDP:GINI, list(mu = fmean, sd = fsd), w = POP),
             POP_sum = fsum(POP), OECD = fmean(OECD))
# Note that the above still detects fmean as a fast function. The names of the list
# are irrelevant, but the function name must be typed or passed as a character vector;
# otherwise functions will be executed by groups, e.g. function(x) fmean(x) won't vectorize
# Same, naming in a different way
wlddev |> fgroup_by(region, income) |>
  fsummarise(across(PCGDP:GINI, list(mu = fmean, sd = fsd), w = POP, .names = "flip"),
             sum_POP = fsum(POP), OECD = fmean(OECD))
# Or if we want to do more advanced things...
# Such as nesting data frames...
qTBL(wlddev) |> fgroup_by(region, income) |>
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) list(Nest = list(x)),
                    .apply = FALSE))
# Or linear models...
qTBL(wlddev) |> fgroup_by(region, income) |>
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) list(Mods = list(lm(PCGDP ~., x))),
                    .apply = FALSE))
# Or computing grouped correlation matrices
qTBL(wlddev) |> fgroup_by(region, income) |>
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) qDF(pwcor(x), "Variable"), .apply = FALSE))
# Here calculating 1- and 10-year lags and growth rates of these variables
qTBL(wlddev) |> fgroup_by(country) |>
  fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G),
                 n = c(1, 10), t = year, .names = FALSE))
# Same but variables in different order
qTBL(wlddev) |> fgroup_by(country) |>
  fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G), n = c(1, 10),
                 t = year, .names = FALSE, .transpose = FALSE))