fslice {timeplyr} | R Documentation |
Faster dplyr::slice()
Description
When there are many groups, the fslice() functions are much faster
than their dplyr::slice() counterparts.
Usage
fslice(data, ..., .by = NULL, keep_order = FALSE, sort_groups = TRUE)
fslice_head(
data,
...,
n,
prop,
.by = NULL,
keep_order = FALSE,
sort_groups = TRUE
)
fslice_tail(
data,
...,
n,
prop,
.by = NULL,
keep_order = FALSE,
sort_groups = TRUE
)
fslice_min(
data,
order_by,
...,
n,
prop,
.by = NULL,
with_ties = TRUE,
na_rm = FALSE,
keep_order = FALSE,
sort_groups = TRUE
)
fslice_max(
data,
order_by,
...,
n,
prop,
.by = NULL,
with_ties = TRUE,
na_rm = FALSE,
keep_order = FALSE,
sort_groups = TRUE
)
fslice_sample(
data,
n,
replace = FALSE,
prop,
.by = NULL,
keep_order = FALSE,
sort_groups = TRUE,
weights = NULL,
seed = NULL
)
Arguments
data |
Data frame |
... |
See |
.by |
(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select. |
keep_order |
Should the sliced data frame be returned in its original order?
The default is FALSE. |
sort_groups |
If |
n |
Number of rows. |
prop |
Proportion of rows. |
order_by |
Variables to order by. |
with_ties |
Should ties be kept together? The default is TRUE. |
na_rm |
Should missing values in |
replace |
Should |
weights |
Probability weights used in |
seed |
Seed number defining the RNG state.
If supplied, the seed is applied only locally within the function
and is not retained after sampling.
In other words, whatever seed state was in place before the function call
is restored afterwards to ensure seed continuity.
If left |
Details
fslice() and friends allow for more flexibility in how you order the
by-group slicing. Furthermore, you can control whether the returned data
frame is sliced in the order of the supplied row indices, or whether the
original order is retained (like dplyr::filter()).
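The ordering control described above can be sketched on a small made-up data frame. The df object and its columns are illustrative and not part of the package. The sketch assumes, per the keep_order argument description, that keep_order = TRUE returns the selected rows in their original order, while the default follows the order of the supplied indices.

```r
library(timeplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Default (keep_order = FALSE): rows follow the order of the supplied indices
by_index <- fslice(df, c(4L, 2L))

# keep_order = TRUE: the same rows, returned in their original order
by_position <- fslice(df, c(4L, 2L), keep_order = TRUE)

# Compare the row order of the two results
by_index$id
by_position$id
```

Both calls select the same two rows; only the order in which they are returned should differ.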
In fslice(), when length(n) == 1, an optimised method is implemented
that internally uses list_subset(), a fast function for extracting
single elements from single-level lists that contain vectors of the same
type, e.g. integer.
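To make the idea concrete, the following base R sketch extracts one element from each vector in a single-level list of integer row locations. It is not the package's implementation and does not call list_subset(), whose signature is not shown on this page. The names df, group_rows and pick are hypothetical.

```r
# Hypothetical toy data, for illustration only
df <- data.frame(g = c("a", "b", "a", "b", "a"), x = 1:5)

# Single-level list of integer vectors: the row locations of each group
group_rows <- split(seq_len(nrow(df)), df$g)

# Extract the i-th element of every vector; groups with fewer than
# i rows yield NA, which is dropped before subsetting
i <- 2L
pick <- vapply(group_rows, function(rows) rows[i], integer(1))
pick <- pick[!is.na(pick)]

# Second row of each group
df[pick, ]
```

Because every list element is an integer vector and exactly one element is taken from each, the result fits in a single integer vector, which is what makes a specialised fast path plausible.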
fslice_head() and fslice_tail() are very fast with large numbers of groups.
fslice_sample() is arguably more intuitive because, by default, it
resamples each entire group without replacement; there is no need to
specify a maximum group size as in dplyr::slice_sample().
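A minimal sketch of this default behaviour, together with the seed argument, on made-up data. The df object is illustrative only. The sketch assumes, as described above, that omitting both n and prop resamples every group in full.

```r
library(timeplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(g = rep(c("a", "b"), times = c(3, 5)), x = 1:8)

# No n or prop: every group is resampled in full, without replacement,
# so rows are shuffled within groups and the total row count is unchanged
shuffled <- fslice_sample(df, .by = g, seed = 42)
nrow(shuffled) == nrow(df)

# Supplying the same seed again should reproduce the same sample
shuffled_again <- fslice_sample(df, .by = g, seed = 42)
identical(shuffled$x, shuffled_again$x)
```

Passing seed here avoids calling set.seed() beforehand, which, per the seed argument description, leaves the surrounding RNG state as it was.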
Value
A data.frame containing the specified rows.
Examples
library(timeplyr)
library(dplyr)
library(nycflights13)
flights <- flights %>%
  group_by(origin, dest)
# First row repeated for each group
flights %>%
  fslice(1, 1)
# First row per group
flights %>%
  fslice_head(n = 1)
# Last row per group
flights %>%
  fslice_tail(n = 1)
# Earliest flight per group
flights %>%
  fslice_min(time_hour, with_ties = FALSE)
# Last flight per group
flights %>%
  fslice_max(time_hour, with_ties = FALSE)
# Random sample without replacement by group
# (or stratified random sampling)
flights %>%
  fslice_sample()
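
The examples above group the data with group_by() first. As a hedged alternative, the .by argument listed under Arguments can name the grouping columns for a single call. The sketch below uses a small made-up data frame (df and its values are hypothetical) instead of nycflights13.

```r
library(timeplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(
  origin = c("JFK", "JFK", "LGA", "LGA", "LGA"),
  dest   = c("BOS", "BOS", "ORD", "ORD", "MIA"),
  delay  = c(5, 12, 3, 8, 20)
)

# First row of each origin/dest pair, without a prior group_by()
first_rows <- fslice_head(df, n = 1, .by = c(origin, dest))

# Row with the largest delay in each origin/dest pair
worst <- fslice_max(df, delay, n = 1, with_ties = FALSE,
                    .by = c(origin, dest))

first_rows
worst
```

Using .by keeps the grouping local to the call, so df itself stays ungrouped.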