sift {sift} | R Documentation |
Augmented data frame filtering.
Description
Imagine dplyr::filter that includes neighboring observations.
Choose how many observations to include by adjusting inputs sift.col and scope.
Usage
sift(.data, sift.col, scope, ...)
Arguments
.data | A data frame. |
sift.col | Column name, as symbol, to serve as "sifting/augmenting" dimension. Must be non-missing and coercible to numeric. |
scope | Specifies augmentation bandwidth relative to "key" observations. Parameter should share the same scale as sift.col. If length 1, bandwidth used is +/- scope. If length 2, bandwidth used is (-scope[1], +scope[2]). |
... | Expressions passed to dplyr::filter; the rows they select serve as the "key" observations. |
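As a rough illustration of how a length-1 or length-2 scope could translate into a window around one key value, here is a minimal sketch. The helper scope_window() is hypothetical and not part of sift; it only reflects one reading of the scope description, in which the first element looks backward and the second looks forward.

```r
# Hypothetical helper (not part of sift): turn `scope` into a
# lower/upper window around a single key value.
scope_window <- function(key, scope) {
  stopifnot(is.numeric(scope), length(scope) %in% c(1L, 2L))
  # A length-1 scope is applied symmetrically on both sides.
  if (length(scope) == 1L) scope <- c(scope, scope)
  c(lower = key - scope[1], upper = key + scope[2])
}

scope_window(10, 2)        # window from 8 to 12
scope_window(10, c(0, 2))  # window from 10 to 12 (forward only)
```

Under this reading, scope = c(0, 2) captures only rows at or after the key observation.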
Details
sift() can be understood as a 2-step process:
- .data is passed to dplyr::filter, using subsetting expression(s) provided in .... We'll refer to these intermediate results as "key" observations.
- For each key observation, sift expands the row selection bidirectionally along the dimension specified by sift.col. Any row from the original dataset within scope units of a key observation is captured in the final result.
Essentially, this allows us to "peek" at neighboring rows surrounding the key observations.
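The two steps can be imitated in base R on a toy data frame. This is only a sketch of the idea, not sift's actual implementation: the data, the column names day and event, and the choice to treat window bounds as inclusive are all assumptions made for illustration.

```r
# Toy data; `day` plays the role of sift.col.
df <- data.frame(
  day   = c(1, 2, 5, 9, 10, 11, 20),
  event = c("a", "b", "KEY", "c", "KEY", "d", "e")
)
scope <- c(1, 1)  # look 1 unit back and 1 unit ahead

# Step 1: the filtering expression identifies the "key" observations.
is_key   <- df$event == "KEY"
key_vals <- df$day[is_key]

# Step 2: keep every row that falls inside the window of at least one key.
# Assumption: bounds are inclusive in this sketch.
near_key <- vapply(df$day, function(d) {
  any(d >= key_vals - scope[1] & d <= key_vals + scope[2])
}, logical(1))

sketch <- df[near_key, ]
sketch$.key <- is_key[near_key]
sketch
```

Here the rows with day 5, 9, 10 and 11 survive: the two key rows plus the neighbors within one unit of them.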
Value
A sifted data frame, with 2 additional columns:
- .cluster <int>: Identifies resulting group formed by each key observation and its neighboring rows. When the key observations are close enough together, the clusters will overlap.
- .key <lgl>: TRUE indicates key observation.
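To show how the two extra columns might be used downstream, the sketch below works on a hand-built data frame that mimics the shape of a sifted result. The values are invented for illustration and are not real sift output.

```r
# Mock of a sifted result, built by hand (not produced by sift).
sifted <- data.frame(
  day      = c(4, 5, 6, 9, 10),
  .cluster = c(1L, 1L, 1L, 2L, 2L),
  .key     = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Key observations only.
keys <- sifted[sifted$.key, ]

# Neighboring rows only.
neighbors <- sifted[!sifted$.key, ]

# Number of rows captured per cluster (key row included).
rows_per_cluster <- table(sifted$.cluster)
```

Splitting on .key separates the original filter matches from the rows added by augmentation, while .cluster lets each neighborhood be summarized on its own.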
Examples
# See current events from same timeframe as 2020 Utah Monolith discovery.
sift(nyt2020, pub_date, scope = 2, grepl("Monolith", headline))
# or Biden's presidential victory.
sift(nyt2020, pub_date, scope = 2, grepl("Biden is elected", headline))
# We can specify lower & upper scope to see what happened AFTER Trump tested positive.
sift(nyt2020, pub_date, scope = c(0, 2), grepl("Trump Tests Positive", headline))
# sift recognizes dplyr group specification.
library(dplyr)
library(mopac)
express %>%
  group_by(direction) %>%
  sift(time, 30, plate == "EAS-1671") # row augmentation performed within groups.