sift {sift}    R Documentation

Augmented data frame filtering.

Description

Works like dplyr::filter, but also returns the observations neighboring each matched row. Control how many neighbors are included by adjusting sift.col and scope.

Usage

sift(.data, sift.col, scope, ...)

Arguments

.data

A data frame.

sift.col

Column name, as a symbol, to serve as the "sifting/augmenting" dimension. Its values must be non-missing and coercible to numeric.

scope

Specifies the augmentation bandwidth relative to the "key" observations. Must be on the same scale as sift.col.

If length 1, bandwidth used is +/- scope.

If length 2, bandwidth used is (-scope[1], +scope[2]): scope[1] units below and scope[2] units above each key observation.
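As a rough illustration of the two forms of scope, the base-R sketch below computes the window around a single key value. The helper scope_window is hypothetical and is not part of sift; treating the window bounds as inclusive is an assumption.

```r
# Hypothetical helper (not part of sift): turn `scope` into a window
# around one key value. Inclusive bounds are an assumption.
scope_window <- function(key, scope) {
  stopifnot(is.numeric(scope), length(scope) %in% c(1L, 2L))
  # A length-1 scope is applied symmetrically: +/- scope.
  if (length(scope) == 1L) scope <- c(scope, scope)
  c(lower = key - scope[1], upper = key + scope[2])
}

scope_window(10, 2)        # symmetric window around 10
scope_window(10, c(0, 2))  # window extending only above 10
```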

...

Expressions passed to dplyr::filter, whose results serve as the "key" observations. The same data-masking rules used in dplyr::filter apply here.

Details

sift() can be understood as a 2-step process:

  1. .data is passed to dplyr::filter, using the subsetting expression(s) supplied through the ... argument. We'll refer to these intermediate results as "key" observations.

  2. For each key observation, sift expands the row selection bidirectionally along the dimension specified by sift.col. Any row from the original dataset within scope units of a key observation is captured in the final result.

Essentially, this allows us to "peek" at neighboring rows surrounding the key observations.
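The two steps above can be approximated in base R as follows. This is a minimal sketch, not sift's actual implementation: the function sift_sketch, the toy data frame, and its columns day and event are all made up for illustration, the window bounds are assumed to be inclusive, and the sketch reproduces neither the additional columns nor the grouping behavior of sift().

```r
# Toy data; the column names `day` and `event` are invented for this sketch.
df <- data.frame(
  day   = c(1, 2, 5, 6, 7, 12),
  event = c("a", "b", "KEY", "c", "d", "e")
)

# Hypothetical base-R emulation of the two steps (not sift's actual code).
sift_sketch <- function(data, col, scope, is_key) {
  # A length-1 scope is applied symmetrically: +/- scope.
  if (length(scope) == 1L) scope <- c(scope, scope)
  x <- as.numeric(data[[col]])

  # Step 1: values of the sifting column at the "key" observations.
  keys <- x[is_key]

  # Step 2: keep every row whose value falls inside the window of at
  # least one key observation (inclusive bounds are an assumption).
  keep <- vapply(
    x,
    function(v) any(v >= keys - scope[1] & v <= keys + scope[2]),
    logical(1)
  )
  data[keep, , drop = FALSE]
}

# Rows within 1 day of the key observation.
sift_sketch(df, "day", scope = 1, is_key = df$event == "KEY")

# Rows from the key observation up to 2 days after it.
sift_sketch(df, "day", scope = c(0, 2), is_key = df$event == "KEY")
```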

Value

A sifted data frame, with 2 additional columns:

Examples

# See current events from same timeframe as 2020 Utah Monolith discovery.
sift(nyt2020, pub_date, scope = 2, grepl("Monolith", headline))

# or Biden's presidential victory.
sift(nyt2020, pub_date, scope = 2, grepl("Biden is elected", headline))

# We can specify lower & upper scope to see what happened AFTER Trump tested positive.
sift(nyt2020, pub_date, scope = c(0, 2), grepl("Trump Tests Positive", headline))

# sift respects dplyr grouping.
library(dplyr)
library(mopac)
express %>%
 group_by(direction) %>%
 sift(time, 30, plate == "EAS-1671") # row augmentation performed within groups.

[Package sift version 0.1.0 Index]