filter_scoped_df {datacleanr}    R Documentation

Filter / Subset data dplyr-groupwise

Description

filter_scoped_df subsets rows of a data frame based on its grouping structure (see group_by). Filtering statements are provided in a separate tibble, where each row combines a logical expression with a list of the groups to which the expression should be applied (given as group indices; see cur_group_id).

Usage

filter_scoped_df(dframe, condition_df)

Arguments

dframe

A grouped or ungrouped tibble or data.frame

condition_df

A tibble with two columns: condition_df[ ,1] holds character strings that evaluate to valid logical expressions applicable in subset or filter, and condition_df[ ,2] is a list-column with group scoping levels (numeric) or NULL for unscoped filtering. If all groups are given for a statement, the operation is the same as filter applied to a grouped data.frame.
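
To make the scoping semantics concrete, the following base R sketch mimics what a single scoped statement means. It is an illustration under stated assumptions, not the datacleanr implementation: scoped_keep is a hypothetical helper, group indices are assumed to follow the order of the factor levels (as cur_group_id does for iris grouped by Species), and a scoped statement is assumed to be evaluated separately within each listed group while rows in all other groups are kept.

```r
# Hypothetical helper (not part of datacleanr): returns a logical vector
# marking the rows that one statement would keep.
scoped_keep <- function(df, group_id, statement, scope_at = NULL) {
  expr <- parse(text = statement)[[1]]
  if (is.null(scope_at)) {
    # Unscoped: evaluate the statement once over all rows
    return(as.logical(eval(expr, df, parent.frame())))
  }
  # Scoped: rows outside the listed groups are kept unchanged
  keep <- rep(TRUE, nrow(df))
  for (g in scope_at) {
    rows <- which(group_id == g)
    # Evaluate the statement using only the rows of group g
    keep[rows] <- as.logical(
      eval(expr, df[rows, , drop = FALSE], parent.frame())
    )
  }
  keep
}

# Assumed index mapping: 1 = setosa, 2 = versicolor, 3 = virginica
group_id <- as.integer(iris$Species)

keep <- scoped_keep(
  iris, group_id,
  statement = "Petal.Length > quantile(Petal.Length, 0.8)",
  scope_at = c(1, 2)
)
res <- iris[keep, ]
table(res$Species)
```

In this sketch the quantile is computed per group for setosa and versicolor, while virginica (group 3) is outside the scope and keeps all of its rows.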

Details

This function is used in the "Filtering" tab of the datacleanr app and in the reproducible code recipe shown in the "Extract" tab. Note that the app performs multiple checks for valid statements (and only valid operations are printed in the "Extract" tab). It is therefore not advisable to manually alter this code or use this function interactively.
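
For readers who nevertheless want to sanity-check a statement by hand, a check of this kind could be approximated as below. This is a minimal sketch, not the checks the app actually runs: statement_is_valid is a hypothetical helper, and "valid" is assumed here to mean that the string parses, evaluates without error against the data, and yields a logical vector of length one or one value per row.

```r
# Hypothetical helper (not part of datacleanr): TRUE if the statement
# parses, evaluates without error, and returns a usable logical vector.
statement_is_valid <- function(df, statement) {
  result <- tryCatch(
    eval(parse(text = statement)[[1]], df, parent.frame()),
    error = function(e) NULL  # parse or evaluation failure
  )
  is.logical(result) && length(result) %in% c(1L, nrow(df))
}

ok  <- statement_is_valid(iris, "Species == 'setosa'")  # column exists
bad <- statement_is_valid(iris, "Spec == 'setosa'")     # no column 'Spec'
```

Here ok is TRUE and bad is FALSE, because evaluating the second statement fails on the missing column Spec.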

Value

An object of the same type as dframe. The output is a subset of the input, with groups and rows appearing in the same order, and an additional column .dcrindex representing the group indices. The output may have fewer groups than the input, depending on the subsetting.

Examples

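# set-up condition_df: one row per filter statement
cdf <- dplyr::tibble(
  # logical expressions, given as character strings
  statement = c(
    "Sepal.Width > quantile(Sepal.Width, 0.1)",
    "Petal.Width > quantile(Petal.Width, 0.1)",
    "Petal.Length > quantile(Petal.Length, 0.8)"
  ),
  # NULL = unscoped; c(1, 2) = apply only to groups with index 1 and 2
  scope_at = list(NULL, NULL, c(1, 2))
)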


# apply all statements to iris grouped by Species
fdf <- filter_scoped_df(
  dplyr::group_by(
    iris,
    Species
  ),
  condition_df = cdf
)

# Example of invalid expression:
# column 'Spec' does not exist in iris
# "Spec == 'setosa'"

[Package datacleanr version 1.0.3 Index]