filter_row {matrixset}    R Documentation

Subset rows using annotation values

Description

The filter_row() function subsets the rows of all matrices of a matrixset, retaining all rows that satisfy the given condition(s). It works like dplyr::filter().

Usage

filter_row(.ms, ..., .preserve = FALSE)

Arguments

.ms

matrixset object to subset based on the filtering conditions

...

Condition, or expression, that returns a logical value, used to determine whether rows are kept or discarded. The expression may refer to row annotations, i.e., columns of the row_info component of .ms. More than one condition can be supplied; multiple expressions are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

.preserve

logical, relevant only if .ms is row grouped. When .preserve is FALSE (the default), the row grouping is updated based on the new matrixset resulting from the filtering. Otherwise, the row grouping is kept as is.
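As a rough illustration of how several conditions passed through ... combine, the base-R sketch below evaluates two conditions on a hypothetical annotation data frame (standing in for row_info) and joins them with &. It does not call matrixset; the data frame and its values are made up. Because a condition that evaluates to NA is not TRUE, the corresponding row is discarded as well.

```r
# Hypothetical stand-in for the row_info annotations of a matrixset
# (not the real student_results data).
row_info <- data.frame(
  student = c("s1", "s2", "s3", "s4"),
  class = c("classA", "classA", "classB", NA),
  previous_year_score = c(0.80, 0.60, 0.90, 0.95)
)

# Each condition yields one logical value per row (possibly NA).
cond1 <- row_info$class == "classA"
cond2 <- row_info$previous_year_score > 0.75

# Several conditions are combined with the & operator.
keep <- cond1 & cond2

# which() returns only the TRUE positions, so FALSE and NA rows are dropped.
kept_rows <- row_info$student[which(keep)]
print(kept_rows)
```

Here only s1 satisfies both conditions; s4 has a missing class, so its combined condition is NA and the row is not kept.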

Details

The conditions are given as expressions in ..., which are applied to columns of the annotation data frame (row_info) to determine which rows should be retained.

It can be applied to both grouped and ungrouped matrixsets (see row_group_by() and section ‘Grouped matrixsets’).
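The idea of subsetting the rows of all matrices at once can be sketched in base R: the condition is evaluated once against the annotation columns, and the resulting row index is applied to every matrix and to the annotations themselves. This is a minimal sketch assuming all matrices share the same rows, in the same order as the annotation data frame; the objects m1, m2, matrices and row_info are hypothetical, and this is not necessarily how matrixset implements filter_row().

```r
# Two hypothetical matrices sharing the same rows (students) and columns.
m1 <- matrix(1:8, nrow = 4, dimnames = list(paste0("s", 1:4), c("t1", "t2")))
m2 <- m1 * 10
matrices <- list(scores = m1, adjusted = m2)

# Hypothetical row annotations: one row per matrix row, in the same order.
row_info <- data.frame(
  student = paste0("s", 1:4),
  class = c("classA", "classB", "classA", "classB")
)

# Evaluate the condition against the annotation columns...
keep <- which(with(row_info, class == "classA"))

# ...then apply the same row index to every matrix and to row_info itself,
# so that the matrices and their annotations stay aligned.
filtered <- lapply(matrices, function(m) m[keep, , drop = FALSE])
row_info_filtered <- row_info[keep, , drop = FALSE]

print(rownames(filtered$scores))
```

Using drop = FALSE keeps each result a matrix even when a single row survives the filter.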

Value

A matrixset, possibly containing only a subset of the rows of the original object. Row groups are updated if .preserve is FALSE (the default) and kept as is otherwise.

Grouped matrixsets

Column grouping (column_group_by()) has no impact on row filtering.

The impact of row grouping (row_group_by()) on row filtering depends on the conditions. Often, row grouping has no impact, but as soon as an aggregating, lagging or ranking function is involved, the results will generally differ, because such functions are then evaluated within each group rather than over all rows.

For instance, the following two calls are not equivalent (except by pure coincidence):

student_results %>% filter_row(previous_year_score > mean(previous_year_score))

And its grouped counterpart:

student_results %>% row_group_by(class) %>% filter_row(previous_year_score > mean(previous_year_score))

In the ungrouped version, the mean of previous_year_score is taken globally and filter_row keeps rows with previous_year_score greater than this global average. In the grouped version, the average is calculated within each class and the kept rows are the ones with previous_year_score greater than the within-class average.
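The grouped/ungrouped distinction can be reproduced on a plain data frame with base R, where ave() computes, for each row, the mean of its own class. The data below is hypothetical (not the actual student_results dataset), and the sketch only mirrors the logic described above; it does not use matrixset.

```r
# Hypothetical annotations: two classes of three students each.
row_info <- data.frame(
  student = paste0("s", 1:6),
  class = rep(c("classA", "classB"), each = 3),
  previous_year_score = c(0.15, 0.35, 0.70, 0.55, 0.65, 0.95)
)

# Ungrouped: one global mean is compared against every row.
global_mean <- mean(row_info$previous_year_score)
ungrouped <- row_info$student[row_info$previous_year_score > global_mean]

# Grouped by class: ave() returns the within-class mean for each row.
class_mean <- ave(row_info$previous_year_score, row_info$class, FUN = mean)
grouped <- row_info$student[row_info$previous_year_score > class_mean]

print(ungrouped)
print(grouped)
```

With these values, s5 is above the global mean (about 0.558) but below the classB mean (about 0.717), so it is kept only by the ungrouped version.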

Examples

# Filtering using one condition
filter_row(student_results, class == "classA")

# Filtering using multiple conditions. These are equivalent:
filter_row(student_results, class == "classA" & previous_year_score > 0.75)
filter_row(student_results, class == "classA", previous_year_score > 0.75)

# The potential difference between grouped and ungrouped filtering.
filter_row(student_results, previous_year_score > mean(previous_year_score))
student_results |>
  row_group_by(teacher) |>
  filter_row(previous_year_score > mean(previous_year_score))


[Package matrixset version 0.3.0 Index]