filter_column {matrixset}    R Documentation

Subset columns using annotation values

Description

The filter_column() function subsets the columns of all matrices of a matrixset, retaining only the columns that satisfy the given condition(s). It works like dplyr::filter().

Usage

filter_column(.ms, ..., .preserve = FALSE)

Arguments

.ms

matrixset object to subset based on the filtering conditions

...

Condition, or expression, that returns a logical value, used to determine whether columns are kept or discarded. The expression may refer to column annotations, that is, columns of the column_info component of .ms. More than one condition can be supplied; multiple expressions are combined with the & operator, so only columns for which all conditions evaluate to TRUE are kept.

.preserve

logical, relevant only if .ms is column grouped. When .preserve is FALSE (the default), the column grouping is updated based on the new matrixset resulting from the filtering. Otherwise, the column grouping is kept as is.

Details

The conditions are given as expressions in ..., which are applied to columns of the annotation data frame (column_info) to determine which columns should be retained.

It can be applied to both grouped and ungrouped matrixset objects; see column_group_by() and the section ‘Grouped matrixsets’.
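The mechanics described in this section can be mimicked in base R. The following is only a sketch on hypothetical data: the objects col_info, mats, keep and filtered are illustrative names, not part of matrixset, and the code does not reproduce the package's implementation.

```r
# Hypothetical column annotations, one row per matrix column (not package data)
col_info <- data.frame(
  colname = c("Math", "Physics", "History", "Art"),
  program = c("Applied Science", "Applied Science", "Humanities", "Humanities"),
  school_average = c(0.85, 0.78, 0.90, 0.70),
  stringsAsFactors = FALSE
)

# Two hypothetical matrices that share the same columns
mats <- list(
  before = matrix(1:8, nrow = 2,
                  dimnames = list(c("s1", "s2"), col_info$colname)),
  after  = matrix(9:16, nrow = 2,
                  dimnames = list(c("s1", "s2"), col_info$colname))
)

# Each condition yields one logical value per column; conditions are
# combined with `&`, so a column is kept only if all of them are TRUE
keep <- with(col_info, program == "Applied Science" & school_average > 0.8)

# The same column subset is applied to every matrix
filtered <- lapply(mats, function(m) m[, keep, drop = FALSE])

colnames(filtered$before)
# [1] "Math"
```

Only "Math" survives here because "Physics" fails the second condition, even though it satisfies the first.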

Value

A matrixset, with possibly a subset of the columns of the original object. Column groups are updated when .preserve is FALSE (the default).

Grouped matrixsets

Row grouping (row_group_by()) has no impact on column filtering.

The impact of column grouping (column_group_by()) on column filtering depends on the conditions. Often, column grouping will not have any impact, but as soon as an aggregating, lagging or ranking function is involved, then the results will differ.

For instance, the two following are not equivalent (except by pure coincidence).

student_results %>% filter_column(school_average > mean(school_average))

And its grouped counterpart:

student_results %>% column_group_by(program) %>% filter_column(school_average > mean(school_average))

In the ungrouped version, the mean of school_average is taken globally and filter_column keeps columns with school_average greater than this global average. In the grouped version, the average is calculated within each program and the kept columns are the ones with school_average greater than their within-program average.
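The difference between the two can be illustrated without the package. This is a base R sketch on a hypothetical annotation data frame (col_info and its values are made up for illustration), using ave() to stand in for the within-group mean; it is not how matrixset evaluates the condition internally.

```r
# Hypothetical column annotations (not the student_results data)
col_info <- data.frame(
  colname = c("Math", "Physics", "History", "Art"),
  program = c("Applied Science", "Applied Science", "Humanities", "Humanities"),
  school_average = c(0.95, 0.85, 0.75, 0.65),
  stringsAsFactors = FALSE
)

# Ungrouped: every column is compared with the global mean (0.8)
global_keep <- col_info$school_average > mean(col_info$school_average)

# Grouped: ave() returns, for each column, the mean of its own program
# (0.9 for Applied Science, 0.7 for Humanities)
program_mean <- ave(col_info$school_average, col_info$program)
grouped_keep <- col_info$school_average > program_mean

col_info$colname[global_keep]
# [1] "Math"    "Physics"
col_info$colname[grouped_keep]
# [1] "Math"    "History"
```

With these values, "Physics" is above the global average but below its program's average, while "History" is the reverse, so the two filters keep different columns.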

Examples

# Filtering using one condition
filter_column(student_results, program == "Applied Science")

# Filtering using multiple conditions. These are equivalent
filter_column(student_results, program == "Applied Science" & school_average > 0.8)
filter_column(student_results, program == "Applied Science", school_average > 0.8)

# The potential difference between grouped and non-grouped.
filter_column(student_results, school_average > mean(school_average))
student_results |>
  column_group_by(program) |>
  filter_column(school_average > mean(school_average))


[Package matrixset version 0.3.0 Index]