
nse {prt}

R Documentation

NSE subsetting

Description

A cornerstone feature of prt is the ability to load a (small) subset of rows (or columns) from a much larger tabular dataset. To specify such a subset, prt provides a method for the base R S3 generic subset(), which uses non-standard evaluation (NSE) to evaluate an expression within the context of the data, with semantics similar to those of the base R method for data.frames.
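To illustrate what evaluating an expression "within the context of the data" means, here is a minimal base R sketch that works on an ordinary data.frame rather than a prt object. The helper nse_subset_sketch() is hypothetical and not part of prt. It follows the approach of base R's subset.data.frame(): subset is evaluated with the columns in scope, and select is evaluated against a list that maps column names to their positions.

```r
# Hypothetical helper, not part of prt: NSE subsetting of a plain data.frame,
# following the approach of base R's subset.data.frame().
nse_subset_sketch <- function(x, subset, select) {
  # Rows: evaluate the unevaluated `subset` expression with the columns of
  # `x` in scope; names not found among the columns come from the caller.
  if (missing(subset)) {
    rows <- rep_len(TRUE, nrow(x))
  } else {
    rows <- eval(substitute(subset), x, parent.frame())
    rows <- rows & !is.na(rows)  # missing values are taken as FALSE
  }
  # Columns: map each column name to its position, so that expressions
  # such as mpg:hp or -c(vs, am) evaluate to integer indices.
  if (missing(select)) {
    cols <- rep_len(TRUE, ncol(x))
  } else {
    nl <- as.list(seq_along(x))
    names(nl) <- names(x)
    cols <- eval(substitute(select), nl, parent.frame())
  }
  x[rows, cols, drop = FALSE]
}

thresh <- 6
res <- nse_subset_sketch(mtcars, cyl == thresh & hp > 110, select = mpg:hp)
dropped <- nse_subset_sketch(mtcars, select = -c(vs, am))
```

Here `thresh` is not a column of mtcars, so it is found in the calling environment, while `cyl` and `hp` resolve to columns.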

Usage

## S3 method for class 'prt'
subset(x, subset, select, part_safe = FALSE, drop = FALSE, ...)

subset_quo(
  x,
  subset = NULL,
  select = NULL,
  part_safe = FALSE,
  env = parent.frame()
)

Arguments

`x`	object to be subsetted.
`subset`	logical expression indicating elements or rows to keep: missing values are taken as false.
`select`	expression, indicating columns to select from a data frame.
`part_safe`	Logical flag indicating whether the `subset` expression can safely be applied to individual partitions.
`drop`	passed on to `[` indexing operator.
`...`	further arguments to be passed to or from other methods.
`env`	The environment in which `subset` and `select` are evaluated. It does not apply to quosures, which carry their own environments.
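The note on quosures reflects how rlang::eval_tidy() treats its env argument. The sketch below, assuming the rlang package is installed, demonstrates this with rlang directly rather than through prt: a quosure created inside a function keeps that function's environment, whereas a bare quoted expression is evaluated in the supplied env. The names make_quo and other_env are illustrative only.

```r
library(rlang)

# A quosure created inside a function keeps that function's environment,
# in which x is 100.
make_quo <- function() {
  x <- 100
  quo(x)
}
q <- make_quo()

# A separate environment in which x is 2.
other_env <- new.env()
assign("x", 2, envir = other_env)

from_quo  <- eval_tidy(q, env = other_env)         # env is ignored: the quosure's own environment is used
from_expr <- eval_tidy(quote(x), env = other_env)  # a bare expression is evaluated in env
```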

Details

NSE is powered by rlang::enquo(), which quotes the subset and select arguments, and rlang::eval_tidy(), which evaluates the resulting expressions. This makes some rlang-specific features available, such as the .data/.env pronouns and the double-curly-brace ({{ }}) forwarding operator. For example code, please refer to vignette("prt", package = "prt").
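The following sketch shows how rlang::enquo() and rlang::eval_tidy() can be combined to obtain these features on a plain data.frame, assuming rlang is installed. The helpers subset_sketch(), subset_quo_sketch() and cyl_filter() are hypothetical; they separate quoting from evaluation in the spirit of the interface described here, but they are not prt's implementation.

```r
library(rlang)

# Hypothetical: evaluates an already-quoted expression (quosure or bare
# expression) with the columns of `x` masking `env`; eval_tidy() also
# provides the .data and .env pronouns.
subset_quo_sketch <- function(x, subset, env = parent.frame()) {
  rows <- eval_tidy(subset, data = x, env = env)
  x[rows & !is.na(rows), , drop = FALSE]  # missing values count as FALSE
}

# Hypothetical: quotes its argument with enquo(), which records the
# caller's environment and processes {{ }} and !!.
subset_sketch <- function(x, subset) {
  subset_quo_sketch(x, enquo(subset))
}

# Hypothetical wrapper forwarding a condition with {{ }}.
cyl_filter <- function(x, cond) {
  subset_sketch(x, {{ cond }})
}

cyl <- 4
a <- subset_sketch(mtcars, .data$cyl == .env$cyl)  # column vs. caller variable
b <- subset_quo_sketch(mtcars, quote(cyl == 4))     # pre-quoted expression
d <- cyl_filter(mtcars, cyl == 6)                   # the column masks the variable
```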

While subset() quotes the arguments passed as subset and select, subset_quo() operates on expressions that have already been quoted.

A final noteworthy departure from the base R interface is the part_safe argument. This logical flag indicates whether the subset expression can be evaluated on each partition individually, or whether dependencies between partitions would prevent this from yielding correct results. For instance, cyl == 6 depends only on each row itself, whereas qsec > mean(qsec) depends on all rows through mean(). As it is not straightforward to determine from the expression alone whether such dependencies exist, the default is FALSE. In many cases this results in a less efficient resolution of the row selection, and it is up to the user to enable this optimization where it is known to be safe.
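The effect of part_safe can be illustrated without prt. The base R sketch below splits mtcars into two contiguous partitions of 16 rows, an assumption loosely mirroring as_prt(mtcars, n_chunks = 2L), and compares evaluating a row selection on each partition separately with evaluating it once over all rows. The helpers selected_rows() and by_part() are hypothetical.

```r
# Two contiguous partitions of 16 rows each (hypothetical chunking).
parts <- split(mtcars, rep(1:2, each = 16))

# Hypothetical helper: row names selected by evaluating `expr` in `data`;
# missing values are treated as FALSE.
selected_rows <- function(expr, data) {
  keep <- eval(expr, data, globalenv())
  rownames(data)[keep & !is.na(keep)]
}

# Hypothetical helper: evaluate `expr` on each partition on its own
# (what part_safe = TRUE permits) and concatenate the selected row names.
by_part <- function(expr) {
  unlist(lapply(parts, selected_rows, expr = expr), use.names = FALSE)
}

row_wise <- quote(cyl == 6)           # depends on each row only
cross    <- quote(qsec > mean(qsec))  # mean() spans all rows

same_rw <- identical(by_part(row_wise), selected_rows(row_wise, mtcars))
same_cr <- identical(by_part(cross), selected_rows(cross, mtcars))
```

For the row-wise condition both strategies select the same rows. For qsec > mean(qsec), each partition compares against its own mean rather than the overall mean, so the selections differ; this is the kind of dependency that part_safe = FALSE guards against.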

Examples

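# store mtcars as a prt object split into two partitions
dat <- as_prt(mtcars, n_chunks = 2L)

# row selection
subset(dat, cyl == 6)
subset(dat, cyl == 6 & hp > 110)

# column selection by range and by exclusion
colnames(subset(dat, select = mpg:hp))
colnames(subset(dat, select = -c(vs, am)))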

sub_6 <- subset(dat, cyl == 6)

# `thresh` is not a column of `dat`, so it is looked up in the calling environment
thresh <- 6
identical(subset(dat, cyl == thresh), sub_6)
identical(subset(dat, cyl == .env$thresh), sub_6)

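cyl <- 6
# the column `cyl` masks the variable `cyl`, so this condition holds for every row
identical(subset(dat, cyl == cyl), data.table::as.data.table(dat))
# !! injects the value of the variable; .data and .env disambiguate explicitly
identical(subset(dat, cyl == !!cyl), sub_6)
identical(subset(dat, .data$cyl == .env$cyl), sub_6)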

expr <- quote(cyl == 6)
# passing a quoted expression to subset() will yield an error
## Not run: 
  subset(dat, expr)

## End(Not run)
identical(subset_quo(dat, expr), sub_6)

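# mean(qsec) depends on all rows; with part_safe = TRUE it is computed per
# partition, so the two results are not expected to be identical
identical(
  subset(dat, qsec > mean(qsec), part_safe = TRUE),
  subset(dat, qsec > mean(qsec), part_safe = FALSE)
)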

[Package prt version 0.2.0 Index]