p_slice_sample {dtrackr}    R Documentation

Slice operations

Description

Slice operations behave as in dplyr, except that the history graph of a tracked dataframe can be updated with the sizes of the dataframe before and after the operation. See dplyr::slice(), dplyr::slice_head(), dplyr::slice_tail(), dplyr::slice_min(), dplyr::slice_max() and dplyr::slice_sample() for more details on the underlying functions.

Usage

p_slice_sample(
  .data,
  ...,
  .messages = c("{.count.in} before", "{.count.out} after"),
  .headline = "slice data"
)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See the Methods section of dplyr::slice_sample() for more details.

...

Arguments passed on to dplyr::slice_sample

n,prop

Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 will be used. If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. prop will be rounded towards zero to generate an integer number of rows.

A negative value of n or prop will be subtracted from the group size. For example, n = -2 with a group of 5 rows will select 5 - 2 = 3 rows; prop = -0.25 with 8 rows will select 8 * (1 - 0.25) = 6 rows.
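The arithmetic above can be sketched in base R. The helper slice_size() below is hypothetical (it is not part of dplyr or dtrackr) and only restates the documented rules, assuming sampling without replacement; it is not how dplyr computes the size internally.

```r
# Hypothetical helper: number of rows selected from a group of `group_size`
# rows, following the n / prop rules described above.
slice_size <- function(group_size, n = NULL, prop = NULL) {
  if (!is.null(n) && !is.null(prop)) stop("supply only one of n or prop")
  if (is.null(n) && is.null(prop)) n <- 1        # default when neither is given
  if (!is.null(n)) {
    # A negative n is subtracted from the group size
    size <- if (n >= 0) n else group_size + n
  } else {
    # A negative prop removes that fraction of the group
    size <- if (prop >= 0) prop * group_size else group_size * (1 + prop)
    size <- trunc(size)                          # round towards zero
  }
  # Truncate to the valid range [0, group_size]
  max(0, min(size, group_size))
}

slice_size(5, n = -2)        # 5 - 2 = 3
slice_size(8, prop = -0.25)  # 8 * (1 - 0.25) = 6
slice_size(5, n = 10)        # truncated to 5
slice_size(10, prop = 0.25)  # 2.5 rounded towards zero = 2
```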

weight_by

<data-masking> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

replace

Should sampling be performed with (TRUE) or without (FALSE, the default) replacement.
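As a rough illustration of what weight_by and replace mean, the sketch below uses base R's sample() on row indices. The weights and sizes are made up for the example, and this is an assumption about the behaviour being described rather than a copy of dplyr's implementation.

```r
set.seed(1)  # only to make the illustration reproducible

w <- c(1, 3, 0, 6)   # non-negative weights, one per row
p <- w / sum(w)      # standardised so the weights sum to 1

# Without replacement each row index can be drawn at most once,
# and a row with zero weight is never drawn.
idx_no_rep <- sample(seq_along(w), size = 3, replace = FALSE, prob = p)

# With replacement the sample can be larger than the input and repeat rows.
idx_rep <- sample(seq_along(w), size = 10, replace = TRUE, prob = p)
```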

.messages

A set of glue specs. The glue code can use any global variable, {.count.in} and {.count.out} for the sizes of the input and output dataframes respectively, and {.excluded} for the difference.

.headline

A glue spec. The glue code can use any global variable, and {.count.in} and {.count.out} for the sizes of the input and output dataframes respectively.
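To show how such glue specs expand, the following sketch evaluates the default .messages with the glue package directly. The counts list is a hypothetical stand-in for the values dtrackr makes available at run time; how dtrackr evaluates the specs internally is not shown here.

```r
library(glue)

# Hypothetical values standing in for what is supplied during tracking
counts <- list(.count.in = 150L, .count.out = 300L)

# A vector of specs, one message per element, as in the default .messages
specs <- c("{.count.in} before", "{.count.out} after")

# Evaluate each spec against the stand-in values
messages <- vapply(
  specs,
  function(s) as.character(glue_data(counts, s)),
  character(1),
  USE.NAMES = FALSE
)
messages
```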

Value

The sliced dataframe with the history graph updated.

See Also

dplyr::slice_sample()

Examples

library(dplyr)
library(dtrackr)

# In this example each Species group of the iris dataframe is resampled with
# replacement to give 100 rows per group, and the resulting history is retrieved.
iris %>%
  track() %>%
  group_by(Species) %>%
  slice_sample(n=100, replace=TRUE,
               .messages="{.count.out} / {.count.in} = {n}",
               .headline="100 {Species}") %>%
  history()

[Package dtrackr version 0.4.4 Index]