lay {lay}R Documentation

Apply a function within each row

Description

Create efficiently new column(s) in data frame (including tibble) by applying a function one row at a time.

Usage

lay(.data, .fn, ..., .method = c("apply", "tidy"))

Arguments

.data

A data frame or tibble (or other data frame extensions).

.fn

A function to apply to each row of .data. Possible values are:

  • A function, e.g. mean

  • An anonymous function, .e.g. function(x) mean(x, na.rm = TRUE)

  • An anonymous function with shorthand, .e.g. ⁠\(x) mean(x, na.rm = TRUE)⁠

  • A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)

    (wrap the output in a data frame to apply several functions at once, e.g. ~ tibble(min = min(.x), max = max(.x)))

...

Additional arguments for the function calls in .fn (must be named!).

.method

This is an experimental argument that allows you to control which internal method is used to apply the rowwise job:

  • "apply", the default internally uses the function apply().

  • "tidy", internally uses purrr::pmap() and is stricter with respect to class coercion across columns.

The default has been chosen based on these benchmarks.

Details

lay() create a vector or a data frame (or tibble), by considering in turns each row of a data frame (.data) as the vector input of some function(s) .fn.

This makes the creation of new columns based on a rowwise operation both simple (see Examples; below) and efficient (see the Article benchmarks).

The function should be fully compatible with {dplyr}-based workflows and follows a syntax close to dplyr::across().

Yet, it takes .data instead of .cols as a main argument, which makes it possible to also use lay() outside dplyr verbs (see Examples).

The function lay() should work in a wide range of situations, provided that:

If you use lay() within dplyr::mutate(), make sure that the data used by dplyr::mutate() contain no row-grouping, i.e., what is passed to .data in dplyr::mutate() should not be of class grouped_df or rowwise_df. If it is, lay() will be called multiple times, which will slow down the computation despite not influencing the output.

Value

A vector with one element per row of .data, or a data frame (or tibble) with one row per row of .data. The class of the output is determined by .fn.

Examples


# usage without dplyr -------------------------------------------------------------------------

# lay can return a vector
lay(drugs[1:10, -1], any)

# lay can return a data frame
## using the shorthand function syntax \(x) .fn(x)
lay(drugs[1:10, -1],
   \(x) data.frame(drugs_taken = sum(x), drugs_not_taken = sum(x == 0)))

## using the rlang lambda syntax ~ fn(.x)
lay(drugs[1:10, -1],
   ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0)))

# lay can be used to augment a data frame
cbind(drugs[1:10, ],
      lay(drugs[1:10, -1],
         ~ data.frame(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))


# usage with dplyr ----------------------------------------------------------------------------

if (require("dplyr")) {

  # apply any() to each row
  drugs |>
    mutate(everused = lay(pick(-caseid), any))

  # apply any() to each row using all columns
  drugs |>
    select(-caseid) |>
    mutate(everused = lay(pick(everything()), any))

  # a workaround would be to use `rowSums`
  drugs |>
    mutate(everused = rowSums(pick(-caseid)) > 0)

  # but we can lay any function taking a vector as input, e.g. median
  drugs |>
    mutate(used_median = lay(pick(-caseid), median))

  # you can pass arguments to the function
  drugs_with_NA <- drugs
  drugs_with_NA[1, 2] <- NA

  drugs_with_NA |>
    mutate(everused = lay(pick(-caseid), any))
  drugs_with_NA |>
    mutate(everused = lay(pick(-caseid), any, na.rm = TRUE))

  # you can lay the output into a 1-row tibble (or data.frame)
  # if you want to apply multiple functions
  drugs |>
    mutate(lay(pick(-caseid),
             ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))

  # note that naming the output prevent the automatic splicing and you obtain a df-column
  drugs |>
    mutate(usage = lay(pick(-caseid),
              ~ tibble(drugs_taken = sum(.x), drugs_not_taken = sum(.x == 0))))

  # if your function returns a vector longer than a scalar, you should turn the output
  # into a tibble, which is the job of as_tibble_row()
  drugs |>
    mutate(lay(pick(-caseid), ~ as_tibble_row(quantile(.x))))

  # note that you could also wrap the output in a list and name it to obtain a list-column
  drugs |>
    mutate(usage_quantiles = lay(pick(-caseid), ~ list(quantile(.x))))
}


[Package lay version 0.1.3 Index]