separate.dtplyr_step {dtplyr}R Documentation

Separate a character column into multiple columns with a regular expression or numeric locations

Description

This is a method for the tidyr::separate() generic. It is translated to data.table::tstrsplit() in the j argument of ⁠[.data.table⁠.

Usage

## S3 method for class 'dtplyr_step'
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

data

A lazy_dt().

col

Column name or position.

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

into

Names of new variables to create as character vector. Use NA to omit the variable in the output.

sep

Separator between columns. The default value is a regular expression that matches any sequence of non-alphanumeric values.

remove

If TRUE, remove the input column from the output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.

...

Arguments passed on to methods

Examples

library(tidyr)
# If you want to split by any non-alphanumeric value (the default):
df <- lazy_dt(data.frame(x = c(NA, "x.y", "x.z", "y.z")), "DT")
df %>% separate(x, c("A", "B"))

# If you just want the second variable:
df %>% separate(x, c(NA, "B"))

# Use regular expressions to separate on multiple characters:
df <- lazy_dt(data.frame(x = c(NA, "x?y", "x.z", "y:z")), "DT")
df %>% separate(x, c("A","B"), sep = "([.?:])")

# convert = TRUE detects column classes:
df <- lazy_dt(data.frame(x = c("x:1", "x:2", "y:4", "z", NA)), "DT")
df %>% separate(x, c("key","value"), ":") %>% str
df %>% separate(x, c("key","value"), ":", convert = TRUE) %>% str

[Package dtplyr version 1.3.1 Index]