separate.dtplyr_step {dtplyr}

R Documentation

Separate a character column into multiple columns with a regular expression or numeric locations

Description

This is a method for the tidyr::separate() generic. It is translated to data.table::tstrsplit() in the j argument of [.data.table.
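
To inspect the translation for a particular call, one option is to print the data.table expression that dtplyr generates. The following is a minimal sketch, assuming dplyr is installed so that `dplyr::show_query()` is available. The data and column names are illustrative, and the exact expression printed may differ between dtplyr versions.

```r
library(dtplyr)
library(tidyr)

# Illustrative input: one character column to split
df <- lazy_dt(data.frame(x = c(NA, "x.y", "x.z", "y.z")), "DT")

# Build the lazy step; nothing is computed yet
step <- df %>% separate(x, c("A", "B"))

# Print the data.table expression generated for this step
dplyr::show_query(step)

# Force evaluation and collect the result as a data frame
out <- as.data.frame(step)
```

Because the step is lazy, the split is only performed when the result is collected, for example with `as.data.frame()`.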

Usage

## S3 method for class 'dtplyr_step'
separate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

`data`	A `lazy_dt()`.
`col`	Column name or position. This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
`into`	Names of the new variables to create, as a character vector. Use `NA` to omit a variable from the output.
`sep`	Separator between columns. The default value is a regular expression that matches any sequence of non-alphanumeric values.
`remove`	If TRUE, remove the input column from the output data frame.
`convert`	If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string "NA"s to be converted to NAs.
`...`	Arguments passed on to methods
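
As a rough illustration of how `sep`, `remove` and `convert` interact, the sketch below keeps the input column and converts the new columns. The data, the `"_"` separator and the names `key` and `value` are assumptions chosen for the example, not part of the package.

```r
library(dtplyr)
library(tidyr)

# Illustrative input: a key and an integer-like value joined by "_"
df <- lazy_dt(data.frame(x = c("a_1", "b_2")), "DT")

out <- df %>%
  # remove = FALSE keeps x; convert = TRUE lets the value column become numeric
  separate(x, c("key", "value"), sep = "_", remove = FALSE, convert = TRUE) %>%
  as.data.frame()

str(out)
```

With `remove = TRUE` (the default) the column `x` would be dropped from the result, and with `convert = FALSE` both new columns would stay character.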

Examples

library(dtplyr) # provides lazy_dt()
library(tidyr)
# If you want to split by any non-alphanumeric value (the default):
df <- lazy_dt(data.frame(x = c(NA, "x.y", "x.z", "y.z")), "DT")
df %>% separate(x, c("A", "B"))

# If you just want the second variable:
df %>% separate(x, c(NA, "B"))

# Use regular expressions to separate on multiple characters:
df <- lazy_dt(data.frame(x = c(NA, "x?y", "x.z", "y:z")), "DT")
df %>% separate(x, c("A", "B"), sep = "[.?:]")

# convert = TRUE detects column classes:
df <- lazy_dt(data.frame(x = c("x:1", "x:2", "y:4", "z", NA)), "DT")
df %>% separate(x, c("key","value"), ":") %>% str
df %>% separate(x, c("key","value"), ":", convert = TRUE) %>% str

[Package dtplyr version 1.3.1 Index]