string_split2df {stringmagic} | R Documentation |
Splits a character vector into a data frame
Description
Splits a character vector and formats the resulting substrings into a data.frame.
Usage
string_split2df(
  x,
  data = NULL,
  split = NULL,
  id = NULL,
  add.pos = FALSE,
  id_unik = TRUE,
  fixed = FALSE,
  ignore.case = FALSE,
  word = FALSE,
  envir = parent.frame(),
  dt = FALSE,
  ...
)

string_split2dt(
  x,
  data = NULL,
  split = NULL,
  id = NULL,
  add.pos = FALSE,
  id_unik = TRUE,
  fixed = FALSE
)
Arguments
x |
A character vector or a two-sided formula. If a two-sided formula, then the
argument data must be provided: as in the Examples, the variables of the formula are taken from it. |
data |
Optional, only used if the argument x is a formula. |
split |
A character scalar. Used to split the character vectors. By default
this is a regular expression. You can use flags in the pattern in the form |
id |
Optional. A character vector or a list of vectors. If provided, the
values of id are included as identifiers in the resulting table. |
add.pos |
Logical, default is FALSE. If TRUE, the result includes the position of each text element in its initial string. |
id_unik |
Logical, default is TRUE. If TRUE, a message is displayed when the identifiers are not unique. |
fixed |
Logical, default is FALSE. |
ignore.case |
Logical scalar, default is FALSE. |
word |
Logical scalar, default is FALSE. |
envir |
Environment in which to evaluate the interpolations if the flag |
dt |
Logical, default is FALSE. |
... |
Not currently used. |
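The difference between a regular-expression split and a fixed split can be illustrated with base R's strsplit(), which has a fixed argument of its own. This is only an analogy, under the assumption that fixed = TRUE makes split a literal string here as well; it does not show how stringmagic handles the pattern internally.

```r
x <- "a.b.c"

# Regex split: "." must be escaped, otherwise it would match any character
regex_pieces <- strsplit(x, "\\.")[[1]]

# Fixed split: "." is taken literally, so no escaping is needed
fixed_pieces <- strsplit(x, ".", fixed = TRUE)[[1]]

print(regex_pieces)
print(fixed_pieces)
```

Both calls split the string at the literal dots; only the way the pattern has to be written differs.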
Value
It returns a data.frame or a data.table which will contain: i) obs: the observation index, ii) pos: the position of the text element in the initial string (optional, via add.pos), iii) the text element, iv) the identifier(s) (optional, only if id was provided).
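To make the layout of the result concrete, here is a minimal base-R sketch that builds a table with the same kind of columns by hand. It is not the package's implementation: the helper name split_to_df and the column names text and id are hypothetical, chosen for illustration only, and the real function may name or order its columns differently.

```r
# Hypothetical helper (not part of stringmagic): emulates the layout described above
split_to_df <- function(x, split, id = NULL, add.pos = FALSE) {
  pieces <- strsplit(x, split)   # one character vector of substrings per element of x
  n <- lengths(pieces)           # number of substrings per observation
  res <- data.frame(obs = rep(seq_along(x), n), stringsAsFactors = FALSE)
  if (add.pos) {
    res$pos <- sequence(n)       # position of each substring within its initial string
  }
  res$text <- unlist(pieces)     # assumed column name for the text element
  if (!is.null(id)) {
    res$id <- rep(id, n)         # identifier repeated once per substring
  }
  res
}

x <- c("a, b", "c, d, e")
out <- split_to_df(x, ", ", id = c("first", "second"), add.pos = TRUE)
print(out)
```

Each row of out corresponds to one substring, with obs pointing back to the element of x it came from.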
Functions
- string_split2dt(): Splits a string vector and returns a data.table
See Also
String operations: string_is(), string_get(), string_clean(), string_split2df().
Chain basic operations with string_ops(). Clean character vectors efficiently with string_clean().
Use string_vec() to create simple string vectors.
String interpolation combined with operation chaining: string_magic(). You can change string_magic default values with string_magic_alias() and add custom operations with string_magic_register_fun().
Display messages while benefiting from string_magic interpolation with cat_magic() and message_magic().
Other tools with aliases: cat_magic_alias(), string_magic(), string_magic_alias(), string_ops_alias(), string_vec_alias()
Examples
x = c("Nor rain, wind, thunder, fire are my daughters.",
"When my information changes, I alter my conclusions.")
id = c("ws", "jmk")
# we split at each word
string_split2df(x, "[[:punct:] ]+")
# we add the 'id'
string_split2df(x, "[[:punct:] ]+", id = id)
# TO NOTE:
# - the second argument is `data`
# - when it is missing, the argument `split` becomes implicitly the second
# - ex: above we did not use `split = "[[:punct:] ]+"`
#
# using the formula
base = data.frame(text = x, my_id = id)
string_split2df(text ~ my_id, base, "[[:punct:] ]+")
#
# with 2+ identifiers
base = within(mtcars, carname <- rownames(mtcars))
# we have a message because the identifiers are not unique
string_split2df(carname ~ am + gear + carb, base, " +")
# adding the position of the words & removing the message
string_split2df(carname ~ am + gear + carb, base, " +", id_unik = FALSE, add.pos = TRUE)