strsplit.data.frame {udpipe} | R Documentation |
Obtain a tokenised data frame by splitting text on a regular expression
Description
Obtain a tokenised data frame by splitting text on a regular expression.
This is the inverse operation of paste.data.frame.
Usage
strsplit.data.frame(
data,
term,
group,
split = "[[:space:][:punct:][:digit:]]+",
...
)
Arguments
data |
a data.frame or data.table |
term |
a character string with a column name from data |
group |
a string with a column name or a character vector of column names from data |
split |
a regular expression indicating how to split the term column. Defaults to splitting on runs of whitespace, punctuation or digits |
... |
further arguments passed on to strsplit |
Value
A tokenised data frame containing one row per token.
This data.frame has the columns from group and term, where the text in column term is split by the provided regular expression into tokens.
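The shape of the returned value can be illustrated with a small base R sketch. The helper split_to_rows and the columns doc_id and text are hypothetical names introduced here; this is not the udpipe implementation, only an approximation of the documented result (group column plus term column, one row per token), and it handles a single group column only.

```r
# Toy input; the column names doc_id and text are made up for this sketch.
reviews <- data.frame(
  doc_id = c("a", "b"),
  text = c("nice quiet room", "very central"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT the udpipe code: it mimics the documented
# result shape for one group column.
split_to_rows <- function(data, term, group,
                          split = "[[:space:][:punct:][:digit:]]+", ...) {
  # One character vector of tokens per input row.
  pieces <- strsplit(data[[term]], split = split, ...)
  out <- data.frame(
    # Repeat each group value once per token found in its row.
    rep(data[[group]], times = lengths(pieces)),
    unlist(pieces),
    stringsAsFactors = FALSE
  )
  names(out) <- c(group, term)
  out
}

tokens <- split_to_rows(reviews, term = "text", group = "doc_id")
print(tokens)
```

Each input row contributes as many output rows as it has tokens, and the group value is repeated so every token can be traced back to its source row.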
See Also
paste.data.frame, strsplit
Examples
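## Split the feedback text with the default regular expression, grouped by review id
data(brussels_reviews, package = "udpipe")
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id")
head(x)
## Keep several grouping columns in the result
x <- strsplit.data.frame(brussels_reviews,
                         term = c("feedback"),
                         group = c("listing_id", "language"))
head(x)
## Split on a literal space; fixed = TRUE is handed on through ...
x <- strsplit.data.frame(brussels_reviews, term = "feedback", group = "id",
                         split = " ", fixed = TRUE)
head(x)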