step_collapse_stringdist {embed} | R Documentation |
Collapse factor levels using stringdist
Description
step_collapse_stringdist() creates a specification of a recipe step that
will collapse factor levels that have a low string distance between them.
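To make "low string distance" concrete, the sketch below computes pairwise distances between a few levels with the stringdist package. It is an illustration only: the vector name lvls is hypothetical, the levels are borrowed from column x2 of the Examples section, and the code does not reproduce the step's internal collapsing logic.

```r
library(stringdist)

# Hypothetical vector of factor levels (the unique values of x2 in the Examples)
lvls <- c("ak", "b", "djj", "e", "hjhgfgjgr")

# Pairwise optimal string alignment ("osa") distances between all levels
d <- stringdistmatrix(lvls, lvls, method = "osa")
dimnames(d) <- list(lvls, lvls)

# Small off-diagonal entries mark levels that are close to each other
d
```

In this matrix, "b" and "e" are one edit apart while "ak" and "b" are two edits apart, so a small threshold treats the first pair as close and the second as distinct.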
Usage
step_collapse_stringdist(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  distance = NULL,
  method = "osa",
  options = list(),
  results = NULL,
  columns = NULL,
  skip = FALSE,
  id = rand_id("collapse_stringdist")
)
Arguments
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which variables are
affected by the step. See |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
distance |
Integer, the threshold that determines which strings are collapsed
together. The value is inclusive: levels whose string distance is less than or equal to it are collapsed. |
method |
Character, method for distance calculation. The default is "osa". |
options |
List, other arguments passed to
|
results |
A list denoting the way the labels should be collapsed is
stored here once this preprocessing step has been trained by |
columns |
A character string of variable names that will be populated
(eventually) by the |
skip |
A logical. Should the step be skipped when the
recipe is baked by |
id |
A character string that is unique to this step to identify it. |
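The interplay of distance and method can be sketched directly with stringdist::stringdist(). This is a minimal illustration, assuming the inclusive threshold behaves like a `<=` comparison as the distance argument describes. The variable names are hypothetical, and "lv" (plain Levenshtein) is shown only as a contrast to the default "osa".

```r
library(stringdist)

# Hypothetical threshold, playing the role of the `distance` argument
threshold <- 1

# "b" and "e" differ by one substitution, so they meet a threshold of 1
close_pair <- stringdist("b", "e", method = "osa") <= threshold

# "ak" and "b" are two edits apart, so they do not
far_pair <- stringdist("ak", "b", method = "osa") <= threshold

# The method matters: swapping two adjacent characters counts as one edit
# under "osa" but as two edits under plain Levenshtein ("lv")
osa_swap <- stringdist("ab", "ba", method = "osa")
lv_swap <- stringdist("ab", "ba", method = "lv")
```

Under these assumptions, a pair such as "ab" and "ba" would fall within distance = 1 with the default method but not with method = "lv".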
Value
An updated version of recipe with the new step added to the
sequence of existing steps (if any). For the tidy method, a tibble with
columns terms (the columns that will be affected), from, to, and id.
Tidying
When you tidy() this step, a tibble is returned with
columns terms, from, to, and id:
- terms: character, the selectors or variables selected
- from: character, the old levels
- to: character, the new levels
- id: character, id of this step
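One possible use of the tidied output is to inspect which levels were actually relabelled. The sketch below assumes the embed package is attached and reuses column x1 from the Examples section. The object names mapping and changed are hypothetical.

```r
library(recipes)
library(embed)
library(tibble)

# Single character predictor taken from the Examples section
data0 <- tibble(
  x1 = c("a", "b", "d", "e", "sfgsfgsd", "hjhgfgjgr")
)

rec <- recipe(~., data = data0) %>%
  step_collapse_stringdist(all_predictors(), distance = 1) %>%
  prep()

# One row per mapping from an old level to a new level
mapping <- tidy(rec, number = 1)

# Keep only the rows where the label changed
changed <- subset(mapping, from != to)
changed
```

Filtering on from != to keeps the listing short when most levels map to themselves.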
Case weights
The underlying operation does not allow for case weights.
Examples
library(recipes)
library(embed)
library(tibble)

# Two character predictors containing some near-identical levels
data0 <- tibble(
  x1 = c("a", "b", "d", "e", "sfgsfgsd", "hjhgfgjgr"),
  x2 = c("ak", "b", "djj", "e", "hjhgfgjgr", "hjhgfgjgr")
)

# Collapse levels using a distance threshold of 1
rec <- recipe(~., data = data0) %>%
  step_collapse_stringdist(all_predictors(), distance = 1) %>%
  prep()

rec %>%
  bake(new_data = NULL)

# Show how the old levels map to the new ones
tidy(rec, 1)

# Repeat with a more permissive threshold of 2
rec <- recipe(~., data = data0) %>%
  step_collapse_stringdist(all_predictors(), distance = 2) %>%
  prep()

rec %>%
  bake(new_data = NULL)

tidy(rec, 1)