categ_reducer {lares} | R Documentation |
Reduce categorical values
Description
This function lets the user reduce the number of distinct categorical values in a variable, replacing the filtered values with a single label. It is tidyverse-friendly for use in pipelines.
Usage
categ_reducer(
df,
var,
nmin = 0,
pmin = 0,
pcummax = 100,
top = NA,
pvalue_max = 1,
cor_var = "tag",
limit = 20,
other_label = "other",
...
)
Arguments
df | Data frame. Dataset containing the categorical variable to reduce. |
var | Variable. Which variable do you wish to reduce? |
nmin | Integer. Minimum number of times a value must be repeated. |
pmin | Numeric. Minimum percentage of times a value must be repeated. |
pcummax | Numeric. Top cumulative percentage of most repeated values. |
top | Integer. Keep the n most frequently repeated values. |
pvalue_max | Numeric in (0, 1]. Maximum p-value allowed for a category. |
cor_var | Character. If pvalue_max < 1, you must define the name of the column it will be compared with (numerical or binary). |
limit | Integer. Limit one hot encoding to the n most frequent values of each column. Set to |
other_label | Character. Text with which to replace the filtered values. |
... | Additional parameters. |
Value
data.frame df on which var has been transformed.
See Also
Other Data Wrangling: balance_data(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), ohse(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
Examples
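library(lares)
data(dft) # Titanic dataset
# Keep the 2 most frequent values of Embarked
categ_reducer(dft, Embarked, top = 2) %>% freqs(Embarked)
# Keep tickets repeated at least 7 times and relabel the rest
categ_reducer(dft, Ticket, nmin = 7, other_label = "Other Ticket") %>% freqs(Ticket)
# Filter tickets by p-value (at most 0.05) against Survived
categ_reducer(dft, Ticket, pvalue_max = 0.05, cor_var = "Survived") %>% freqs(Ticket)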