| ohse {lares} | R Documentation |
One Hot Smart Encoding (Dummy Variables)
Description
This function automatically transforms a dataframe with categorical columns into a numerical one using the one hot encoding technique.
Usage
ohse(
df,
redundant = FALSE,
drop = TRUE,
ignore = NULL,
dates = FALSE,
holidays = FALSE,
country = "Venezuela",
currency_pair = NA,
trim = 0,
limit = 10,
variance = 0.9,
other_label = "OTHER",
sep = "_",
quiet = FALSE,
...
)
Arguments
df |
Dataframe |
redundant |
Boolean. Should we keep redundant columns? i.e. If the
column only has two different values, should we keep both new columns?
Is set to |
drop |
Boolean. Automatically drop some useless features? |
ignore |
Vector or character. Which columns should be ignored? |
dates |
Boolean. Do you want the function to create more features out of the date/time columns? |
holidays |
Boolean. Include holidays as new columns? |
country |
Character or vector. For which countries should the holidays be included? |
currency_pair |
Character. Which currency exchange pair do you wish to get the history from? e.g., USD/COP, EUR/USD... |
trim |
Integer. Trim names until the nth character |
limit |
Integer. Limit one hot encoding to the n most frequent
values of each column. Set to |
variance |
Numeric. Drop columns with more than n variance. Range: 0-1. For example: if a variable contains 91 unique values out of 100 observations, this column will be suppressed if the value is set to 0.9 |
other_label |
Character. Which text should replace the filtered values? |
sep |
Character. Separator string |
quiet |
Boolean. Quiet all messages and summaries? |
... |
Additional parameters. |
Value
data.frame in which all features are either numerical by nature or transformed with one hot encoding.
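To make the shape of the returned data.frame concrete, here is a minimal base R sketch of one hot encoding a single column with a frequency limit. It is not the lares implementation: `one_hot_sketch` is a hypothetical helper written for illustration, and it only assumes that `limit`, `other_label` and `sep` behave as described in the Arguments above. The real `ohse()` may order, name or prune columns differently.

```r
# Hypothetical helper, NOT part of lares: one hot encode one character vector,
# keeping only the `limit` most frequent values and pooling the rest.
one_hot_sketch <- function(x, name, limit = 10, other_label = "OTHER", sep = "_") {
  # Rank values by frequency and keep the `limit` most frequent ones
  freq <- sort(table(x), decreasing = TRUE)
  top <- names(freq)[seq_len(min(limit, length(freq)))]
  # Replace every other value with the fallback label
  x <- ifelse(x %in% top, x, other_label)
  # Build one 0/1 integer column per remaining value
  lvls <- unique(x)
  cols <- lapply(lvls, function(l) as.integer(x == l))
  names(cols) <- paste(name, lvls, sep = sep)
  as.data.frame(cols, check.names = FALSE)
}

colour <- c("red", "red", "blue", "blue", "blue", "green", "pink")
encoded <- one_hot_sketch(colour, "colour", limit = 2)
print(encoded)
```

With `limit = 2`, only the two most frequent colours keep their own column; the remaining values share a single column built from `other_label`.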
See Also
Other Data Wrangling:
balance_data(),
categ_reducer(),
cleanText(),
date_cuts(),
date_feats(),
file_name(),
formatHTML(),
holidays(),
impute(),
left(),
normalize(),
num_abbr(),
ohe_commas(),
quants(),
removenacols(),
replaceall(),
replacefactor(),
textFeats(),
textTokenizer(),
vector2text(),
year_month(),
zerovar()
Other Feature Engineering:
date_feats(),
holidays()
Other One Hot Encoding:
date_feats(),
holidays(),
ohe_commas()
Examples
library(lares) # provides ohse() and the dft dataset
data(dft)
dft <- dft[, c(2, 3, 5, 9, 11)]
ohse(dft, limit = 3) %>% head(3)
ohse(dft, limit = 3, redundant = NULL) %>% head(3)
# Getting rid of columns with no (or too much) variance
dft$no_variance1 <- 0
dft$no_variance2 <- c("A", rep("B", nrow(dft) - 1))
dft$no_variance3 <- as.character(rnorm(nrow(dft)))
dft$no_variance4 <- c(rep("A", 20), round(rnorm(nrow(dft) - 20), 4))
ohse(dft, limit = 3) %>% head(3)
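
The last example adds columns with no variance or too much of it. One way to reason about the variance threshold, assuming it is read as the share of unique values among all observations (as the 91-out-of-100 example in the Arguments suggests), is the base R sketch below. `unique_ratio` is a hypothetical helper, not a lares function, and the actual rule applied by `ohse()` may differ in detail.

```r
# Assumption: "variance" is interpreted as unique values / observations,
# following the 91-out-of-100 example in the argument description.
unique_ratio <- function(x) length(unique(x)) / length(x)

set.seed(1)
n <- 100
constant <- rep(0, n)                          # a single repeated value
mostly_ids <- as.character(rnorm(n))           # practically all values distinct
grouped <- rep(c("A", "B", "C", "D"), n / 4)   # four balanced categories

ratios <- sapply(
  list(constant = constant, mostly_ids = mostly_ids, grouped = grouped),
  unique_ratio
)
print(ratios)
# TRUE marks columns that would exceed a 0.9 threshold under this reading
print(ratios > 0.9)
```

Under this reading, a column such as `no_variance3` above behaves like `mostly_ids`: nearly every row carries its own value, so encoding it would add one column per observation without adding information.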