cleanse.data.frame {alookr} | R Documentation |
Cleansing the dataset for classification modeling
Description
cleanse() cleanses a dataset for classification modeling.
Usage
## S3 method for class 'data.frame'
cleanse(
.data,
uniq = TRUE,
uniq_thres = 0.1,
char = TRUE,
missing = FALSE,
verbose = TRUE,
...
)
cleanse(.data, ...)
Arguments
.data |
a data.frame or a |
uniq |
logical. Set whether to remove variables that have only one unique value. |
uniq_thres |
numeric. Threshold for removing variables. A variable is removed when its ratio of unique values (number of unique values / number of observations) is greater than this value. |
char |
logical. Set whether to convert character variables to factors. |
missing |
logical. Set whether to remove variables that contain missing values. |
verbose |
logical. Set whether to echo information to the console at runtime. |
... |
further arguments passed to or from other methods. |
Details
This function is useful when fitting a classification model. It does the following: it removes variables that have only one value; it removes character or categorical variables whose number of unique values is high relative to the number of observations, since such variables typically correspond to identifiers; and it converts character variables to factors.
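The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the implementation used by alookr: `cleanse_sketch` and `toy` are hypothetical names introduced here, and the handling of `missing` and `verbose` is left out.

```r
# Hypothetical helper, NOT part of alookr: a base-R sketch of the three steps.
cleanse_sketch <- function(df, uniq_thres = 0.1) {
  n_obs <- nrow(df)

  # Number of unique values per variable
  n_uniq <- vapply(df, function(x) length(unique(x)), integer(1))

  # Character or factor variables are treated as categorical here (assumption)
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))

  # Step 1: variables with a single unique value
  constant <- n_uniq == 1L

  # Step 2: categorical variables whose unique-value ratio exceeds the threshold
  id_like <- is_cat & (n_uniq / n_obs > uniq_thres)

  out <- df[, !(constant | id_like), drop = FALSE]

  # Step 3: convert the remaining character variables to factors
  out[] <- lapply(out, function(x) if (is.character(x)) factor(x) else x)
  out
}

# Small toy dataset (hypothetical): an identifier, a constant, a flag, a count
toy <- data.frame(
  id    = paste0("k", 1:40),
  year  = "2018",
  flag  = rep(c("Y", "N"), 20),
  count = rep(1:5, 8),
  stringsAsFactors = FALSE
)

res <- cleanse_sketch(toy)
str(res)
```

In this sketch, `id` is dropped because every value is unique, `year` is dropped because it has a single value, and `flag` is kept and converted to a factor. The numeric `count` is left untouched because the ratio check is applied only to character and factor variables.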
Value
An object of class data.frame or train_df. The return value is an object of the same type as the .data argument.
Examples
# create sample dataset
set.seed(123L)
id <- sapply(1:1000, function(x)
  paste(c(sample(letters, 5), x), collapse = ""))
year <- "2018"
set.seed(123L)
count <- sample(1:10, size = 1000, replace = TRUE)
set.seed(123L)
alpha <- sample(letters, size = 1000, replace = TRUE)
set.seed(123L)
flag <- sample(c("Y", "N"), size = 1000, prob = c(0.1, 0.9), replace = TRUE)
dat <- data.frame(id, year, count, alpha, flag, stringsAsFactors = FALSE)
# structure of dataset
str(dat)
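# cleanse the dataset with the default settings
newDat <- cleanse(dat)
# structure of the cleansed dataset
str(newDat)
# cleanse without removing variables that have only one unique value (uniq = FALSE)
newDat <- cleanse(dat, uniq = FALSE)
# structure of the cleansed dataset
str(newDat)
# cleanse with a higher unique-value ratio threshold (uniq_thres = 0.3)
newDat <- cleanse(dat, uniq_thres = 0.3)
# structure of the cleansed dataset
str(newDat)
# cleanse without converting character variables to factors (char = FALSE)
newDat <- cleanse(dat, char = FALSE)
# structure of the cleansed dataset
str(newDat)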