Unfactorize all columns that have more than a given number of distinct values. This function is useful after using reading functions that convert every string into a factor.
un_factor(data_set, cols = "auto", n_unfactor = 53, verbose = TRUE)
data_set | Matrix, data.frame or data.table |
cols | List of column name(s) of data_set to look into. To check all columns, set it to "auto". (characters, default to "auto") |
n_unfactor | Maximum number of distinct values a column may have and still be kept as a factor (numeric, default to 53) |
verbose | Should the algorithm talk? (logical, default to TRUE) |
If a factor has (strictly) more than n_unfactor values, it is unfactored.
It is recommended to use find_and_transform_numerics and find_and_transform_dates after this function.
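One plausible reason for this ordering, shown here with base R only rather than with the package functions: a factor stores integer codes, so converting it to numeric directly returns the codes and not the original values. The variable names below are illustrative.

```r
# A factor whose labels look like numbers.
f <- factor(c("10", "20", "10", "35"))

# Direct conversion returns the internal level codes, not the labels.
codes <- as.numeric(f)

# Converting to character first (i.e. unfactoring) recovers the real values.
values <- as.numeric(as.character(f))

print(codes)   # 1 2 1 3
print(values)  # 10 20 10 35
```

Once the column is a plain character vector, a later numeric or date detection step works on the actual text.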
If n_unfactor is set to -1, nothing will be performed.
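The threshold rule and the -1 special case described above can be sketched in base R. This is only an illustration, not the package's implementation: `unfactor_sketch` is a hypothetical helper, it assumes that "values" means factor levels, that unfactored columns become character, and it returns a data.frame rather than a data.table.

```r
# Hypothetical helper, NOT part of the package: illustrates the rule only.
unfactor_sketch <- function(df, n_unfactor = 53) {
  # Special case from the documentation: -1 disables the transformation.
  if (n_unfactor == -1) {
    return(df)
  }
  for (col in names(df)) {
    column <- df[[col]]
    # Strict comparison: a factor with exactly n_unfactor levels is kept.
    # Assumption: "values" is counted as the number of factor levels.
    if (is.factor(column) && nlevels(column) > n_unfactor) {
      # Assumption: an unfactored column is stored as character.
      df[[col]] <- as.character(column)
    }
  }
  df
}

example_df <- data.frame(
  true_factor = factor(rep(c(1, 2), 13)),  # 2 levels
  false_factor = factor(LETTERS)           # 26 levels
)

result <- unfactor_sketch(example_df, n_unfactor = 5)
# true_factor stays a factor; false_factor becomes character.
print(sapply(result, class))

# With exactly 26 allowed values, the 26-level factor is left untouched.
boundary <- unfactor_sketch(example_df, n_unfactor = 26)

# With -1, nothing is changed.
untouched <- unfactor_sketch(example_df, n_unfactor = -1)
```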
If many columns have been transformed, you might want to look at the documentation of your data reader to stop it from converting every string into a factor.
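As an example of such a reader option, base R's `read.csv` accepts a `stringsAsFactors` argument. The sketch below writes a small temporary file (the file contents and variable names are made up for illustration) and reads it both ways; other readers may expose a different option.

```r
# Write a tiny CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,label", "1,a", "2,b", "3,c"), csv_path)

# Strings converted to factors (the default before R 4.0.0).
as_factors <- read.csv(csv_path, stringsAsFactors = TRUE)

# Strings kept as character, so no unfactoring is needed afterwards.
as_strings <- read.csv(csv_path, stringsAsFactors = FALSE)

print(class(as_factors$label))  # "factor"
print(class(as_strings$label))  # "character"

# Clean up the temporary file.
unlink(csv_path)
```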
Same data_set (as a data.table) with fewer factor columns.
# Let's build a data_set
data_set <- data.frame(true_factor = factor(rep(c(1,2), 13)),
                       false_factor = factor(LETTERS))
# Let's unfactorize all factors that have more than 5 different values
data_set <- un_factor(data_set, n_unfactor = 5)
sapply(data_set, class)
# Let's unfactorize all factors that have more than 0 different values (i.e. every remaining factor)
data_set <- un_factor(data_set, n_unfactor = 0)
sapply(data_set, class)