un_factor {dataPreparation}R Documentation

Unfactor factor with too many values

Description

To un-factorize all columns that have more than a given amount of various values. This function will be usefully after using some reading functions that put every string as factor.

Usage

un_factor(data_set, cols = "auto", n_unfactor = 53, verbose = TRUE)

Arguments

data_set

Matrix, data.frame or data.table

cols

List of column(s) name(s) of data_set to look into. To check all all columns, set it to "auto". (characters, default to "auto")

n_unfactor

Number of max element in a factor (numeric, default to 53)

verbose

Should the algorithm talk? (logical, default to TRUE)

Details

If a factor has (strictly) more than n_unfactor values it is un-factored.
It is recommended to use find_and_transform_numerics and find_and_transform_dates after this function.
If n_unfactor is set to -1, nothing will be performed.
If there are a lot of column that have been transformed, you might want to look at the documentation of your data reader in order to stop transforming everything into a factor.

Value

Same data_set (as a data.table) with less factor columns.

Examples

# Let's build a data_set
data_set <- data.frame(true_factor = factor(rep(c(1,2), 13)),
                      false_factor = factor(LETTERS))

# Let's un factorize all factor that have more than 5 different values
data_set <- un_factor(data_set, n_unfactor = 5)
sapply(data_set, class)
# Let's un factorize all factor that have more than 5 different values
data_set <- un_factor(data_set, n_unfactor = 0)
sapply(data_set, class)


[Package dataPreparation version 1.1.1 Index]