un_factor {dataPreparation} R Documentation

## Unfactor factors with too many values

### Description

Unfactor all columns that have more than a given number of distinct values. This function is useful after reading functions that turn every string into a factor.

### Usage

un_factor(data_set, cols = "auto", n_unfactor = 53, verbose = TRUE)


### Arguments

| Argument | Description |
|---|---|
| data_set | Matrix, data.frame or data.table |
| cols | List of column name(s) of data_set to look into. To check all columns, set it to "auto". (characters, default to "auto") |
| n_unfactor | Maximum number of elements in a factor (numeric, default to 53) |
| verbose | Should the algorithm talk? (logical, default to TRUE) |

### Details

If a factor has strictly more than n_unfactor values, it is unfactored.
It is recommended to use find_and_transform_numerics and find_and_transform_dates after this function.
If n_unfactor is set to -1, nothing is done.
If many columns have been transformed, you might want to check the documentation of your data reader to stop it from turning everything into a factor.
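
The threshold rule above can be illustrated with a minimal base R sketch. This is not the package's implementation: unfactor_sketch is a hypothetical helper, and it assumes that "values" means factor levels and that unfactoring means converting the column to character.

```r
# Hypothetical helper, NOT part of dataPreparation: a base R sketch of the rule.
unfactor_sketch <- function(df, n_unfactor = 53) {
  # n_unfactor = -1 disables the transformation, as stated in the Details
  if (n_unfactor == -1) {
    return(df)
  }
  for (col in names(df)) {
    x <- df[[col]]
    # Strict comparison: a factor with exactly n_unfactor levels is kept
    if (is.factor(x) && nlevels(x) > n_unfactor) {
      # Assumption: "unfactored" means converted to character
      df[[col]] <- as.character(x)
    }
  }
  df
}

df <- data.frame(
  true_factor  = factor(rep(c(1, 2), 13)),  # 2 levels
  false_factor = factor(LETTERS)            # 26 levels
)

res <- unfactor_sketch(df, n_unfactor = 5)
sapply(res, class)
```

With n_unfactor = 5, only false_factor (26 levels) exceeds the threshold; with n_unfactor = 26 it would be left alone because the comparison is strict.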

### Value

Same data_set (as a data.table) with fewer factor columns.

### Examples

# Let's build a data_set
data_set <- data.frame(true_factor = factor(rep(c(1, 2), 13)),
                       false_factor = factor(LETTERS))

# Let's unfactor all factors that have more than 5 different values
data_set <- un_factor(data_set, n_unfactor = 5)
sapply(data_set, class)

# Let's unfactor all factors that have more than 0 different values,
# i.e. every remaining factor
data_set <- un_factor(data_set, n_unfactor = 0)
sapply(data_set, class)



[Package dataPreparation version 1.0.4 Index]