fast_filter_variables {dataPreparation}    R Documentation

Filtering useless variables

Description

Delete columns of data_set that are constant or duplicated and, depending on level, columns that are bijections of or included in another column.

Usage

fast_filter_variables(
  data_set,
  level = 3,
  keep_cols = NULL,
  verbose = TRUE,
  ...
)

Arguments

data_set

Matrix, data.frame or data.table

level

Which columns to filter (1 = constant; 2 = constant and duplicates; 3 = constant, duplicates and bijections; 4 = constant, duplicates, bijections and included) (numeric, defaults to 3)

keep_cols

Names of columns not to drop (list of character, defaults to NULL)

verbose

Should the algorithm log what it does (logical, or 1 or 2; defaults to TRUE)

...

Optional parameters to be passed to the function when it is called from another function

Details

Set verbose to 2 to get full details from the functions called internally; otherwise they do not log. (verbose = 1 is equivalent to verbose = TRUE.)
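
As a rough illustration of what the four level categories could mean, the sketch below checks each property in base R on a toy data.frame. The helper names (is_constant, is_duplicate, is_bijection, is_included) and the toy columns are hypothetical, and the definitions of "bijection" and "included" are an interpretation; this is not the package's implementation.

```r
# Toy data: hypothetical columns chosen for illustration only.
toy <- data.frame(
  id       = c(1, 2, 3, 4),                    # reference column
  constant = c(7, 7, 7, 7),                    # a single distinct value
  id_copy  = c(1, 2, 3, 4),                    # exact copy of id
  id_label = c("a", "b", "c", "d"),            # one-to-one recoding of id
  parity   = c("odd", "even", "odd", "even"),  # derivable from id, not the reverse
  stringsAsFactors = FALSE
)

# Level 1: a column is constant when it holds one distinct value.
is_constant <- function(x) length(unique(x)) == 1

# Level 2: two columns are duplicates when their values are identical.
is_duplicate <- function(x, y) identical(x, y)

# Helper: number of distinct (x, y) pairs observed in the data.
n_pairs <- function(x, y) nrow(unique(data.frame(x, y)))

# Level 3 (assumed meaning): x and y are in bijection when every value of x
# is paired with exactly one value of y, and vice versa.
is_bijection <- function(x, y) {
  n <- n_pairs(x, y)
  n == length(unique(x)) && n == length(unique(y))
}

# Level 4 (assumed meaning): y is included in x when each value of x
# determines a single value of y.
is_included <- function(y, x) n_pairs(x, y) == length(unique(x))

is_constant(toy$constant)
is_duplicate(toy$id, toy$id_copy)
is_bijection(toy$id, toy$id_label)
is_included(toy$parity, toy$id)
```

Under these assumed definitions, each level is a weaker relation than the previous one: a duplicate is also a bijection, and a bijection is also included.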

Value

The same data_set with fewer columns. With the default level, columns that are constant, duplicated, or a bijection of another column have been deleted.

Examples

library(dataPreparation)
# First let's build a data.frame with 5 columns, including a constant column,
# a duplicated column, and a bijection of another column
df <- data.frame(col1 = 1, col2 = rnorm(1e6), col3 = sample(c(1, 2), 1e6, replace = TRUE))
df$col4 <- df$col2 # Duplicate of col2
df$col5[df$col3 == 1] <- "a"
df$col5[df$col3 == 2] <- "b" # Same information as col3, but with "a" for 1 and "b" for 2
head(df)

# Let's filter columns:
df <- fast_filter_variables(df)
head(df)
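
The level and keep_cols arguments from the Usage section can be combined as in the following minimal sketch. The data set and its column names (flag, x, x_copy) are hypothetical and only serve the illustration; the result is printed rather than asserted.

```r
library(dataPreparation)

# Small hypothetical data set: a constant column and a duplicated column.
small <- data.frame(flag = 1, x = c(0.5, 1.5, 2.5))
small$x_copy <- small$x

# level = 2 restricts the filter to constant and duplicated columns;
# keep_cols asks that the constant column "flag" not be dropped;
# verbose = FALSE silences the log.
small <- fast_filter_variables(small, level = 2, keep_cols = "flag", verbose = FALSE)
names(small)
```

The result is assigned back to the same name, as in the example above, so that only the filtered object is used afterwards.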

[Package dataPreparation version 1.0.4 Index]