which_are_bijection {dataPreparation} | R Documentation |
Identify bijections
Description
Find all the columns that are bijections of another column.
Usage
which_are_bijection(data_set, keep_cols = NULL, verbose = TRUE)
Arguments
data_set | Matrix, data.frame or data.table |
keep_cols | List of columns not to drop (list of character, default to NULL) |
verbose | Should the algorithm talk (logical, default to TRUE) |
Details
A bijection means that another column contains exactly the same information, possibly
coded differently. For example, col1: Men/Women and col2: M/W.
The function searches by comparing every pair of columns.
For each pair, it computes the number of unique elements in each column and the number of unique tuples of values.
The computation uses exponential search to make the function faster.
If verbose is TRUE, the column logged is the one returned.
For example, if column i and column j (with j > i) are bijections, it returns j, except if j is a
character column, in which case it returns i.
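The pairwise check described above can be sketched in base R. This is only an illustration of the counting criterion, not the package's implementation: the helper is_bijection_pair and the toy data frame are hypothetical names introduced here, and the sketch compares full columns without the exponential search.

```r
# Hypothetical helper (not part of dataPreparation): two columns are
# bijections of each other when the number of unique values in each column
# equals the number of unique (x, y) tuples.
is_bijection_pair <- function(x, y) {
  n_x <- length(unique(x))
  n_y <- length(unique(y))
  # Count the distinct value pairs by de-duplicating a two-column data.frame.
  n_xy <- nrow(unique(data.frame(x = x, y = y, stringsAsFactors = FALSE)))
  n_x == n_y && n_x == n_xy
}

# Toy data, made up for illustration.
toy <- data.frame(
  gender_long = c("Men", "Women", "Women", "Men"),
  gender_short = c("M", "W", "W", "M"),
  age = c(30, 30, 41, 52),
  stringsAsFactors = FALSE
)

is_bijection_pair(toy$gender_long, toy$gender_short)  # TRUE
is_bijection_pair(toy$gender_long, toy$age)           # FALSE
```

If the tuple count exceeds either unique count, at least one value maps to several values in the other column, so the pair is not a bijection.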
Value
A list of indices of the columns that have an exact bijection in data_set.
Examples
# First let's load the package and get a data set
library(dataPreparation)
data("adult")
# Now let's check which columns are equal
which_are_in_double(adult)
# It doesn't give any result.
# Let's look for bijections
which_are_bijection(adult)
# Returns the index of education_num, because education_num and education
# contain the same info