R: Identify bijections

which_are_bijection {dataPreparation}

R Documentation

Identify bijections

Description

Find all the columns that are bijections of another column.

Usage

which_are_bijection(data_set, keep_cols = NULL, verbose = TRUE)

Arguments

`data_set`	Matrix, data.frame or data.table
`keep_cols`	List of columns not to drop (list of character, defaults to NULL)
`verbose`	Should the algorithm talk? (logical, defaults to TRUE)

Details

A bijection means that another column contains exactly the same information, possibly coded differently. For example, col1: Men/Women and col2: M/W.
The function searches by examining every pair of columns. For each pair, it computes the number of unique elements in each column and the number of unique tuples of values. The pair is a bijection when these three counts are equal.
The computation uses an exponential search, which makes the function faster.
If verbose is TRUE, the column that is logged is the one that is returned.
Ex: if column i and column j (with j > i) are bijections, it returns j, except if j is a character column, in which case it returns i.
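
The counting test described above can be sketched in base R. This is only an illustration, not the package's implementation: the helpers `is_bijection` and `is_bijection_growing` are hypothetical names. The second helper shows one plausible reading of "exponential search" (testing row prefixes of growing size so that non-bijections are rejected early); that reading is an assumption, as are the starting size of 10 and the growth factor of 10.

```r
# Hypothetical helper (not part of dataPreparation): two columns are
# bijections when the number of unique values in each column equals the
# number of unique (x, y) pairs.
is_bijection <- function(x, y) {
  n_x  <- length(unique(x))
  n_y  <- length(unique(y))
  n_xy <- nrow(unique(data.frame(x = x, y = y, stringsAsFactors = FALSE)))
  n_x == n_xy && n_y == n_xy
}

# Hypothetical helper, assuming "exponential search" means checking row
# prefixes of growing size. A pair that fails on a prefix cannot be a
# bijection on the full data, so the function can stop early.
is_bijection_growing <- function(x, y, start = 10) {
  n <- length(x)
  size <- min(start, n)
  repeat {
    idx <- seq_len(size)
    if (!is_bijection(x[idx], y[idx])) return(FALSE)
    if (size >= n) return(TRUE)
    size <- min(size * 10, n)
  }
}

sex_long  <- c("Men", "Women", "Women", "Men")
sex_short <- c("M", "W", "W", "M")
age       <- c(30, 30, 41, 52)

is_bijection(sex_long, sex_short)  # TRUE: same information, coded differently
is_bijection(sex_long, age)        # FALSE: 4 unique pairs, but 2 and 3 unique values
```

Rejecting on a prefix is safe because a one-to-one mapping stays one-to-one on any subset of rows; only pairs that pass every prefix need the full comparison.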

Value

A list of the indexes of the columns that have an exact bijection in data_set.

Examples

# First let's get a data set
data("adult")

# Now let's check which columns are equal
which_are_in_double(adult)
# It doesn't give any result.

# Let's look for bijections
which_are_bijection(adult)
# Returns the index of education_num, because education_num and education
# contain the same info

[Package dataPreparation version 1.1.1 Index]