detect_dupl_cols {dataframeexplorer}    R Documentation

Detect if any column of a data.frame is a duplicate of another

Description

It occasionally happens that two or more columns in a data frame are exactly identical. Such duplicates add redundant computational cost and can cause unexpected behavior in machine learning methods. This function scans through all pairwise column combinations of the data frame to check whether any two columns are exactly identical.
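
The pairwise scan described above can be sketched in base R. This is only an illustration of the idea, not the package's actual implementation; the helper name find_identical_pairs is hypothetical.

```r
# Hypothetical helper, NOT part of dataframeexplorer: return every pair of
# column names whose columns are exactly identical.
find_identical_pairs <- function(df) {
  if (ncol(df) < 2) return(list())
  # All unordered pairs of column positions
  pairs <- combn(seq_along(df), 2, simplify = FALSE)
  # identical() compares values, type and attributes of the two columns
  is_dup <- vapply(
    pairs,
    function(p) identical(df[[p[1]]], df[[p[2]]]),
    logical(1)
  )
  lapply(pairs[is_dup], function(p) names(df)[p])
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = 1:3)
pairs_found <- find_identical_pairs(df)
print(pairs_found)
```

Because identical() is strict, an integer column and a double column holding the same numbers would not be reported as duplicates in this sketch.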

Usage

detect_dupl_cols(dataset, return_type = "col_names", duplicate_col = "right")

Arguments

dataset

A data.frame

return_type

How to return the detected duplicate columns. Use "col_names" for column names, "col_positions" for column positions, or "dataset" to return the dataset with the duplicate columns deleted.

duplicate_col

If two columns are identical, which of the two should be treated as the duplicate? Use "right" for the right column or "left" for the left column.
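
One way to picture the "right" and "left" choice is with base R's duplicated(), which flags later copies by default and earlier copies with fromLast = TRUE. The sketch below assumes this is what the two settings mean; it does not call detect_dupl_cols and may differ from the package's behavior in edge cases.

```r
# Illustration only: base R analogue of choosing which copy is the duplicate.
df <- data.frame(mpg = c(21, 22.8), cyl = c(6, 4), mpg_2 = c(21, 22.8))
cols <- as.list(df)  # compare whole columns as list elements

# "right": the later (right-hand) copy is treated as the duplicate
right_dups <- names(df)[duplicated(cols)]

# "left": the earlier (left-hand) copy is treated as the duplicate
left_dups <- names(df)[duplicated(cols, fromLast = TRUE)]

print(right_dups)
print(left_dups)
```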

Value

A vector of duplicate column names or column positions, or the dataset with the duplicate columns deleted, as specified by the return_type parameter.
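
The three result shapes can be illustrated with base R alone. The variable names below are hypothetical, and the exact structure returned by detect_dupl_cols (for example, whether positions are integers) is an assumption rather than something stated in this page.

```r
# Illustration only: base R analogues of the three return shapes.
df <- data.frame(a = 1:3, b = c(2, 4, 6), a_copy = 1:3)
is_dup <- duplicated(as.list(df))  # TRUE for a column that repeats an earlier one

dup_names     <- names(df)[is_dup]              # like return_type = "col_names"
dup_positions <- which(is_dup)                  # like return_type = "col_positions"
deduped       <- df[, !is_dup, drop = FALSE]    # like return_type = "dataset"

print(dup_names)
print(dup_positions)
print(names(deduped))
```

drop = FALSE keeps the result a data frame even when only one column remains.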

Examples

## Not run: 
library(dplyr)  # assumed source of mutate(); the original example does not load it
detect_dupl_cols(dataset = head(mutate(mtcars, mpg_2 = mpg)), duplicate_col = "right")

## End(Not run)

[Package dataframeexplorer version 1.0.2 Index]