
determine_distinct {validata}

R Documentation

Automatically determine primary key

Description

Calls confirm_distinct iteratively to determine the primary keys of a data frame.

Usage

determine_distinct(df, ..., listviewer = TRUE)

Arguments

`df`	a data frame
`...`	columns or a tidyselect specification. Defaults to all columns.
`listviewer`	logical. Defaults to TRUE, which displays the output with the listviewer package.

Details

The goal of this function is to automatically determine which columns uniquely identify the rows of a data frame. The output is a printed description of the column combinations that form unique identifiers at each level. At level 1, the function tests whether individual columns are primary keys. At level 2, it tests all choose(n, 2) combinations of two columns, where n is the number of columns. Each subsequent level adds one column, and the final level tests all n columns at once.
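As a rough guide to the cost of this search: level k examines choose(n, k) column combinations, so a full search over n columns examines 2^n - 1 combinations in total. The base-R arithmetic below uses n = 5 as an arbitrary example value and shows the per-level counts, along with how much smaller the search becomes when two columns are set aside before it starts.

```r
# Combinations examined at level k when the data frame has n columns: choose(n, k)
n <- 5
per_level <- choose(n, seq_len(n))
print(per_level)          # 5 10 10 5 1
total <- sum(per_level)   # 31, which equals 2^n - 1
print(total)

# If two fully unique columns are set aside first, only 3 columns remain
total_after_drop <- sum(choose(3, 1:3))  # 3 + 3 + 1 = 7
print(total_after_drop)
```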

Columns that are completely unique are recorded at level 1 and then dropped from the data frame, which simplifies the search for multi-column primary keys.
If the data set contains duplicated rows, they are removed before the search proceeds.
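The steps above can be sketched in plain base R. The function find_keys_sketch() and the toy data frame are hypothetical illustrations, not part of validata, and this is not the package's implementation (which relies on confirm_distinct). It only mirrors the documented behavior: drop duplicated rows, record and drop fully unique columns at level 1, then test every k-column combination of the remaining columns.

```r
# Hypothetical base-R sketch, NOT validata's implementation: mirrors the
# documented steps of determine_distinct() on a plain data frame.
find_keys_sketch <- function(df) {
  # Remove duplicated rows; otherwise no set of columns could be unique
  df <- unique(df)

  # Level 1: record columns that are unique on their own, then drop them
  is_unique <- vapply(df, function(col) anyDuplicated(col) == 0, logical(1))
  result <- list()
  if (any(is_unique)) result$level_1 <- as.list(names(df)[is_unique])
  rest <- names(df)[!is_unique]

  # Levels 2..length(rest): test every k-column combination of what remains
  for (k in seq_len(length(rest))[-1]) {
    combos <- combn(rest, k, simplify = FALSE)
    keys <- Filter(function(cols) anyDuplicated(df[, cols, drop = FALSE]) == 0, combos)
    # Only non-empty levels are reported, so a missing level means "no key found"
    if (length(keys) > 0) result[[paste0("level_", k)]] <- keys
  }
  result
}

# Toy data (made up for illustration): no pair of ID columns is unique,
# but all three together are; VAL1 is unique on its own.
toy <- data.frame(
  ID_COL1 = rep(1:2, each = 4),
  ID_COL2 = rep(rep(c("a", "b"), each = 2), times = 2),
  ID_COL3 = rep(c("x", "y"), times = 4),
  VAL1    = 1:8
)

str(find_keys_sketch(toy))
```

On this toy data the sketch reports VAL1 at level 1, nothing at level 2, and the three ID columns together at level 3, the same pattern described in the Examples. The sketch does not prune larger combinations that contain a smaller key; whether determine_distinct does so is not stated on this page.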

Value

A list describing the column combinations that form unique identifiers at each level.

Examples


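library(validata)
library(magrittr)  # provides the %>% pipe, in case validata does not re-export it

sample_data1 %>%
  head()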


## At level 1, each column is tested on its own as a unique identifier. The VAL columns
## have no duplicates and therefore qualify, even though they would not normally be
## considered IDs.
## Level 2 does not appear, implying that no combination of two ID_COLs forms a unique key.
## At level 3, combinations of three columns are tested; the result implies that
## ID_COL 1, 2, and 3 together form a unique key.

sample_data1 %>%
  determine_distinct(listviewer = FALSE)

[Package validata version 0.1.0 Index]