determine_distinct {validata} | R Documentation |
Automatically determine primary key
Description
Uses confirm_distinct in an iterative fashion to determine the primary keys.
Usage
determine_distinct(df, ..., listviewer = TRUE)
Arguments
df |
a data frame |
... |
columns or a tidyselect specification. Defaults to everything. |
listviewer |
logical. Defaults to TRUE, which displays the output using the listviewer package. |
Details
The goal of this function is to automatically determine which columns uniquely identify the rows of a data frame. The output is a printed description of the combinations of columns that form unique identifiers at each level. At level 1, the function tests whether individual columns are primary keys. At level 2, it tests all n-choose-2 pairs of columns to see whether they form primary keys, and so on for higher levels. The final level tests all columns at once.
Completely unique columns are recorded in level 1 and then dropped from the data frame to facilitate the determination of multi-column primary keys.
If the dataset contains duplicated rows, they are eliminated before proceeding.
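The level-by-level search described above can be illustrated with a minimal base R sketch. This is not the package's implementation: is_key, find_keys_sketch, and the toy data frame are hypothetical names introduced only for illustration, and the sketch assumes that every qualifying combination is reported at each level, which may differ from what determine_distinct prints.

```r
# Hypothetical helper: TRUE when the given columns uniquely identify every row.
is_key <- function(df, cols) {
  !anyDuplicated(df[, cols, drop = FALSE])
}

# Hypothetical sketch of the level-wise search (not the validata source code).
find_keys_sketch <- function(df) {
  df <- unique(df)  # duplicated rows are eliminated before proceeding
  keys <- list()

  # Level 1: test each column on its own.
  singles <- names(df)[vapply(names(df), function(col) is_key(df, col), logical(1))]
  if (length(singles) > 0) keys[["1"]] <- as.list(singles)

  # Drop the single-column keys, then test combinations of the rest.
  remaining <- setdiff(names(df), singles)
  for (k in seq_len(length(remaining))[-1]) {
    combos <- combn(remaining, k, simplify = FALSE)  # all n-choose-k combinations
    hits <- Filter(function(cols) is_key(df, cols), combos)
    if (length(hits) > 0) keys[[as.character(k)]] <- hits
  }
  keys
}

# Toy data: val is unique by itself; id_a and id_b are unique only as a pair.
toy <- data.frame(
  id_a = c(1, 1, 2, 2),
  id_b = c("x", "y", "x", "y"),
  val  = c(0.1, 0.2, 0.3, 0.4)
)
keys <- find_keys_sketch(toy)
```

In this toy case, keys holds val under level 1 and the pair id_a, id_b under level 2. The number of combinations tested at level k is choose(n, k), so the search can become slow for data frames with many non-unique columns.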
Value
list
Examples
sample_data1 %>%
  head()
## On level 1, each column is tested as a unique identifier. The VAL columns have no
## duplicates and hence qualify, even though they would not normally be considered IDs.
## Level 2 does not appear, implying that no combination of 2 ID_COLs forms a unique key.
## On level 3, combinations of 3 columns are tested; the output implies that ID_COL 1, 2, 3 form a unique key.
sample_data1 %>%
  determine_distinct(listviewer = FALSE)