identify_dates {dataPreparation}    R Documentation

Identify date columns

Description

Identifies date columns and returns their formats. It checks a set of default formats, and you can also supply your own.

Usage

identify_dates(
  data_set,
  cols = "auto",
  formats = NULL,
  n_test = 30,
  ambiguities = "IGNORE",
  verbose = TRUE
)

Arguments

data_set

Matrix, data.frame or data.table

cols

List of column name(s) of data_set to look into. To check all columns, set it to "auto". (character, default to "auto")

formats

List of additional Date formats to check (see strptime)

n_test

Number of non-null rows on which to test (numeric, default to 30)

ambiguities

How ambiguities should be treated (see the Ambiguity section) (character, default to "IGNORE")

verbose

Should the algorithm talk? (Logical, default to TRUE)
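
As a rough illustration of what a strptime-style entry for formats looks like, the sketch below uses base R only (it does not call dataPreparation) to check whether a candidate format parses every non-missing sample value. The helper name format_matches and the sample values are hypothetical.

```r
# Hypothetical helper: TRUE when `fmt` parses every non-NA value of `x`.
format_matches <- function(x, fmt) {
  x <- x[!is.na(x)]
  parsed <- as.Date(x, format = fmt)
  length(x) > 0 && !anyNA(parsed)
}

# Made-up sample values, not taken from any package data set.
samples <- c("03.01.2017", "25.12.2016", NA)

# Try a day.month.year reading, then a month.day.year reading.
dmy_ok <- format_matches(samples, "%d.%m.%Y")
mdy_ok <- format_matches(samples, "%m.%d.%Y")
```

A format that passes such a check is the kind of string one would presumably pass through formats, for example formats = "%d.%m.%Y".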

Details

This function looks for a perfect transformation, that is, a format that fits the tested values without error. If data_set contains mistakes, consider setting them to NA beforehand.
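
One possible way to set faulty entries to NA before running the identification is sketched below in base R. It assumes you already have a good guess of the intended format; the sample column is made up.

```r
# Made-up column with two entries that are not valid dates.
raw <- c("2017-01-01", "2017-02-30", "n/a", "2017-03-15")

# With an explicit format, as.Date() returns NA for values that do not fit.
parsed <- as.Date(raw, format = "%Y-%m-%d")

# Blank out the offending entries while keeping the column as character.
cleaned <- raw
cleaned[is.na(parsed)] <- NA
cleaned
```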
In the unlikely case where you have numeric values higher than as.numeric(as.POSIXct("1990-01-01")), they will be considered timestamps and you might have some issues. On the other hand, timestamps before 1990-01-01 won't be found, but you can use set_col_as_date to force the transformation.
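
To make the cutoff concrete, here is a small base R sketch of the comparison described above. It is not the package's internal code. Fixing tz = "UTC" is an assumption made for reproducibility, and the sample values are made up.

```r
# Cutoff named in the Details; tz = "UTC" is an assumption for reproducibility.
cutoff <- as.numeric(as.POSIXct("1990-01-01", tz = "UTC"))

# Made-up numeric values: a small count, a pre-1990 timestamp, a later timestamp.
values <- c(42, 500000000, 1500000000)

# Only values above the cutoff would look like timestamps under this rule.
looks_like_timestamp <- values > cutoff
looks_like_timestamp

# Reading the last value back as a date-time.
as.POSIXct(values[3], origin = "1970-01-01", tz = "UTC")
```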

Value

A named list whose names are column names of data_set and whose values are the corresponding formats.
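
A result of this shape can be consumed column by column. The sketch below uses base R with a hypothetical result list and a made-up data.frame. It is only meant to show the names-to-formats mapping, not actual output of identify_dates.

```r
# Hypothetical result, shaped like the named list described above.
date_formats <- list(start = "%Y-%m-%d", end = "%d/%m/%Y")

# Made-up data with matching column names.
df <- data.frame(
  start = c("2017-01-31", "2017-02-28"),
  end   = c("15/03/2017", "01/04/2017"),
  stringsAsFactors = FALSE
)

# Convert each listed column using its own format.
for (col in names(date_formats)) {
  df[[col]] <- as.Date(df[[col]], format = date_formats[[col]])
}
str(df)
```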

Ambiguity

Ambiguities are often present in dates. For example, for the date 2017/01/01, there is no way to know whether the format is YYYY/MM/DD or YYYY/DD/MM.
Sometimes an ambiguity can be resolved by a human. For example, given 17/12/31, a human might guess that it is YY/MM/DD, but there is no sure way to know.
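
The two examples above can be reproduced in base R. This sketch only shows that several strptime formats accept the same string; it does not reflect how the package itself resolves such cases.

```r
# Both readings of "2017/01/01" parse without error.
ambiguous <- "2017/01/01"
ymd <- as.Date(ambiguous, format = "%Y/%m/%d")
ydm <- as.Date(ambiguous, format = "%Y/%d/%m")

# For "17/12/31", check which two-digit-year readings parse at all.
short <- "17/12/31"
candidates <- c("%y/%m/%d", "%d/%m/%y", "%m/%d/%y")
parses <- vapply(
  candidates,
  function(fmt) !is.na(as.Date(short, format = fmt)),
  logical(1)
)
parses
```

More than one candidate parsing is exactly the situation in which a format cannot be chosen from a single value.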
To be safe, find_and_transform_dates doesn't try to guess ambiguities.
To handle ambiguities, the ambiguities parameter is available. It can take one of the following values

Examples

# Load example set
data(tiny_messy_adult)
head(tiny_messy_adult)
# Identify date columns, testing on 5 non-null rows
identify_dates(tiny_messy_adult, n_test = 5)

[Package dataPreparation version 1.1.1 Index]