find_and_transform_dates {dataPreparation} | R Documentation |
Identify date columns
Description
Find and transform dates that are hidden in a character column.
It uses a set of default formats, and you can also add your own formats.
Usage
find_and_transform_dates(
data_set,
cols = "auto",
formats = NULL,
n_test = 30,
ambiguities = "IGNORE",
verbose = TRUE
)
Arguments
data_set |
Matrix, data.frame or data.table |
cols |
List of column name(s) of data_set to look into. To check all columns, set it to "auto". (character, default to "auto") |
formats |
List of additional Date formats to check (see |
n_test |
Number of non-null rows on which to test (numeric, default to 30) |
ambiguities |
How ambiguities should be treated (see the Ambiguity section) (character, default to "IGNORE") |
verbose |
Should the algorithm talk? (Logical, default to TRUE) |
Details
This function uses identify_dates to find formats. Please see its documentation.
If identify_dates doesn't find the format you want, you can either provide it through the formats parameter or use set_col_as_date to force the transformation.
Value
data_set (as a data.table) with identified dates transformed by reference.
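Because the transformation happens by reference, any other name bound to the same data.table sees the change, which is why the examples below pass copy(tiny_messy_adult). The following is a minimal sketch of that behaviour using plain data.table calls only; it does not call dataPreparation, and the column name d is made up for illustration.

```r
library(data.table)

# Toy table with a character column holding dates (hypothetical data)
dt <- data.table(d = c("2017/01/25", "2017/02/13"))

alias <- dt           # a second name for the same object, no copy is made
snapshot <- copy(dt)  # an independent copy

# Modify the column by reference, as a by-reference transformation would
set(dt, j = "d", value = as.Date(dt$d, format = "%Y/%m/%d"))

class(alias$d)     # "Date": the alias sees the change
class(snapshot$d)  # "character": the copy is untouched
```

If you need to keep the original character columns, work on a copy().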
Ambiguity
Dates are often ambiguous. For example, in the date 2017/01/01 there is no way to know
whether the format is YYYY/MM/DD or YYYY/DD/MM.
Sometimes a human can resolve the ambiguity. For example, given
17/12/31, a human might guess that it is YY/MM/DD, but there is no sure way to know.
To be safe, find_and_transform_dates doesn't try to guess ambiguities.
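The two dates above can be checked with base R alone. This small sketch uses only as.Date with strptime-style format strings (the variable names are illustrative) and shows that more than one format parses each value without error.

```r
# 2017/01/01 parses under both candidate formats
x <- "2017/01/01"
ymd <- as.Date(x, format = "%Y/%m/%d")
ydm <- as.Date(x, format = "%Y/%d/%m")
!is.na(ymd) && !is.na(ydm)  # TRUE: neither format can be ruled out

# 17/12/31 also parses under more than one format
y <- "17/12/31"
yy_mm_dd <- as.Date(y, format = "%y/%m/%d")  # 2017-12-31
dd_mm_yy <- as.Date(y, format = "%d/%m/%y")  # 2031-12-17
yy_dd_mm <- as.Date(y, format = "%y/%d/%m")  # NA, since 31 is not a valid month
```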
To handle ambiguities, use the ambiguities parameter. It can take one of the following values:
- IGNORE: the function takes the first format that matches (fast, but can make some mistakes)
- WARN: the function tries all formats and tells you, via prints, that there are multiple matches (and won't perform the date transformation)
- SOLVE: the function tries to resolve the ambiguity by going through more lines, so it is slower. If it is able to resolve it, it transforms the column; if not, it prints the various acceptable formats.
If some columns have no chance of being a match, consider removing them from cols
to save some computation time.
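To see why looking at more lines can resolve an ambiguity, here is a rough base R sketch of the idea. The helper matching_formats is hypothetical and is not how dataPreparation is implemented; it simply returns every candidate format that parses all the values it is given.

```r
# Hypothetical helper, not part of dataPreparation: keep the candidate
# formats under which every non-missing value of x parses to a valid date.
matching_formats <- function(x, formats) {
  x <- x[!is.na(x)]
  ok <- vapply(
    formats,
    function(fmt) all(!is.na(as.Date(x, format = fmt))),
    logical(1)
  )
  formats[ok]
}

candidates <- c("%Y/%m/%d", "%Y/%d/%m")
dates <- c("2017/01/01", "2017/02/02", "2017/03/25")

# Testing only the first two rows: both formats match, so the column is ambiguous
few <- matching_formats(dates[1:2], candidates)

# Testing every row: "2017/03/25" cannot be YYYY/DD/MM, so one format remains
all_rows <- matching_formats(dates, candidates)
```

In this sketch, taking the first element of few corresponds loosely to IGNORE, reporting few when it has more than one element to WARN, and re-running on more rows to SOLVE.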
Examples
# Load example set
data(tiny_messy_adult)
head(tiny_messy_adult)
# Using find_and_transform_dates
find_and_transform_dates(tiny_messy_adult, n_test = 5)
head(tiny_messy_adult)
# Example with ambiguities
## Not run:
require(data.table)
data(tiny_messy_adult) # reload data
# Add an ambiguity by sorting date1
tiny_messy_adult$date1 = sort(tiny_messy_adult$date1, na.last = TRUE)
# Try all three methods:
result_1 = find_and_transform_dates(copy(tiny_messy_adult))
result_2 = find_and_transform_dates(copy(tiny_messy_adult), ambiguities = "WARN")
result_3 = find_and_transform_dates(copy(tiny_messy_adult), ambiguities = "SOLVE")
## End(Not run)
# "## Not run:" means that this example hasn't been run on CRAN since it's long. But you can run it!