date_format_guess {supportR}    R Documentation

Identify Probable Format for Ambiguous Date Formats

Description

In a column containing multiple date formats (e.g., "MM/DD/YYYY", "YYYY/MM/DD", etc.), identifies the probable format of each date. Supplying a grouping column improves inference. Any format that cannot be determined is flagged as "FORMAT UNCERTAIN" for manual review. This is useful for quickly sorting the bulk of ambiguous dates into clear categories for later conditional wrangling.
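
The kind of reasoning involved can be illustrated in base R: a four-digit field is almost certainly a year, and a field greater than 12 cannot be a month. The sketch below is not the package's implementation. The helper name `sketch_guess` and every label it returns other than "FORMAT UNCERTAIN" are hypothetical, and it only handles three-part dates with a four-digit year.

```r
# Hypothetical helper (NOT part of supportR): guess the layout of one date string
sketch_guess <- function(x) {
  # Split on any of the separators '/', '.' or '-'
  parts <- strsplit(x, "[/.-]")[[1]]
  if (length(parts) != 3) return("FORMAT UNCERTAIN")

  # Convert each field to an integer; non-numeric fields become NA
  nums <- suppressWarnings(as.integer(parts))
  if (anyNA(nums)) return("FORMAT UNCERTAIN")
  n_chars <- nchar(parts)

  if (n_chars[1] == 4) {
    # Year comes first; a field above 12 must be the day
    if (nums[2] > 12 && nums[3] <= 12) return("year/day/month")
    if (nums[3] > 12 && nums[2] <= 12) return("year/month/day")
  } else if (n_chars[3] == 4) {
    # Year comes last; a field above 12 must be the day
    if (nums[1] > 12 && nums[2] <= 12) return("day/month/year")
    if (nums[2] > 12 && nums[1] <= 12) return("month/day/year")
  }

  # Nothing disambiguates month from day (or no four-digit year was found)
  "FORMAT UNCERTAIN"
}

# Apply the sketch to a few of the dates used in the Examples section
vapply(c("2022.13.08", "10-31-1901", "08.11.2004"),
       sketch_guess, character(1), USE.NAMES = FALSE)
# returns "year/day/month" "month/day/year" "FORMAT UNCERTAIN"
```

A date such as "08.11.2004" stays uncertain under these rules because both candidate fields are 12 or less; this is the sort of case where extra context, such as other dates entered by the same person, can help.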

Usage

date_format_guess(
  data = NULL,
  date_col = NULL,
  groups = TRUE,
  group_col = NULL,
  return = "dataframe",
  quiet = FALSE
)

Arguments

data

(dataframe) object containing at least one column of ambiguous dates

date_col

(character) name of column containing ambiguous dates

groups

(logical) whether groups exist in the dataframe / should be used (defaults to TRUE)

group_col

(character) name of column containing grouping variable

return

(character) either "dataframe" or "vector", depending on whether the date format guesses should be returned as a new column on the dataframe or as a vector (defaults to "dataframe")

quiet

(logical) whether certain optional messages should be displayed (defaults to FALSE)

Value

(dataframe or character) object containing date format guesses

Examples

# Create dataframe of example ambiguous dates & grouping variable
my_df <- data.frame('data_enterer' = c('person A', 'person B',
                                       'person B', 'person B',
                                       'person C', 'person D',
                                       'person E', 'person F',
                                       'person G'),
                    'bad_dates' = c('2022.13.08', '2021/2/02',
                                    '2021/2/03', '2021/2/04',
                                    '1899/1/15', '10-31-1901',
                                    '26/11/1901', '08.11.2004',
                                    '6/10/02'))

# Now we can invoke the function!
date_format_guess(data = my_df, date_col = "bad_dates",
                  group_col = "data_enterer", return = "dataframe")

# If preferred, do it without groups and return a vector
date_format_guess(data = my_df, date_col = "bad_dates",
                  groups = FALSE, return = "vector")
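
One possible way to use the guesses for the "later conditional wrangling" mentioned in the Description is sketched below. It assumes the supportR package is installed and that the vector return contains one guess per row, in row order; that assumption is checked explicitly rather than taken for granted. The object names `guesses` and `needs_review` are illustrative only.

```r
# Assumes the supportR package is installed
library(supportR)

# Same example data as above
my_df <- data.frame('data_enterer' = c('person A', 'person B', 'person B',
                                       'person B', 'person C', 'person D',
                                       'person E', 'person F', 'person G'),
                    'bad_dates' = c('2022.13.08', '2021/2/02', '2021/2/03',
                                    '2021/2/04', '1899/1/15', '10-31-1901',
                                    '26/11/1901', '08.11.2004', '6/10/02'))

# Request the guesses as a vector so no output column name has to be assumed
guesses <- date_format_guess(data = my_df, date_col = "bad_dates",
                             group_col = "data_enterer", return = "vector",
                             quiet = TRUE)

# Assumption: one guess per row; stop early if that does not hold
stopifnot(length(guesses) == nrow(my_df))

# Tally how many dates fell into each guessed format
table(guesses)

# Pull out the rows flagged for human double-checking
needs_review <- my_df[guesses %in% "FORMAT UNCERTAIN", ]
needs_review
```

Rows that are not flagged can then be parsed format by format, while `needs_review` is handled by hand.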


[Package supportR version 1.3.0 Index]