fix_date_df {datefixR} | R Documentation |
Clean up messy date columns
Description
Tidies a dataframe object whose date columns were entered via a free-text
box (possibly by different users) and are therefore in a non-standardized
format. Supports numerous separators including /, -, or space. Supports
all-numeric, abbreviated, or long-hand month notation. Where the day of the
month has not been supplied, the first day of the month is imputed by
default. Either DMY or YMD is assumed by default. However, the US system of
MDY is supported via the format argument.
Usage
fix_date_df(
df,
col.names,
day.impute = 1,
month.impute = 7,
id = NULL,
format = "dmy",
excel = FALSE,
roman.numeral = FALSE
)
Arguments
df |
A dataframe or tibble object with messy date
column(s)
|
col.names |
Character vector of names of columns of messy date data
|
day.impute |
Integer. Day of the month to be imputed if not available.
Defaults to 1. Maximum value of 31. If day.impute is greater than the
number of days for a given month, then the last day of that month will be
imputed. If day.impute = NA, then NA will be imputed for
the date instead and a warning will be raised. If day.impute = NULL,
then instead of imputing the day of the month, the function will fail.
|
month.impute |
Integer. Month to be imputed if not available.
Defaults to 7 (July). If month.impute = NA, then NA will be
imputed for the date instead and a warning will be raised. If
month.impute = NULL, then instead of imputing the month, the
function will fail.
|
id |
Name of column containing row IDs. By default, the first column is
assumed.
|
format |
Character. The format in which a date is most likely to be given.
Either "dmy" (default) or "mdy". If the year appears to have
been given first, then YMD is assumed for the subject (the format argument
is not used for these observations).
|
excel |
Logical. If a date is given as only numbers (no separators) and
is more than four digits long, should the date be assumed to be from Excel,
which counts the number of days from 1900-01-01? In most programming
languages (including R), days are instead counted from 1970-01-01,
and this is the default for this function (excel = FALSE).
|
roman.numeral |
Logical. If TRUE,
months detected to have been given as Roman numerals will be converted.
Months are given in Roman numerals in some database systems and biological
records. Defaults to FALSE as this may occasionally interfere with months
in other formats.
|
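Two of the behaviours described above can be illustrated in base R. This is only a sketch of the ideas, not how fix_date_df() is implemented: impute_day() is a hypothetical helper, and the "1899-12-30" origin is the conventional R origin for Excel serial dates (an assumption not stated in this page; it accounts for Excel numbering 1900-01-01 as day 1 and treating 1900 as a leap year).

```r
# Hypothetical helper (not part of datefixR): build a date from a year, a
# month, and a requested day, falling back to the last day of the month when
# the requested day does not exist in that month.
impute_day <- function(year, month, day.impute = 1) {
  first <- as.Date(sprintf("%04d-%02d-01", year, month))
  # The first day of the following month minus one day is the last day
  # of the current month.
  last <- seq(first, by = "month", length.out = 2)[2] - 1
  last.day <- as.integer(format(last, "%d"))
  first + (min(day.impute, last.day) - 1)
}

impute_day(2021, 2, day.impute = 31)  # 2021-02-28
impute_day(2020, 2, day.impute = 31)  # 2020-02-29 (leap year)
impute_day(2021, 7, day.impute = 31)  # 2021-07-31

# The same calendar day expressed as a day count under the two conventions.
# Assumption: Excel serials are converted with origin "1899-12-30".
as.Date(44197, origin = "1899-12-30")  # Excel serial -> 2021-01-01
as.Date(18628, origin = "1970-01-01")  # days since 1970-01-01 -> 2021-01-01
```

The all-numeric inputs 44197 and 18628 both denote 2021-01-01, which is why the excel argument is needed to say which convention applies.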
Value
A dataframe or tibble object, depending on the type of df. Selected columns
are of type Date, with the format yyyy-mm-dd.
See Also
fix_date_char(), which is similar to fix_date_df() except that it can only
be applied to character vectors.
Examples
data(exampledates)
fixed.df <- fix_date_df(exampledates, c("some.dates", "some.more.dates"))
fixed.df
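
A further sketch, using the non-default arguments documented above. The visits data frame, its column names, and its values are made up for illustration and are not shipped with the package; no output is shown because it has not been run here.

```r
library(datefixR)

# Hypothetical data: an ID column followed by one messy date column
visits <- data.frame(
  id = 1:3,
  visit.date = c("02/03/2021", "July 2019", "1994")
)

# Read ambiguous dates as month-first (US style) and impute the 15th
# when the day of the month is missing; month.impute keeps its default (7)
fixed.visits <- fix_date_df(
  visits,
  "visit.date",
  day.impute = 15,
  format = "mdy"
)
fixed.visits
```

With format = "mdy", the first value should be read as 3 February rather than 2 March; the other two rows rely on day and month imputation.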
[Package datefixR version 1.6.1 Index]