parse_date_time {lubridate} | R Documentation |
User friendly date-time parsing functions
Description
parse_date_time() parses an input vector into a POSIXct date-time object. It differs from base::strptime() in two respects. First, it allows specification of the order in which the formats occur without the need to include separators and the % prefix. Such a formatting argument is referred to as "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
parse_date_time2() is a fast C parser of numeric orders.
fast_strptime() is a fast C parser of numeric formats only that accepts explicit format arguments, just like base::strptime().
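As a rough illustration of the difference between an order and an explicit format, the sketch below parses one string three ways. The input string and the variable names are made up for illustration, and the code assumes lubridate is installed.

```r
library(lubridate)

x <- "2009-01-02 12:30"  # illustrative input, not taken from the package

# Order: no separators and no % prefixes are needed.
via_order <- parse_date_time(x, "Ymd HM")

# Explicit formats: separators and % prefixes must match the input.
via_fast <- fast_strptime(x, "%Y-%m-%d %H:%M", lt = FALSE)  # lt = FALSE returns POSIXct
via_base <- as.POSIXct(strptime(x, "%Y-%m-%d %H:%M", tz = "UTC"))

# All three should describe the same instant in UTC.
all.equal(via_order, via_fast)
all.equal(via_order, via_base)
```

The order form is usually the more convenient one for messy input, while the explicit-format form leaves nothing to guessing.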
Usage
parse_date_time(
x,
orders,
tz = "UTC",
truncated = 0,
quiet = FALSE,
locale = Sys.getlocale("LC_TIME"),
select_formats = .select_formats,
exact = FALSE,
train = TRUE,
drop = FALSE
)
parse_date_time2(
x,
orders,
tz = "UTC",
exact = FALSE,
lt = FALSE,
cutoff_2000 = 68L
)
fast_strptime(x, format, tz = "UTC", lt = TRUE, cutoff_2000 = 68L)
Arguments
x |
a character or numeric vector of dates |
orders |
a character vector of date-time formats. Each order string is
a series of formatting characters as listed in |
tz |
a character string that specifies the time zone with which to parse the dates |
truncated |
integer, number of formats that can be missing. The most
common type of irregularity in date-time data is the truncation due to
rounding or unavailability of the time stamp. If the |
quiet |
logical. If |
locale |
locale to be used, see locales. On Linux systems you
can use |
select_formats |
A function to select actual formats for parsing from a
set of formats which matched a training subset of |
exact |
logical. If |
train |
logical, default |
drop |
logical, default |
lt |
logical. If |
cutoff_2000 |
integer. For |
format |
a vector of formats. If multiple formats are supplied, they are
applied in turn until one succeeds. The formats should include all the
separators, and each format letter must be prefixed with %, just as in the
format argument of |
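A minimal sketch of two of the arguments above, tz and cutoff_2000. The inputs are invented for illustration. The cutoff_2000 behaviour shown here (two-digit years at or below the cutoff are read as 20xx, the rest as 19xx) is an assumption about the parser, not something spelled out in the text above.

```r
library(lubridate)

# tz: the wall-clock time in the string is interpreted in the requested zone.
ny <- parse_date_time("2010-01-01 12:00", "Ymd HM", tz = "America/New_York")
with_tz(ny, "UTC")  # same instant, displayed in UTC

# cutoff_2000 (assumed semantics): two-digit years at or below the cutoff
# are read as 20xx, larger ones as 19xx.
yrs <- c("68-01-01", "69-01-01")
default_years <- year(parse_date_time2(yrs, "ymd"))                     # cutoff_2000 = 68L
shifted_years <- year(parse_date_time2(yrs, "ymd", cutoff_2000 = 70L))  # both at or below 70
default_years
shifted_years
```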
Details
When several format-orders are specified, parse_date_time() selects (guesses) format-orders based on a training subset of the input strings. After guessing, the formats are ordered according to their performance on the training set and applied recursively to the entire input vector. You can disable training with train = FALSE.
parse_date_time() and all derived functions, such as ymd_hms(), ymd(), etc., will drop into fast_strptime() instead of base::strptime() whenever all the formats guessed from the input data are numeric.
The list below contains formats recognized by lubridate. For numeric formats leading 0s are optional. As compared to base::strptime(), some of the formats are new or have been extended for efficiency reasons. These formats are marked with "(*)" below. Fast parsers parse_date_time2() and fast_strptime() accept only formats marked with "(!)".
a
Abbreviated weekday name in the current locale. (Also matches full name.)
A
Full weekday name in the current locale. (Also matches abbreviated name.) You don't need to specify a and A formats explicitly. Wday is automatically handled if preproc_wday = TRUE.
b
(!) Abbreviated or full month name in the current locale. The C parser currently understands only English month names.
B
(!) Same as b.
d
(!) Day of the month as decimal number (01–31 or 1–31).
H
(!) Hours as decimal number (00–24 or 0–24).
I
(!) Hours as decimal number (01–12 or 1–12).
j
Day of year as decimal number (001–366 or 1–366).
q
(!*) Quarter (1–4). The quarter month is added to the parsed month if the m element is present.
m
(!*) Month as decimal number (01–12 or 1–12). For parse_date_time() also matches abbreviated and full month names, as the b and B formats do. The C parser understands only English month names.
M
(!) Minute as decimal number (00–59 or 0–59).
p
(!) AM/PM indicator in the locale. Commonly used in conjunction with I and not with H. But lubridate's C parser accepts the H format as long as the hour is not greater than 12. The C parser understands only the English locale AM/PM indicator.
S
(!) Second as decimal number (00–61 or 0–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
OS
Fractional second.
U
Week of the year as decimal number (00–53 or 0–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
w
Weekday as decimal number (0–6, Sunday is 0).
W
Week of the year as decimal number (00–53 or 0–53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
y
(!*) Year without century (00–99 or 0–99). In parse_date_time() also matches year with century (Y format).
Y
(!) Year with century.
z
(!*) ISO8601 signed offset in hours and minutes from UTC. For example -0800, -08:00 or -08 all represent 8 hours behind UTC. This format also matches the Z (Zulu) UTC indicator. Because base::strptime() doesn't fully support ISO8601, this format is implemented as a union of 4 formats: Ou (Z), Oz (-0800), OO (-08:00) and Oo (-08). You can use these formats as any other, but it is rarely necessary. parse_date_time2() and fast_strptime() support all of these formats.
Om
(!*) Matches numeric month and English alphabetic months (both long and abbreviated forms).
Op
(!*) Matches AM/PM English indicator.
r
(*) Matches Ip and H orders.
R
(*) Matches HM and IMp orders.
T
(*) Matches IMSp, HMS, and HMOS orders.
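To make two of the extended formats concrete, here is a small sketch of the z and T orders. The timestamps are invented for illustration, and the results are described only as far as the list above states them.

```r
library(lubridate)

# z: the three offset spellings from the list, attached to one made-up timestamp.
x <- c("2012-12-04 15:06:06-0800",
       "2012-12-04 15:06:06-08:00",
       "2012-12-04 15:06:06-08")
offsets <- parse_date_time(x, "YmdHMSz")
offsets  # per the list, all three describe the same instant, 8 hours behind UTC

# T: a single order covering 12-hour (with AM/PM) and 24-hour clock times.
clock <- parse_date_time(c("2010-01-01 03:04:05 PM", "2010-01-01 15:04:05"), "Ymd T")
clock
```

Using T (or R, r) keeps the order vector short when a column mixes 12-hour and 24-hour notation.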
Value
a vector of POSIXct date-time objects
Note
parse_date_time() (and the derivatives ymd(), ymd_hms(), etc.) relies on a sparse guesser that takes at most 501 elements from the supplied character vector in order to identify appropriate formats from the supplied orders. If you get the error All formats failed to parse and you are confident that your vector contains valid dates, you should either set the exact argument to TRUE or use functions that don't perform format guessing (fast_strptime(), parse_date_time2() or base::strptime()).
For performance reasons, when the time zone is not UTC, parse_date_time2() and fast_strptime() perform no validity checks for daylight saving time. Thus, if your input string contains an invalid date-time that falls into a DST gap and lt = TRUE, you will get a POSIXlt object with a non-existent time. If lt = FALSE, your time instant will be adjusted to a valid time by adding an hour. See examples. If you want to get NA for invalid date-times, use fit_to_timeline() explicitly.
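The last point can be sketched as follows. This is a minimal illustration, assuming the system time zone database knows America/New_York; the variable names are arbitrary.

```r
library(lubridate)

gap <- "2010-03-14 02:05:06"  # falls into the spring-forward gap in New York

# lt = TRUE keeps the non-existent wall-clock time in a POSIXlt object.
lt <- parse_date_time2(gap, "YmdHMS", tz = "America/New_York", lt = TRUE)

# fit_to_timeline() is expected to turn such a time into NA rather than shift it.
checked <- fit_to_timeline(lt)
is.na(checked)
```

Running the check explicitly keeps the fast parsers fast for the common case and makes the validation step visible where it matters.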
See Also
base::strptime(), ymd(), ymd_hms()
Examples
## ** orders are much easier to write **
x <- c("09-01-01", "09-01-02", "09-01-03")
parse_date_time(x, "ymd")
parse_date_time(x, "y m d")
parse_date_time(x, "%y%m%d")
# "2009-01-01 UTC" "2009-01-02 UTC" "2009-01-03 UTC"
## ** heterogeneous date-times **
x <- c("09-01-01", "090102", "09-01 03", "09-01-03 12:02")
parse_date_time(x, c("ymd", "ymd HM"))
## ** different ymd orders **
x <- c("2009-01-01", "02022010", "02-02-2010")
parse_date_time(x, c("dmY", "ymd"))
## "2009-01-01 UTC" "2010-02-02 UTC" "2010-02-02 UTC"
## ** truncated time-dates **
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11", "2010-01-01 12", "2010-01-01")
parse_date_time(x, "Ymd HMS", truncated = 3)
## ** specifying exact formats and avoiding training and guessing **
parse_date_time(x, c("%m-%d-%y", "%m%d%y", "%m-%d-%y %H:%M"), exact = TRUE)
parse_date_time(c('12/17/1996 04:00:00', '4/18/1950 0130'),
                c('%m/%d/%Y %I:%M:%S', '%m/%d/%Y %H%M'), exact = TRUE)
## ** quarters and partial dates **
parse_date_time(c("2016.2", "2016-04"), orders = "Yq")
parse_date_time(c("2016", "2016-04"), orders = c("Y", "Ym"))
## ** fast parsing **
## Not run:
options(digits.secs = 3)
## random times between 1400 and 3000
tt <- as.character(.POSIXct(runif(1000, -17987443200, 32503680000)))
tt <- rep.int(tt, 1000)
system.time(out <- as.POSIXct(tt, tz = "UTC"))
system.time(out1 <- ymd_hms(tt)) # constant overhead on long vectors
system.time(out2 <- parse_date_time2(tt, "YmdHMOS"))
system.time(out3 <- fast_strptime(tt, "%Y-%m-%d %H:%M:%OS"))
all.equal(out, out1)
all.equal(out, out2)
all.equal(out, out3)
## End(Not run)
## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## to give priority to %y format, define your own select_format function:
my_select <- function(trained, drop = FALSE, ...) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## ** invalid times with "fast" parsing **
parse_date_time("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York")
parse_date_time2("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York")
parse_date_time2("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York", lt = TRUE)