[.xts {xts} | R Documentation |
Extract Subsets of xts Objects
Description
Details on efficient subsetting of xts objects for maximum performance and compatibility.
Usage
## S3 method for class 'xts'
x[i, j, drop = FALSE, which.i = FALSE, ...]
Arguments
x |
An xts object. |
i |
The rows to extract. Can be a numeric vector, time-based vector, or an ISO-8601 style range string (see details). |
j |
The columns to extract, either a numeric vector of column locations or a character vector of column names. |
drop |
Should dimension be dropped, if possible? See notes section. |
which.i |
Logical value that determines whether a subset xts object is
returned (the default), or the locations of the matching rows (when
|
... |
Additional arguments (currently unused). |
Details
One of the primary motivations and key points of differentiation of xts is the ability to subset rows by specifying ISO-8601 compatible range strings. This allows for natural range-based time queries without requiring prior knowledge of the underlying class used for the time index.
When i
is a character string, it is processed as an ISO-8601 formatted
datetime or time range using .parseISO8601()
. A single datetime is
parsed from left to to right, according to the following specification:
CCYYMMDD HH:MM:SS.ss+
A time range can be specified by two datetimes separated by a forward slash or double-colon. For example:
CCYYMMDD HH:MM:SS.ss+/CCYYMMDD HH:MM:SS.ss
The ISO8601 time range subsetting uses a custom binary search algorithm to
efficiently find the beginning and end of the time range. i
can also be a
vector of ISO8601 time ranges, which enables subsetting by multiple
non-contiguous time ranges in one subset call.
The above parsing, both for single datetimes and time ranges, will be done
on each element when i
is a character vector. This is very inefficient,
especially for long vectors. In this case, it's recommened to use I(i)
so
the xts subset function can process the vector more efficiently. Another
alternative is to convert i
to POSIXct before passing it to the subset
function. See the examples for an illustration of using I(i)
.
The xts index is stored as POSIXct internally, regardless of the value of
its tclass
attribute. So the fastest time-based subsetting is always when
i
is a POSIXct vector.
Value
An xts object containing the subset of x
. When which.i = TRUE
,
the corresponding integer locations of the matching rows is returned.
Note
By design, xts objects always have two dimensions. They cannot be
vectors like zoo objects. Therefore drop = FALSE
by default in order to
preserve the xts object's dimensions. This is different from both matrix and
zoo, which use drop = TRUE
by default. Explicitly setting drop = TRUE
may be needed when performing certain matrix operations.
Author(s)
Jeffrey A. Ryan
References
ISO 8601: Date elements and interchange formats - Information interchange - Representation of dates and time https://www.iso.org
See Also
xts()
, .parseISO8601()
, .index()
Examples
x <- xts(1:3, Sys.Date()+1:3)
xx <- cbind(x,x)
# drop = FALSE for xts, differs from zoo and matrix
z <- as.zoo(xx)
z/z[,1]
m <- as.matrix(xx)
m/m[,1]
# this will fail with non-conformable arrays (both retain dim)
tryCatch(
xx/x[,1],
error = function(e) print("need to set drop = TRUE")
)
# correct way
xx/xx[,1,drop = TRUE]
# or less efficiently
xx/drop(xx[,1])
# likewise
xx/coredata(xx)[,1]
x <- xts(1:1000, as.Date("2000-01-01")+1:1000)
y <- xts(1:1000, as.POSIXct(format(as.Date("2000-01-01")+1:1000)))
x.subset <- index(x)[1:20]
x[x.subset] # by original index type
system.time(x[x.subset])
x[as.character(x.subset)] # by character string. Beware!
system.time(x[as.character(x.subset)]) # slow!
system.time(x[I(as.character(x.subset))]) # wrapped with I(), faster!
x['200001'] # January 2000
x['1999/2000'] # All of 2000 (note there is no need to use the exact start)
x['1999/200001'] # January 2000
x['2000/200005'] # 2000-01 to 2000-05
x['2000/2000-04-01'] # through April 01, 2000
y['2000/2000-04-01'] # through April 01, 2000 (using POSIXct series)
### Time of day subsetting
i <- 0:60000
focal_date <- as.numeric(as.POSIXct("2018-02-01", tz = "UTC"))
x <- .xts(i, c(focal_date + i * 15), tz = "UTC", dimnames = list(NULL, "value"))
# Select all observations between 9am and 15:59:59.99999:
w1 <- x["T09/T15"] # or x["T9/T15"]
head(w1)
# timestring is of the form THH:MM:SS.ss/THH:MM:SS.ss
# Select all observations between 13:00:00 and 13:59:59.9999 in two ways:
y1 <- x["T13/T13"]
head(y1)
x[.indexhour(x) == 13]
# Select all observations between 9:30am and 30 seconds, and 4.10pm:
x["T09:30:30/T16:10"]
# It is possible to subset time of day overnight.
# e.g. This is useful for subsetting FX time series which trade 24 hours on week days
# Select all observations between 23:50 and 00:15 the following day, in the xts time zone
z <- x["T23:50/T00:14"]
z["2018-02-10 12:00/"] # check the last day
# Select all observations between 7pm and 8.30am the following day:
z2 <- x["T19:00/T08:29:59"]
head(z2); tail(z2)