trade_classification {PINstimation} | R Documentation |
Classification and aggregation of high-frequency data
Description
classify_trades() classifies high-frequency trading data into
buyer-initiated and seller-initiated trades using different algorithms and
different time lags.
aggregate_trades() aggregates high-frequency trading data at the provided
frequency of aggregation. The aggregation is preceded by a trade
classification step, which classifies trades using the chosen trade
classification algorithm and time lag.
Usage
classify_trades(data, algorithm = "Tick", timelag = 0, ..., verbose = TRUE)
aggregate_trades(
  data,
  algorithm = "Tick",
  timelag = 0,
  frequency = "day",
  unit = 1,
  ...,
  verbose = TRUE
)
Arguments
data |
A dataframe with 4 variables in the following
order ( |
algorithm |
A character string referring to the algorithm used
to determine the trade initiator, a buyer or a seller. It takes one of four
values ( |
timelag |
A number referring to the time lag in milliseconds
used to calculate the lagged midquote, bid and ask for the algorithms
|
... |
Additional arguments passed on to the functions
|
verbose |
A binary variable that determines whether detailed
information about the progress of the trade classification is displayed.
No output is produced when |
frequency |
The frequency used to aggregate intraday data. It takes one
of the following values: |
unit |
An integer referring to the size of the aggregation window
used to aggregate intraday data. The default value is |
Details
The argument algorithm takes one of four values:
- "Tick" refers to the tick algorithm: a trade is classified as a buy (sell) if the price of the trade to be classified is above (below) the closest different price of a previous trade.
- "Quote" refers to the quote algorithm: it classifies a trade as a buy (sell) if the price of the trade to be classified is above (below) the mid-point of the bid-ask spread. Trades executed at the mid-spread are not classified.
- "LR" refers to the LR algorithm as in Lee and Ready (1991). It classifies a trade as a buy (sell) if its price is above (below) the mid-spread (quote algorithm), and uses the tick algorithm if the trade price is at the mid-spread.
- "EMO" refers to the EMO algorithm as in Ellis et al. (2000). It classifies trades at the bid (ask) as sells (buys) and uses the tick algorithm to classify trades within the then prevailing bid-ask spread.
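The four rules above can be sketched in a few lines of base R. This is only an illustration of the definitions, not the implementation used by PINstimation: the function names tick_rule, quote_rule, lr_rule and emo_rule are hypothetical, no time lag is applied, and the EMO sketch applies the tick rule to every trade that is not exactly at the bid or the ask.

```r
# Illustrative sketch only; NOT the PINstimation implementation.
# Each function returns TRUE for a buy, FALSE for a sell, NA if unclassified.

# Tick rule: compare each price with the closest previous different price.
tick_rule <- function(price) {
  n <- length(price)
  isbuy <- rep(NA, n)
  last_diff <- NA_real_                 # closest previous price that differs
  for (i in seq_len(n)[-1]) {
    if (price[i - 1] != price[i]) last_diff <- price[i - 1]
    if (!is.na(last_diff)) isbuy[i] <- price[i] > last_diff
  }
  isbuy
}

# Quote rule: compare the price with the midquote; trades at the mid stay NA.
quote_rule <- function(price, bid, ask) {
  mid <- (bid + ask) / 2
  ifelse(price > mid, TRUE, ifelse(price < mid, FALSE, NA))
}

# LR rule: quote rule first, tick rule for trades at the midquote.
lr_rule <- function(price, bid, ask) {
  q <- quote_rule(price, bid, ask)
  ifelse(is.na(q), tick_rule(price), q)
}

# EMO rule: trades at the ask are buys, trades at the bid are sells,
# all remaining trades fall back to the tick rule (assumption of this sketch).
emo_rule <- function(price, bid, ask) {
  ifelse(price == ask, TRUE, ifelse(price == bid, FALSE, tick_rule(price)))
}

# Made-up toy data: five trades against a constant 10.00 / 10.50 quote.
price <- c(10.00, 10.50, 10.25, 10.25, 10.00)
bid   <- rep(10.00, 5)
ask   <- rep(10.50, 5)

data.frame(
  price,
  tick  = tick_rule(price),
  quote = quote_rule(price, bid, ask),
  lr    = lr_rule(price, bid, ask),
  emo   = emo_rule(price, bid, ask)
)
```

In the toy data, the two trades at the midquote (10.25) are left unclassified by the quote rule but receive a direction under the LR and EMO sketches, which is the practical difference between the algorithms.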
Lee and Ready (1991) recommend using the mid-spread from five seconds
earlier (the '5-second' rule), which mitigates trade misclassifications for
many of the 150 NYSE stocks they analyze. In more recent studies, however,
such as Piwowar and Wei (2006) and Aktas and Kryzanowski (2014), the use of
1-second lagged midquotes is shown to yield lower rates of
misclassification. The default value of timelag is 0 (no time lag).
Considering the ultra-fast nature of today’s financial markets, the time lag
is expressed in milliseconds. Lags shorter than one second can therefore be
specified by entering values such as 100 or 500.
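To make the effect of a time lag concrete, the following base R sketch pairs each trade with the midquote that prevailed a given number of milliseconds earlier. The helper lagged_mid is hypothetical and is not part of PINstimation; it assumes that timestamps are sorted in ascending order and that the quote attached to each row is the one in force at that row's timestamp.

```r
# Illustrative sketch only; NOT the PINstimation implementation.
# For each trade, return the midquote of the latest observation recorded
# at or before (trade time - timelag), with timelag in milliseconds.
lagged_mid <- function(timestamp, bid, ask, timelag = 0) {
  t_ms <- as.numeric(timestamp) * 1000          # seconds -> milliseconds
  idx <- findInterval(t_ms - timelag, t_ms)     # 0 if nothing is old enough
  idx[idx == 0] <- NA                           # no lagged quote available
  (bid[idx] + ask[idx]) / 2
}

# Made-up toy data: five observations over 3.5 seconds.
ts  <- as.POSIXct("2024-01-02 09:30:00", tz = "UTC") + c(0, 0.5, 1, 2, 3.5)
bid <- c(10, 11, 12, 13, 14)
ask <- bid + 1

lagged_mid(ts, bid, ask, timelag = 0)      # contemporaneous midquotes
lagged_mid(ts, bid, ask, timelag = 1000)   # midquotes lagged by one second
```

With a lag of 1000 milliseconds, the first two trades have no sufficiently old quote and get NA, so a real implementation has to decide how to treat the start of the sample.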
Value
The function classify_trades() returns a dataframe of five variables. The
first four variables are obtained from the argument data: timestamp, price,
bid, ask. The fifth variable is isbuy, which takes the value TRUE when the
trade is classified as a buyer-initiated trade, and FALSE when the trade is
classified as a seller-initiated trade.
The function aggregate_trades() returns a dataframe of two (or three)
variables. If fullreport is set to TRUE, then the returned dataframe has
three variables {freq, b, s}. If fullreport is set to FALSE, then the
returned dataframe has two variables {b, s} and can therefore be used
directly for the estimation of the PIN and MPIN models.
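As a rough picture of what the aggregation step produces, the sketch below counts buyer-initiated (b) and seller-initiated (s) trades per fixed-width time window in base R. The helper count_by_window and its window column are hypothetical; the actual windowing logic and the content of the freq column in aggregate_trades() may differ.

```r
# Illustrative sketch only; NOT the PINstimation implementation.
# Count buys (b) and sells (s) in consecutive windows of width_secs seconds,
# measured from the first trade. Unclassified trades (NA) are ignored.
count_by_window <- function(timestamp, isbuy, width_secs) {
  elapsed <- as.numeric(difftime(timestamp, timestamp[1], units = "secs"))
  window <- floor(elapsed / width_secs)
  # Keep empty windows so that they show up with zero counts
  window <- factor(window, levels = seq(0, max(window)))
  groups <- split(isbuy, window)
  data.frame(
    window = as.integer(names(groups)),
    b = unname(vapply(groups, function(x) sum(x, na.rm = TRUE), integer(1))),
    s = unname(vapply(groups, function(x) sum(!x, na.rm = TRUE), integer(1))),
    row.names = NULL
  )
}

# Made-up toy data: five classified trades, aggregated in 15-minute windows.
ts    <- as.POSIXct("2024-01-02 09:30:00", tz = "UTC") + c(0, 60, 600, 1000, 1900)
isbuy <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
agg   <- count_by_window(ts, isbuy, width_secs = 15 * 60)
agg
```

Dropping the window column leaves a two-column {b, s} layout of the kind described above.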
References
Aktas OU, Kryzanowski L (2014).
“Trade classification accuracy for the BIST.”
Journal of International Financial Markets, Institutions and Money, 33, 259–282.
ISSN 1042-4431.
Ellis K, Michaely R, O'Hara M (2000).
“The Accuracy of Trade Classification Rules: Evidence from Nasdaq.”
The Journal of Financial and Quantitative Analysis, 35(4), 529–551.
Lee CMC, Ready MJ (1991).
“Inferring Trade Direction from Intraday Data.”
The Journal of Finance, 46(2), 733–746.
ISSN 00221082, 15406261.
Piwowar MS, Wei L (2006).
“The Sensitivity of Effective Spread Estimates to Trade-Quote Matching Algorithms.”
Electronic Markets, 16(2), 112–129.
Examples
# The package contains a preloaded dataset called 'hfdata'.
# It holds artificially created high-frequency trading data: 100 000 trades
# and five variables 'timestamp', 'price', 'volume', 'bid', and 'ask'.
# For more information, type ?hfdata.
xdata <- hfdata
xdata$volume <- NULL
# Use the EMO algorithm with a timelag of 500 milliseconds to classify
# high-frequency trades in the dataset 'xdata'
ctrades <- classify_trades(xdata, algorithm = "EMO", timelag = 500, verbose = FALSE)
# Use the LR algorithm with a timelag of 1 second to aggregate intraday data
# in the dataset 'xdata' at a frequency of 15 minutes.
lrtrades <- aggregate_trades(xdata, algorithm = "LR", timelag = 1000,
                             frequency = "min", unit = 15, verbose = FALSE)
# Use the Quote algorithm with a timelag of 1 second to aggregate intraday data
# in the dataset 'xdata' at a daily frequency.
qtrades <- aggregate_trades(xdata, algorithm = "Quote", timelag = 1000,
                            frequency = "day", unit = 1, verbose = FALSE)
# Since the argument 'fullreport' is set to FALSE by default, the
# output 'qtrades' can be used directly for the estimation of the PIN
# model, namely using pin_ea().
estimate <- pin_ea(qtrades, verbose = FALSE)
# Show the estimate
show(estimate)