quotesCleanup {highfrequency} | R Documentation |
Cleans quote data
Description
This is a wrapper function for cleaning the quote data in the entire folder dataSource. The result is saved in the folder dataDestination.
In case you supply the argument qDataRaw, the on-disk functionality is ignored and the function returns the cleaned quotes as an xts or data.table object (see examples).
The following cleaning functions are performed sequentially: noZeroQuotes, exchangeHoursOnly, autoSelectExchangeQuotes or selectExchange, rmNegativeSpread, rmLargeSpread, mergeQuotesSameTimestamp, rmOutliersQuotes.
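The sequence above can be sketched by calling the individual cleaning functions by hand. This is only an illustration, not the wrapper's actual implementation: it assumes the individual functions accept the same argument names and defaults shown in Usage (exch for selectExchange is likewise an assumption), and it uses the sampleQDataRaw dataset from the examples with a fixed exchange "N".

```r
library(highfrequency)

# Start from the sample raw quotes shipped with the package.
qData <- sampleQDataRaw

# Apply the cleaning steps in the documented order.
# Argument names are assumed to mirror those of quotesCleanup().
qData <- noZeroQuotes(qData)
qData <- exchangeHoursOnly(qData, marketOpen = "09:30:00", marketClose = "16:00:00")
qData <- selectExchange(qData, exch = "N")  # fixed exchange instead of "auto"
qData <- rmNegativeSpread(qData)
qData <- rmLargeSpread(qData, maxi = 50)
qData <- mergeQuotesSameTimestamp(qData, selection = "median")
qData <- rmOutliersQuotes(qData, maxi = 10, window = 50, type = "standard")

# Compare the number of observations before and after cleaning.
c(raw = nrow(sampleQDataRaw), cleaned = nrow(qData))
```

Running the steps one by one can help to see which filter removes most observations for a given data set.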
Usage
quotesCleanup(
  dataSource = NULL,
  dataDestination = NULL,
  exchanges = "auto",
  qDataRaw = NULL,
  report = TRUE,
  selection = "median",
  maxi = 50,
  window = 50,
  type = "standard",
  marketOpen = "09:30:00",
  marketClose = "16:00:00",
  rmoutliersmaxi = 10,
  printExchange = TRUE,
  saveAsXTS = FALSE,
  tz = NULL
)
Arguments
dataSource |
character indicating the folder in which the original data is stored. |
dataDestination |
character indicating the folder in which the cleaned data is stored. |
exchanges |
vector of stock exchange symbols for all data in dataSource, e.g. "N". The default value is "auto", which selects the exchange using autoSelectExchangeQuotes. |
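qDataRaw |
xts or data.table object containing raw quote data. NULL by default. If supplied, the on-disk functionality (dataSource and dataDestination) is ignored. |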
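report |
boolean and TRUE by default. When qDataRaw is supplied and report is TRUE, the returned list includes a report element (see examples). |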
selection |
argument to be passed on to the cleaning routine mergeQuotesSameTimestamp. The default is "median". |
maxi |
spreads which are greater than median spreads of the day times maxi are excluded. |
window |
argument to be passed on to the cleaning routine rmOutliersQuotes. |
type |
argument to be passed on to the cleaning routine rmOutliersQuotes. |
marketOpen |
passed to exchangeHoursOnly. The default is "09:30:00". |
marketClose |
passed to exchangeHoursOnly. The default is "16:00:00". |
rmoutliersmaxi |
argument to be passed on to the cleaning routine rmOutliersQuotes (as its maxi argument). |
printExchange |
argument passed to autoSelectExchangeQuotes. The default is TRUE. |
saveAsXTS |
indicates whether data should be saved in xts format instead of data.table when using the on-disk functionality. FALSE by default. |
tz |
fallback time zone used in case we are unable to identify the timezone of the data, by default: NULL. |
Details
Using the on-disk functionality with .csv.zip files, which is the standard format of the WRDS database, writes temporary files to your machine. We try to clean up after it, but cannot guarantee that no files slip through the cracks if the permission settings on your machine do not match ours.
If the input data.table does not contain a DT column but it does contain DATE and TIME_M columns, we create the DT column by REFERENCE, altering the data.table that may be in the user's environment!
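To illustrate what creating a column by reference means in practice, here is a minimal sketch using only data.table. The helper addDT is hypothetical and merely mimics the behaviour described above; it is not the package's internal code. Working on a copy() of the input is one way to keep the original object untouched.

```r
library(data.table)

# Toy input with DATE and TIME_M but no DT column (values are made up).
raw <- data.table(DATE = "2018-01-02", TIME_M = "09:30:00.5",
                  BID = 10, OFR = 10.1)

# Hypothetical helper: adds DT by reference, as := always does.
addDT <- function(x) {
  x[, DT := as.POSIXct(paste(DATE, TIME_M), tz = "UTC")]
  invisible(x)
}

# Working on an explicit copy leaves the original untouched.
safe <- copy(raw)
addDT(safe)
rawUntouched <- !("DT" %in% names(raw))   # raw still has no DT column

# Passing the object itself alters it in the caller's environment.
addDT(raw)
rawAltered <- "DT" %in% names(raw)        # raw now has a DT column
```

If the original data.table must stay unchanged, passing copy(yourData) as qDataRaw follows the same idea.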
Value
The function converts every (compressed) csv (or rds) file in dataSource into multiple xts or data.table files.
In dataDestination, there will be one folder for each symbol containing .rds files with cleaned data stored either in data.table or xts format.
In case you supply the argument qDataRaw, the on-disk functionality is ignored and the function returns a list with the cleaned quotes as an xts or data.table object depending on input (see examples).
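As a rough sketch of how the on-disk result could be read back, the snippet below uses only base R. The folder name, the symbol "XXX", the file name and the placeholder contents are all hypothetical; they only imitate the layout described above (one folder per symbol containing .rds files) so that the example is self-contained.

```r
# Simulate the documented layout in a temporary folder (hypothetical names).
dataDestination <- file.path(tempdir(), "cleanedQuotes")
symbolDir <- file.path(dataDestination, "XXX")   # hypothetical symbol folder
dir.create(symbolDir, recursive = TRUE, showWarnings = FALSE)

# Placeholder standing in for one cleaned file written by quotesCleanup().
saveRDS(data.frame(BID = 10, OFR = 10.1),
        file.path(symbolDir, "2018-01-02.rds"))

# Collect all .rds files and group them by their symbol folder.
rdsFiles <- list.files(dataDestination, pattern = "\\.rds$",
                       recursive = TRUE, full.names = TRUE)
filesBySymbol <- split(rdsFiles, basename(dirname(rdsFiles)))

# Read the first file of one symbol back into memory.
firstFile <- readRDS(filesBySymbol[["XXX"]][1])
```

With real output, dataDestination would be the folder passed to quotesCleanup, and each file would hold a data.table or xts object depending on saveAsXTS.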
Author(s)
Jonathan Cornelissen, Kris Boudt, Onno Kleen, and Emil Sjoerup.
References
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2009). Realized kernels in practice: Trades and quotes. Econometrics Journal 12, C1-C32.
Brownlees, C.T. and Gallo, G.M. (2006). Financial econometric analysis at ultra-high frequency: Data handling concerns. Computational Statistics & Data Analysis, 51, pages 2232-2245.
Falkenberry, T.N. (2002). High frequency data filtering. Unpublished technical report.
Examples
# Consider you have raw quote data for 1 stock for 2 days
head(sampleQDataRaw)
dim(sampleQDataRaw)
qDataAfterCleaning <- quotesCleanup(qDataRaw = sampleQDataRaw, exchanges = "N")
qDataAfterCleaning$report
dim(qDataAfterCleaning$qData)
# In case you have more data it is advised to use the on-disk functionality
# via "dataSource" and "dataDestination" arguments