PINstimation-package {PINstimation}    R Documentation

An R package for estimating the probability of informed trading

Description

The package provides utilities for the estimation of probability of informed trading measures: the original PIN (PIN) as introduced by Easley and O'Hara (1992) and Easley et al. (1996), the multilayer PIN (MPIN) as introduced by Ersan (2016), the adjusted PIN (AdjPIN) as introduced by Duarte and Young (2009), and the volume-synchronized PIN (VPIN) as introduced by Easley et al. (2011) and Easley et al. (2012). Estimations of PIN, MPIN, and AdjPIN are prone to floating-point exceptions and are sensitive to the choice of initial values. Researchers have therefore developed factorizations of the model likelihood functions, as well as algorithms for determining initial parameter sets for the maximum likelihood estimation (MLE henceforth).


As for the factorizations, the package includes three different factorizations of the PIN likelihood function: fact_pin_eho() as in Easley et al. (2010), fact_pin_lk() as in Lin and Ke (2011), and fact_pin_e() as in Ersan (2016); one factorization of the MPIN likelihood function: fact_mpin() as in Ersan (2016); and one factorization of the AdjPIN likelihood function: fact_adjpin() as in Ersan and Ghachem (2022b).
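To illustrate why factorizations matter, the following base-R sketch evaluates the daily log-likelihood of the original PIN model in two ways: by multiplying raw Poisson probabilities, and by working with log-probabilities combined through a log-sum-exp step. It is a minimal illustration under assumed notation (alpha, delta, mu, eb, es) and assumed example data; it is not the code behind fact_pin_eho(), fact_pin_lk(), or fact_pin_e(), and the helper names loglik_naive() and loglik_stable() are hypothetical.

```r
# Sketch only: daily log-likelihood of the original PIN model, computed two ways.
# Assumed notation: alpha = P(information event), delta = P(bad news | event),
# mu = informed arrival rate, eb / es = uninformed buy / sell arrival rates.

# Naive version: multiplies raw Poisson probabilities. For large trade counts
# the products underflow to 0, so log() returns -Inf.
loglik_naive <- function(B, S, alpha, delta, mu, eb, es) {
  no_info <- (1 - alpha) * dpois(B, eb) * dpois(S, es)
  bad     <- alpha * delta * dpois(B, eb) * dpois(S, es + mu)
  good    <- alpha * (1 - delta) * dpois(B, eb + mu) * dpois(S, es)
  sum(log(no_info + bad + good))
}

# Stable version: keep every term on the log scale and apply log-sum-exp per day.
loglik_stable <- function(B, S, alpha, delta, mu, eb, es) {
  terms <- cbind(
    log(1 - alpha) + dpois(B, eb, log = TRUE) + dpois(S, es, log = TRUE),
    log(alpha) + log(delta) +
      dpois(B, eb, log = TRUE) + dpois(S, es + mu, log = TRUE),
    log(alpha) + log(1 - delta) +
      dpois(B, eb + mu, log = TRUE) + dpois(S, es, log = TRUE)
  )
  m <- apply(terms, 1, max)              # largest log-term of each day
  sum(m + log(rowSums(exp(terms - m))))  # log-sum-exp, one value per day
}

# With small (made-up) counts, both versions agree.
B_small <- c(5, 12, 6)
S_small <- c(7, 4, 15)
ll_small_naive  <- loglik_naive(B_small, S_small, 0.4, 0.5, 6, 5, 5)
ll_small_stable <- loglik_stable(B_small, S_small, 0.4, 0.5, 6, 5, 5)

# With larger (made-up) counts evaluated at a poor starting point, the naive
# version underflows to -Inf while the stable version stays finite.
B_large <- c(4000, 9000, 4100)
S_large <- c(3900, 4050, 9200)
ll_large_naive  <- loglik_naive(B_large, S_large, 0.4, 0.5, 3000, 2000, 2000)
ll_large_stable <- loglik_stable(B_large, S_large, 0.4, 0.5, 3000, 2000, 2000)
```

An optimizer that receives -Inf at its starting point generally cannot proceed, which is one reason both a stable likelihood and well-chosen initial parameter sets are needed.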

The package implements three algorithms to generate initial parameter sets for the MLE of the PIN model: initials_pin_yz() for the algorithm of Yan and Zhang (2012), initials_pin_gwj() for the algorithm of Gan et al. (2015), and initials_pin_ea() for the algorithm of Ersan and Alici (2016). For the MPIN model, the function initials_mpin() implements a multilayer extension of the algorithm of Ersan and Alici (2016). Finally, three functions generate initial parameter sets for the MLE of the AdjPIN model: initials_adjpin() for the algorithm of Ersan and Ghachem (2022b), initials_adjpin_cl() for the algorithm of Cheng and Lai (2021), and initials_adjpin_rnd() for randomly generated initial parameter sets.

The initial parameter sets can be chosen either by calling the specific functions that implement MLE for the PIN model, such as pin_yz(), pin_gwj(), and pin_ea(), or through the argument initialsets of the generic functions that implement MLE for the MPIN and AdjPIN models, namely mpin_ml() and adjpin(). In addition, the PIN, MPIN, and AdjPIN models can be estimated from custom initial parameter sets supplied by the user through the argument initialsets of the functions pin(), mpin_ml(), and adjpin(). Through the function get_posteriors(), the package also lets users compute, for each day in the sample, the posterior probabilities that the day is a no-information day, a good-information day, or a bad-information day.
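The idea of day-level posterior probabilities can be sketched in base R with Bayes' rule applied to the three day types of the original PIN model. This is an illustration under assumed parameter values and made-up data, not the implementation of get_posteriors(); the helper name posterior_day_types() is hypothetical, and delta is assumed to denote the probability of bad news given an information event.

```r
# Sketch only: posterior probability of each day type under the original PIN
# model, given parameter values. Not the package's get_posteriors().
posterior_day_types <- function(B, S, alpha, delta, mu, eb, es) {
  # Log of (prior probability of day type) x (Poisson likelihood of the day)
  log_joint <- cbind(
    no_info = log(1 - alpha) +
      dpois(B, eb, log = TRUE) + dpois(S, es, log = TRUE),
    good = log(alpha) + log(1 - delta) +
      dpois(B, eb + mu, log = TRUE) + dpois(S, es, log = TRUE),
    bad = log(alpha) + log(delta) +
      dpois(B, eb, log = TRUE) + dpois(S, es + mu, log = TRUE)
  )
  # Normalise each row on the log scale to avoid underflow
  m <- apply(log_joint, 1, max)
  w <- exp(log_joint - m)
  w / rowSums(w)
}

# Three made-up days: balanced, buy-heavy, and sell-heavy order flow.
B <- c(100, 185, 97)
S <- c(98, 102, 178)
post <- posterior_day_types(B, S, alpha = 0.3, delta = 0.5,
                            mu = 80, eb = 100, es = 100)

# Most likely type of each day
likely_type <- colnames(post)[apply(post, 1, which.max)]
```

Each row of post sums to one; here the balanced day is attributed to the no-information type, the buy-heavy day to good information, and the sell-heavy day to bad information.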

As an alternative to standard maximum likelihood estimation, estimation via the expectation-conditional maximization (ECM) algorithm is suggested in Ghachem and Ersan (2022a). It is implemented through the function mpin_ecm() for the MPIN model and the function adjpin() for the AdjPIN model.

Datasets of daily aggregated numbers of buys and sells, with a user-determined number of information layers, can be simulated with the function generatedata_mpin() for the MPIN (PIN) model and generatedata_adjpin() for the AdjPIN model. The output of these functions contains the theoretical parameters used in the data generation, the empirical parameters computed from the generated data, and the generated data itself. The data simulation functions allow for broad customization to produce data that fit the user's preferences. Simulated data series can therefore be used to compare the applied methods across different scenarios. Alternatively, the user can use two example datasets preloaded in the package: dailytrades, a representative quarterly dataset of daily buys and sells; and hfdata, a simulated high-frequency dataset comprising 100 000 trades.
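The distinction between theoretical and empirical parameters can be illustrated with a base-R simulation of the original (single-layer) PIN model. This is a simplified sketch under assumed parameter values, not the algorithm used by generatedata_mpin(); the helper name simulate_pin_days() is hypothetical.

```r
# Sketch only: simulate daily buys (B) and sells (S) under the original PIN
# model. Not the package's generatedata_mpin().
simulate_pin_days <- function(days, alpha, delta, mu, eb, es) {
  # Draw the type of each day: no information, good news, or bad news
  type <- sample(c("none", "good", "bad"), size = days, replace = TRUE,
                 prob = c(1 - alpha, alpha * (1 - delta), alpha * delta))
  # Informed traders buy on good-news days and sell on bad-news days
  B <- rpois(days, eb + mu * (type == "good"))
  S <- rpois(days, es + mu * (type == "bad"))
  data.frame(B = B, S = S, type = type)
}

# Assumed theoretical parameters
alpha <- 0.3; delta <- 0.5; mu <- 80; eb <- 100; es <- 100

set.seed(1)
sim <- simulate_pin_days(days = 60, alpha, delta, mu, eb, es)

# Theoretical PIN implied by the parameters: alpha * mu / (alpha * mu + eb + es)
pin_theoretical <- alpha * mu / (alpha * mu + eb + es)

# Empirical share of information days in the simulated sample, which
# generally differs from alpha in a finite sample
alpha_empirical <- mean(sim$type != "none")
```

In a sample of 60 days, the realized share of information days rarely equals alpha exactly, which is why reporting both theoretical and empirical parameters is useful when benchmarking estimators.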

Finally, the package provides two functions to deal with high-frequency data. First, the function vpin() estimates, and provides detailed output on, the order flow toxicity metric known as the volume-synchronized probability of informed trading, as developed in Easley et al. (2011) and Easley et al. (2012). Second, the function aggregate_trades() aggregates high-frequency trade data into daily data using one of several trade classification algorithms, namely the tick algorithm, the quote algorithm, the LR algorithm (Lee and Ready 1991), and the EMO algorithm (Ellis et al. 2000).
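As an illustration of trade classification followed by daily aggregation, the sketch below applies the standard tick test in base R: a trade is labelled a buy on an uptick, a sell on a downtick, and inherits the previous label when the price is unchanged. It is a simplified example on made-up prices, not the implementation of aggregate_trades(); the helper name classify_tick() is hypothetical.

```r
# Sketch only: tick-test classification of trades, then daily aggregation.
# Not the package's aggregate_trades().
classify_tick <- function(price) {
  n <- length(price)
  side <- rep(NA_character_, n)  # the first trade cannot be classified
  if (n < 2) return(side)
  for (i in 2:n) {
    change <- price[i] - price[i - 1]
    if (change > 0) {
      side[i] <- "buy"           # uptick
    } else if (change < 0) {
      side[i] <- "sell"          # downtick
    } else {
      side[i] <- side[i - 1]     # zero tick: reuse the previous label
    }
  }
  side
}

# Made-up trades over two days
price <- c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02)
day   <- c("d1", "d1", "d1", "d2", "d2", "d2")

side <- classify_tick(price)

# Daily numbers of buys (B) and sells (S); unclassified trades are ignored
daily <- data.frame(
  day = sort(unique(day)),
  B = as.vector(tapply(side == "buy", day, sum, na.rm = TRUE)),
  S = as.vector(tapply(side == "sell", day, sum, na.rm = TRUE))
)
```

The resulting data frame has one row per day with columns B and S, the daily buy and sell counts that the PIN-type models take as input.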

The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation of informed trading measures, using only raw trade-level data. Ghachem and Ersan (2022b) provide a comprehensive overview of the package: they detail the underlying theoretical background, thoroughly describe the functions, and then use them to tackle relevant research questions.

Functions

Datasets

Estimation results

Data simulation

Author(s)

Montasser Ghachem montasser.ghachem@pinstimation.com
Department of Economics at Stockholm University, Stockholm, Sweden.

Oguz Ersan oguz.ersan@pinstimation.com
Department of International Trade and Finance at Kadir Has University, Istanbul, Turkey.

References

Cheng T, Lai H (2021). “Improvements in estimating the probability of informed trading models.” Quantitative Finance, 21(5), 771–796.

Duarte J, Young L (2009). “Why is PIN priced?” Journal of Financial Economics, 91(2), 119–138. ISSN 0304405X.

Easley D, Lopez de Prado MM, O'Hara M (2011). “The microstructure of the "flash crash": flow toxicity, liquidity crashes, and the probability of informed trading.” The Journal of Portfolio Management, 37(2), 118–128.

Easley D, Hvidkjaer S, O'Hara M (2010). “Factoring information into returns.” Journal of Financial and Quantitative Analysis, 45(2), 293–309. ISSN 00221090.

Easley D, Kiefer NM, O'Hara M, Paperman JB (1996). “Liquidity, information, and infrequently traded stocks.” Journal of Finance, 51(4), 1405–1436. ISSN 00221082.

Easley D, Lopez de Prado MM, O'Hara M (2012). “Flow toxicity and liquidity in a high-frequency world.” Review of Financial Studies, 25(5), 1457–1493. ISSN 08939454.

Easley D, O'Hara M (1992). “Time and the Process of Security Price Adjustment.” The Journal of Finance, 47(2), 577–605. ISSN 15406261.

Ellis K, Michaely R, O'Hara M (2000). “The Accuracy of Trade Classification Rules: Evidence from Nasdaq.” The Journal of Financial and Quantitative Analysis, 35(4), 529–551.

Ersan O (2016). “Multilayer Probability of Informed Trading.” Available at SSRN 2874420.

Ersan O, Alici A (2016). “An unbiased computation methodology for estimating the probability of informed trading (PIN).” Journal of International Financial Markets, Institutions and Money, 43, 74–94. ISSN 10424431.

Ersan O, Ghachem M (2022a). “Identifying information types in probability of informed trading (PIN) models: An improved algorithm.” Available at SSRN 4117956.

Ersan O, Ghachem M (2022b). “A methodological approach to the computational problems in the estimation of adjusted PIN model.” Available at SSRN 4117954.

Gan Q, Wei WC, Johnstone D (2015). “A faster estimation method for the probability of informed trading using hierarchical agglomerative clustering.” Quantitative Finance, 15(11), 1805–1821.

Ghachem M, Ersan O (2022a). “Estimation of the probability of informed trading models via an expectation-conditional maximization algorithm.” Available at SSRN 4117952.

Ghachem M, Ersan O (2022b). “PINstimation: An R package for estimating models of probability of informed trading.” Available at SSRN 4117946.

Griffin J, Oberoi J, Oduro SD (2021). “Estimating the probability of informed trading: A Bayesian approach.” Journal of Banking & Finance, 125, 106045.

Lee CMC, Ready MJ (1991). “Inferring Trade Direction from Intraday Data.” The Journal of Finance, 46(2), 733–746. ISSN 00221082, 15406261.

Lin H, Ke W (2011). “A computing bias in estimating the probability of informed trading.” Journal of Financial Markets, 14(4), 625–640. ISSN 13864181.

Yan Y, Zhang S (2012). “An improved estimation method and empirical properties of the probability of informed trading.” Journal of Banking and Finance, 36(2), 454–467. ISSN 03784266.


[Package PINstimation version 0.1.2 Index]