fitkienerX {FatTailsR} | R Documentation |
Estimation and Regression Functions for Kiener Distributions
Description
Several functions to estimate the parameters of asymmetric Kiener distributions
and display the results in a numeric vector or in a matrix.
Algorithm "reg" (the default) uses a nonlinear regression and handles
difficult cases. Algorithm "estim" has been completely rewritten
in version 1.8-0 and is now very accurate, even for k < 1. Adjustment
on extreme quantiles can be controlled very precisely.
Usage
fitkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
mink = 1.53, maxe = 0.5, probak = pprobs2, dgts = NULL,
exfitk = NULL, dimnames = FALSE, ncores = 1)
paramkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
mink = 1.53, maxe = 0.5, dgts = 3, parnames = TRUE,
dimnames = FALSE, ncores = 1)
paramkienerX7(X, dgts = 3, n = 10, maxk = 20, maxe = 0.9,
parnames = TRUE, dimnames = FALSE, ncores = 1)
paramkienerX5(X, dgts = 3, i = 4, maxk = 20, maxe = 0.9,
parnames = TRUE, dimnames = FALSE, ncores = 1)
Arguments
X |
numeric. Vector, matrix, array or list of quantiles. |
algo |
character. The algorithm used: |
ord |
integer. Option for probability selection and treatment. |
maxk |
numeric. The maximum value of tail parameter |
mink |
numeric. The minimum value of tail parameter |
maxe |
numeric. The maximum value of absolute tail parameter |
probak |
numeric. Ordered vector of probabilities. |
dgts |
integer. The rounding of output parameters. |
exfitk |
character. A vector of parameter names to subset the output. |
dimnames |
boolean. Display dimnames. |
ncores |
integer. The number of cores for parallel processing of arrays. |
parnames |
boolean. Display parameter names. |
n |
integer. The 1:n and (N+i-n):N elements of |
i |
integer. The i-th and (N-i)-th elements of |
Details
FatTailsR package currently uses two different algorithms to estimate the parameters of Kiener distributions K1, K2, K3 and K4.
- Functions fitkienerX(algo = "reg"), paramkienerX(algo = "reg") and regkienerLX use an unweighted nonlinear regression from logit(p) to X over the whole dataset. Depending on the size of the dataset, calculation can be slow but is usually accurate and describes very well the last 1-10 points in the tails (except if there is a huge outlier).
- Functions fitkienerX(algo = "estim"), paramkienerX(algo = "estim"), paramkienerX5 and paramkienerX7 estimate the parameters with just 5 to 11 quantiles, 5 being the minimum. For averaging purposes, 11 quantiles are proposed (see below). Computation is almost instantaneous and reasonably accurate. This is the recommended method for intensive computation.
A typical input is a numeric vector or a matrix that describes the returns of a stock. A matrix must be in the format DS with DATES as rownames, STOCKS as colnames and (log-)returns as the content of the matrix. An array must be in the format DSL with DATES as rownames, STOCKS as colnames, LAGS in the third dimension and (log-)returns as the content of the array. A list can be a list of numeric vectors, but not a list of matrices, data.frames or arrays.
Conversion from a (possible) time series format to a sorted numeric vector
is done automatically and without any check of the initial format.
The empirical probability of each point in the sorted dataset is calculated
with the function ppoints, whose parameter a has been set to a = 0
as large datasets are very common in finance.
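As a small illustration of the paragraph above, the following sketch shows what a = 0 means for ppoints from the stats package: the general formula (i - a) / (n + 1 - 2a) reduces to i / (n + 1). The sample size n = 10 is arbitrary, and qlogis is used here as the base R logit.

```r
# Empirical probabilities for a sorted sample of n points, with a = 0
n <- 10
p <- ppoints(n, a = 0)

# With a = 0 the probabilities are simply i / (n + 1)
p_manual <- (1:n) / (n + 1)

# Logit of the probabilities, the scale used by the regression algorithm
lp <- qlogis(p)
```

The probabilities never reach 0 or 1, so their logit stays finite for every point of the dataset.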
The lowest acceptable size of a dataset is not clear at this moment. A minimum
of 11 points has been set in the "reg" algorithm and a minimum of 15 points
has been set in the "estim" algorithm. It might change in the future.
If possible, use at least 21 points.
Parameter algo controls the algorithm used. Default is "reg".
When algo = "reg" (or algo = "r"), a nonlinear regression is performed
with nlsLM from the logit of the empirical probabilities logit(p)
over the quantiles X with the function qlkiener4.
The maximum value of the tail parameter k is controlled by maxk.
An upper value maxk = 10 is appropriate for datasets
of low and medium size, less than 20,000 or 50,000 points. For larger datasets, the
upper limit can be extended up to maxk = 20. When this limit is reached,
the shape of the distribution is very similar to the logistic distribution
(at least when e = 0) and the use of this distribution should be considered.
Remember that a value k < 2 describes a distribution with no stable variance and
k < 1 describes a distribution with no stable mean.
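The regression idea described above can be sketched in base R. This is only an illustration under stated assumptions: the quantile model q_model (parameters m, g, k, e expressed on the logit scale) is an assumed form and is not copied from the package, the data are simulated, and optim from base R is swapped in for nlsLM so that the block has no dependency.

```r
# Assumed quantile model on the logit scale (an assumption, not package code):
# x = m + sqrt(3)/pi * g * 2 * k * sinh(lp / k) * exp(lp * e / k)
q_model <- function(lp, m, g, k, e) {
  m + sqrt(3) / pi * g * 2 * k * sinh(lp / k) * exp(lp * e / k)
}

set.seed(1)
n  <- 500
# Simulated sorted sample drawn from the assumed model (illustrative parameters)
X  <- sort(q_model(qlogis(runif(n)), m = 0, g = 1, k = 4, e = 0.1))
p  <- ppoints(n, a = 0)   # empirical probabilities
lp <- qlogis(p)           # logit(p)

# Unweighted sum of squares between X and the model evaluated at logit(p).
# Transforms keep g > 0, k > 0 and |e| < 1 during the search.
sse <- function(par) {
  fitted <- q_model(lp, par[1], exp(par[2]), exp(par[3]), tanh(par[4]))
  s <- sum((X - fitted)^2)
  if (is.finite(s)) s else 1e10   # guard against overflow in sinh/exp
}

start <- c(median(X), log(sd(X)), log(3), 0)
fit   <- optim(start, sse, control = list(maxit = 2000))

# Back-transform to the natural parameter scale
est <- c(m = fit$par[1], g = exp(fit$par[2]),
         k = exp(fit$par[3]), e = tanh(fit$par[4]))
```

Because every point of the dataset enters the sum of squares, the cost grows with the sample size, which is consistent with the remark that this algorithm can be slow on large datasets.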
When algo = "estim" (or algo = "e"),
5 to 11 quantiles are used to estimate the parameters.
The minimum is 5 quantiles: the median x.50, two quantiles at medium distance
from the median, usually x.25 and x.75, and two quantiles located close to the extremes
of the dataset, for instance x.01 and x.99 if the dataset X has more
than 100 points, x.0001 and x.9999 if the dataset X has more than
10,000 points, and so on if the dataset is larger.
These quantiles are extracted with the function fiveprobs.
Small datasets must contain at least 15 different points.
With the idea of averaging the results (but without any guarantee of better
estimates), calculation has been extended to 11 probabilities
extracted from X with the function elevenprobs, where
p1, p2 and p3 are the most extreme probabilities of the dataset X
with values finishing either by .x01 or .x025 or .x05:
p11 = c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
Selection of subsets among these 11 probabilities is controlled with the option
ord, which can take 12 different values.
For instance, the default ord = 7 computes the parameters at probabilities
c(p1, 0.25, 0.50, 0.75, 1-p1) and c(p2, 0.25, 0.50, 0.75, 1-p2).
Parameters d and k are averaged first and the results of these
averages are used to compute the other parameters g, a, w, e.
Small datasets should consider ord = 5 and
large datasets can consider ord = 12.
The 12 possible values of ord are:
1. c(p1, 0.35, 0.50, 0.65, 1-p1)
2. c(p2, 0.35, 0.50, 0.65, 1-p2)
3. c(p1, p2, 0.35, 0.50, 0.65, 1-p2, 1-p1)
4. c(p1, p2, p3, 0.35, 0.50, 0.65, 1-p3, 1-p2, 1-p1)
5. c(p1, 0.25, 0.50, 0.75, 1-p1)
6. c(p2, 0.25, 0.50, 0.75, 1-p2)
7. c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)
8. c(p1, p2, p3, 0.25, 0.50, 0.75, 1-p3, 1-p2, 1-p1)
9. c(p1, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p1)
10. c(p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2)
11. c(p1, p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2, 1-p1)
12. c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
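The twelve subsets listed above follow a regular pattern: three choices of central probabilities combined with four choices of tail probabilities. The helper ord_probs below is a hypothetical function written for this illustration (it is not part of the package), and the values p1 = 0.01, p2 = 0.025, p3 = 0.05 are only an example of extreme probabilities.

```r
# Hypothetical helper: rebuild the probability subset for a given ord (1 to 12)
ord_probs <- function(ord, p1, p2, p3) {
  stopifnot(ord %in% 1:12)
  centre <- list(c(0.35, 0.50, 0.65),
                 c(0.25, 0.50, 0.75),
                 c(0.25, 0.35, 0.50, 0.65, 0.75))
  tails  <- list(p1, p2, c(p1, p2), c(p1, p2, p3))
  ce <- centre[[(ord - 1) %/% 4 + 1]]  # ord 1-4, 5-8, 9-12
  ta <- tails[[(ord - 1) %% 4 + 1]]    # position within each group of four
  c(ta, ce, rev(1 - ta))               # left tail, centre, mirrored right tail
}

# Example with illustrative extreme probabilities
p_ord7  <- ord_probs(7,  0.01, 0.025, 0.05)
p_ord12 <- ord_probs(12, 0.01, 0.025, 0.05)
```

With these inputs, ord = 12 returns the full vector of 11 probabilities and ord = 7 returns the 7 probabilities used by paramkienerX7.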
paramkienerX5 is a simplified version of paramkienerX with
predefined values algo = "estim", ord = 5, maxk = 10
and direct access to internal subfunctions.
It uses the following probabilities:
- p5 = c(p1, 0.25, 0.50, 0.75, 1-p1)
paramkienerX7 is a simplified version of paramkienerX with
predefined values algo = "estim", ord = 7, maxk = 10
and direct access to internal subfunctions.
It uses the following probabilities:
- p7 = c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)
The quantiles corresponding to the above probabilities are then extracted
with the function quantile, whose parameter type has been set to type = 6
as it returns the closest values to the true quantiles (according to our
experience) for all k > 1.9.
(Note: when k < 1.5, algorithm algo = "reg" returns better results).
Both probabilities and quantiles are then transferred to estimkiener11
for calculation.
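The extraction step described above can be sketched with base R only. The data below are simulated placeholders (500 logistic draws), and p1 = 0.01 is chosen by hand because the sample has more than 100 points; the package selects the probabilities itself with fiveprobs or elevenprobs, whose exact rules are not reproduced here.

```r
set.seed(42)
X <- rlogis(500)   # placeholder sample, more than 100 points

# Five probabilities: two extremes, two quartiles and the median
p1    <- 0.01
probs <- c(p1, 0.25, 0.50, 0.75, 1 - p1)

# Empirical quantiles with type = 6, as stated in the text
q5 <- quantile(X, probs = probs, type = 6, names = FALSE)
```

The pair (probs, q5) is the kind of input that is then passed on for parameter estimation.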
probak controls the probabilities at which the model is tested with the parameter
estimates. fitkienerX and regkienerLX share the same subroutines.
The default for fitkienerX and regkienerLX is
pprobs2 = c(0.01, 0.025, 0.05, 0.95, 0.975, 0.99) as those values
are usual in finance. Other sets of values are provided at pprobs0.
Rounding the results is useful to display nice results, especially
in a matrix or in a data.frame. dgts = 13 is recommended
as a, k, w are usually significant at 1 digit.
- dgts = NULL does not perform any rounding.
- dgts = 0 to 9 rounds all parameters at the same level.
- dgts = 10 to 27 rounds the parameters at various levels for nice display. See roundcoefk for the details. (Note: the rounding 10 to 27 currently works with paramkienerX, paramkienerX5, paramkienerX7 but not yet with fitkienerX).
Extracting the most useful parameters from the (quite long) vector/matrix
fitk is controlled by parameter exfitk that calls user-defined or
predefined parameter subsets like exfit0, ..., exfit7.
IMPORTANT: never subset fitk by rank number as new items may be added
in the future and rank may vary.
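The reason for this warning can be shown with a toy named vector. Everything below is hypothetical: fitk_toy only borrows a few item names from the Value section, its numbers are placeholders, and the item called new stands for any item that a future version might add.

```r
# Toy named vector with placeholder values (not real estimates)
nm       <- c("ret", "m", "g", "a", "k", "w", "d", "e")
fitk_toy <- setNames(seq_along(nm), nm)

# User-defined subset by name
wanted  <- c("m", "g", "a", "k", "w")
by_name <- fitk_toy[wanted]

# Suppose a later version inserts a new item after "ret"
fitk_new <- append(fitk_toy, c(new = 0), after = 1)

by_name_new <- fitk_new[wanted]  # still returns m, g, a, k, w
by_rank_new <- fitk_new[2:6]     # now returns new, m, g, a, k
```

Subsetting by name gives the same items before and after the change, whereas the rank-based subset silently shifts.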
Calculation of vectors, matrices and lists is not parallelized. Parallelization
of code for arrays was introduced in version 1.5-0 and improved in version 1.5-1.
ncores controls the number of cores allocated to the process (through
parApply, which runs on Unix-like systems and Windows and requires
about 2 seconds to start). ncores = 1 means no parallelization.
ncores = 0 is the recommended option. It uses the maximum number of cores
available on the computer, as detected by detectCores,
minus 1 core, which gives the best performance in most cases.
Although appealing, this automatic selection may sometimes be dangerous. For instance,
the instruction f(X, ncores_max) - f(X, ncores_max), a nice way to compute
an array of 0, will call 2 * ncores_max cores and crash R. ncores = 2,..,99
sets the number of cores manually. If the requested value is larger than the maximum
number of cores, this value is automatically reduced (with a warning) to this maximum.
Hence, this last option can provide one more core than option ncores = 0.
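The rules stated above for ncores can be summarized in a short sketch. The function resolve_ncores is a hypothetical helper written for this illustration and is not the package's internal code; it only relies on detectCores from the parallel package, and the fallback to 1 core when detection fails is an assumption.

```r
# Hypothetical helper: translate the ncores argument into a number of workers
resolve_ncores <- function(ncores, max_cores = parallel::detectCores()) {
  if (is.na(max_cores)) max_cores <- 1   # detection can fail on some systems
  if (ncores == 0) {
    # automatic selection: all available cores minus one, at least one
    return(max(1, max_cores - 1))
  }
  if (ncores > max_cores) {
    warning("ncores reduced to the maximum number of cores: ", max_cores)
    return(max_cores)
  }
  ncores
}

# Illustration on a machine assumed to have 8 cores
n_auto   <- resolve_ncores(0, max_cores = 8)
n_serial <- resolve_ncores(1, max_cores = 8)
n_capped <- suppressWarnings(resolve_ncores(99, max_cores = 8))
```

On the assumed 8-core machine, the automatic option keeps one core free while a manual request above the maximum is capped at 8, which is the one-core difference mentioned in the text.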
NOTE: fitkienerLX, regkienerX, estimkiener(X,5,7) were
introduced in v1.2-0 and replaced in version v1.4-1 by fitkienerX and
paramkiener(X,5,7) to accommodate vectors, matrices, arrays and lists.
We apologize to early users who need to rewrite their code.
Value
paramkienerX: a vector (or a matrix) of parameter estimates
c(m, g, a, k, w, d, e).
fitkienerX: a vector (or a matrix) made of several parts:
- ret: the return over the period, calculated with sum(x) and thus assuming log-returns.
- m, g, a, k, w, d, e: the parameter estimates.
- m1, sd, sk, ke: the mean, standard deviation, skewness and excess kurtosis computed from the parameter estimates.
- m1x, sdx, skx, kex: the mean, standard deviation, skewness and excess kurtosis computed from the dataset.
- lh: the length of the dataset over the period.
- q.: quantile estimated with the parameter estimates.
- VaR.: Value-at-Risk, positive in most cases.
- c.: corrective tail coefficient = (q - m) / (q_logistic_function - m).
- ltm.: left tail mean (signed ES on the left tail, usually negative).
- rtm.: right tail mean (signed ES on the right tail, usually positive).
- dtmq.: (p<=0.5 left, p>0.5 right) tail mean minus quantile.
- ES.: expected shortfall, positive in most cases.
- h.: corrective ES = (ES - m) / (ES_logistic_function - m).
- desv.: ES - VaR, usually positive.
- l.: quantile estimated by the tangent logistic function.
- dl.: quantile - quantile_logistic_function.
- g.: quantile estimated by the Laplace-Gauss function.
- dg.: quantile - quantile_Laplace_Gauss_function.
IMPORTANT: if you need to subset fitk, always subset it by parameter names
and never subset it by rank number as new items may be added in the future and rank may vary.
Use for instance exfit0, ..., exfit7.
References
P. Kiener, Fat tail analysis and package FatTailsR, 9th R/Rmetrics Workshop and Summer School, Zurich, 27 June 2015. https://www.inmodelia.com/exemples/2015-0627-Rmetrics-Kiener-en.pdf
See Also
regkienerLX, estimkiener11, roundcoefk, exfit6.
Examples
require(minpack.lm)
require(timeSeries)
### Load the datasets and choose j in 1:16
DS <- getDSdata()
j <- 5
### and run this block
probak <- c(0.01, 0.05, 0.95, 0.99)
X <- DS[[j]] ; names(DS)[j]
elevenprobs(X)
fitkienerX(X, algo = "reg", dgts = 3, probak = probak)
fitkienerX(X, algo = "estim", ord = 5, probak = probak, dgts = 3)
paramkienerX(X)
paramkienerX5(X)
### Compare the 12 values of paramkienerX(ord/row = 1:12) and paramkienerX (row 13)
compare <- function(ord, X) { paramkienerX(X, algo = "estim", ord = ord, dgts = 13) }
rbind(t(sapply( 1:12, compare, X)), paramkienerX(X, algo = "reg", dgts = 13))
### Analyze DS in one step
t(sapply(DS, paramkienerX, algo = "reg", dgts = 13))
t(sapply(DS, paramkienerX, algo = "estim", dgts = 13))
paramkienerX(DS, algo = "reg", dgts = 13)
paramkienerX(DS, algo = "estim", dgts = 13)
system.time(fitk_rDS <- fitkienerX(DS, algo = "r", probak = pprobs2, dgts = 3))
system.time(fitk_eDS <- fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3))
fitk_rDS
fitk_eDS
### Subset rDS and eDS with exfit0,..,exfit7
fitk_rDS[,exfit4]
fitk_eDS[,exfit7]
fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3, exfitk = exfit7)
### Array (new example introduced in v1.5-1)
### Increase the number of cores and crash R.
## Not run:
arr <- array(rkiener1(3000), c(4,3,250))
paramkienerX7(arr, ncores = 2)
## paramkienerX7(arr, ncores = 2) - paramkienerX(arr, ncores = 2)
## End(Not run)
### End