R: Estimation and Regression Functions for Kiener Distributions

fitkienerX {FatTailsR}

R Documentation

Estimation and Regression Functions for Kiener Distributions

Description

Several functions to estimate the parameters of asymmetric Kiener distributions and display the results in a numeric vector or in a matrix. Algorithm "reg" (the default) uses a nonlinear regression and handles difficult cases. Algorithm "estim" has been completely rewritten in version 1.8-0 and is now very accurate, even for k<1. Adjustment on extreme quantiles can be controlled very precisely.

Usage

fitkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
  mink = 1.53, maxe = 0.5, probak = pprobs2, dgts = NULL,
  exfitk = NULL, dimnames = FALSE, ncores = 1)

paramkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
  mink = 1.53, maxe = 0.5, dgts = 3, parnames = TRUE,
  dimnames = FALSE, ncores = 1)

paramkienerX7(X, dgts = 3, n = 10, maxk = 20, maxe = 0.9,
  parnames = TRUE, dimnames = FALSE, ncores = 1)

paramkienerX5(X, dgts = 3, i = 4, maxk = 20, maxe = 0.9,
  parnames = TRUE, dimnames = FALSE, ncores = 1)

Arguments

`X`	numeric. Vector, matrix, array or list of quantiles.
`algo`	character. The algorithm used: `"r"` or `"reg"` for regression (default) and `"e"` or `"estim"` for quantile estimation.
`ord`	integer. Option for probability selection and treatment.
`maxk`	numeric. The maximum value of tail parameter `k`.
`mink`	numeric. The minimum value of tail parameter `k`.
`maxe`	numeric. The maximum value of absolute tail parameter `|e|`.
`probak`	numeric. Ordered vector of probabilities.
`dgts`	integer. The rounding of output parameters.
`exfitk`	character. A vector of parameter names to subset the output.
`dimnames`	boolean. Display dimnames.
`ncores`	integer. The number of cores for parallel processing of arrays.
`parnames`	boolean. Display parameter names.
`n`	integer. The 1:n and (N+i-n):N elements of `X` used to calculate synthetic quantiles at probability levels p1 and 1-p1.
`i`	integer. The i-th and (N-i)-th elements of `X` used to extract probabilities p1 and 1-p1 and quantiles x(p) and x(1-p).

Details

The FatTailsR package currently uses two different algorithms to estimate the parameters of Kiener distributions K1, K2, K3 and K4.

Functions fitkienerX(algo = "reg"), paramkienerX(algo = "reg") and regkienerLX use an unweighted nonlinear regression from logit(p) to X over the whole dataset. Depending on the size of the dataset, calculation can be slow but is usually accurate and describes very well the last 1-10 points in the tails (except if there is a huge outlier).
Functions fitkienerX(algo = "estim"), paramkienerX(algo = "estim"), paramkienerX5 and paramkienerX7 estimate the parameters with just 5 to 11 quantiles, 5 being the minimum. For averaging purposes, 11 quantiles are proposed (see below). Computation is almost instantaneous and reasonably accurate. This is the recommended method for intensive computation.

A typical input is a numeric vector or a matrix that describes the returns of a stock. A matrix must be in the format DS with DATES as rownames, STOCKS as colnames and (log-)returns as the content of the matrix. An array must be in the format DSL with DATES as rownames, STOCKS as colnames, LAGS in the third dimension and (log-)returns as the content of the array. A list can be a list of numeric vectors but not a list of matrices, data.frames or arrays.
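As an illustration of the DS and DSL layouts described above, the following sketch builds a small matrix, array and list with base R. The dates, stock names and simulated returns are invented for the example and are not part of the package.

```r
# Hypothetical data: 25 dates, 3 stocks, 2 lags of simulated log-returns.
set.seed(42)
dates  <- format(as.Date("2020-01-01") + 0:24)
stocks <- c("AAA", "BBB", "CCC")

# DS format: DATES as rownames, STOCKS as colnames.
DS_mat <- matrix(rnorm(75, sd = 0.01), nrow = 25, ncol = 3,
                 dimnames = list(dates, stocks))

# DSL format: same layout, with LAGS in the third dimension.
DSL_arr <- array(rnorm(150, sd = 0.01), dim = c(25, 3, 2),
                 dimnames = list(dates, stocks, c("lag1", "lag2")))

# A list of numeric vectors (one per stock) is an accepted list input.
DS_list <- lapply(stocks, function(s) DS_mat[, s])
names(DS_list) <- stocks

str(DS_list)
```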

Conversion from a (possible) time series format to a sorted numeric vector is done automatically and without any check of the initial format. Empirical probabilities of the points in the sorted dataset are calculated with the function ppoints, whose parameter a has been set to a = 0 as large datasets are very common in finance. The lowest acceptable size of a dataset is not clear at this moment. A minimum of 11 points has been set in the "reg" algorithm and a minimum of 15 points in the "estim" algorithm. This might change in the future. If possible, use at least 21 points.
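The conversion described above can be sketched with base R only. The sample below is simulated and purely illustrative; the package's internal steps may differ in detail.

```r
# Hypothetical fat-tailed sample of 21 points (Student t, 3 degrees of freedom).
set.seed(1)
x <- rt(21, df = 3)

xs <- sort(as.numeric(x))            # sorted numeric vector
p  <- ppoints(length(xs), a = 0)     # empirical probabilities i / (n + 1)
lp <- qlogis(p)                      # logit(p) = log(p / (1 - p))

head(cbind(xs, p, lp), 3)
```

With a = 0 the probabilities are i / (n + 1), so no point is assigned probability 0 or 1 and logit(p) stays finite.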

Parameter algo controls the algorithm used. Default is "reg".

When algo = "reg" (or algo = "r"), a nonlinear regression is performed with nlsLM from the logit of the empirical probabilities logit(p) over the quantiles X with the function qlkiener4. The maximum value of the tail parameter k is controlled by maxk. An upper value maxk = 10 is appropriate for datasets of low and medium size, fewer than 20,000 to 50,000 points. For larger datasets, the upper limit can be extended up to maxk = 20. When this limit is reached, the shape of the distribution is very similar to the logistic distribution (at least when e = 0) and the use of this distribution should be considered. Remember that a value k < 2 describes a distribution with no stable variance and k < 1 describes a distribution with no stable mean.
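To make the regression idea concrete, here is a minimal base-R sketch. It assumes a symmetric quantile model of the form x = m + 2 g k sinh(logit(p)/k) (the package's scale convention may differ), generates noise-free data from it, and recovers k by profiling the sum of squared residuals over k between bounds that mirror mink and maxk. For each k, m and g are obtained by ordinary linear least squares. This is a simplified stand-in written for illustration, not the nlsLM and qlkiener4 fit that the package performs; the names qsym and ssr_k are hypothetical.

```r
# Assumed symmetric quantile model (scale convention may differ from the package).
qsym <- function(lp, m, g, k) m + 2 * g * k * sinh(lp / k)

# Synthetic, noise-free quantiles at the empirical probabilities of 500 points.
lp <- qlogis(ppoints(500, a = 0))
x  <- qsym(lp, m = 0.5, g = 1.2, k = 3)

# For a fixed k the model is linear in (m, g): regress x on 2 k sinh(lp / k).
ssr_k <- function(k) {
  z <- 2 * k * sinh(lp / k)
  sum(resid(lm(x ~ z))^2)
}

# Profile over k within the bounds (1.53 and 10 mirror mink and maxk).
k_hat <- optimize(ssr_k, interval = c(1.53, 10))$minimum
z     <- 2 * k_hat * sinh(lp / k_hat)
coefs <- coef(lm(x ~ z))             # intercept estimates m, slope estimates g

c(m = unname(coefs[1]), g = unname(coefs[2]), k = k_hat)
```

On real data the residuals are not zero and the asymmetric model has more parameters, which is why a dedicated nonlinear least-squares routine is used instead of this one-dimensional profile.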

When algo = "estim" (or algo = "e"), 5 to 11 quantiles are used to estimate the parameters. The minimum is 5 quantiles: the median x.50, two quantiles at medium distance from the median, usually x.25 and x.75, and two quantiles located close to the extremes of the dataset, for instance x.01 and x.99 if the dataset X has more than 100 points, x.0001 and x.9999 if the dataset X has more than 10,000 points, and so on for larger datasets. These quantiles are extracted with function fiveprobs. Small datasets must contain at least 15 different points.

With the idea of averaging the results (but without any guarantee of better estimates), calculation has been extended to 11 probabilities extracted from X with the function elevenprobs, where p1, p2 and p3 are the most extreme probabilities of the dataset X with values ending in .x01, .x025 or .x05:

p11 = c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)

Selection of subsets among these 11 probabilities is controlled with the option ord, which can take 12 different values. For instance, the default ord = 7 computes the parameters at probabilities c(p1, 0.25, 0.50, 0.75, 1-p1) and c(p2, 0.25, 0.50, 0.75, 1-p2). Parameters d and k are averaged first and the results of these averages are used to compute the other parameters g, a, w, e. Small datasets should consider ord = 5 and large datasets can consider ord = 12. The 12 possible values of ord are:

1. c(p1, 0.35, 0.50, 0.65, 1-p1)
2. c(p2, 0.35, 0.50, 0.65, 1-p2)
3. c(p1, p2, 0.35, 0.50, 0.65, 1-p2, 1-p1)
4. c(p1, p2, p3, 0.35, 0.50, 0.65, 1-p3, 1-p2, 1-p1)
5. c(p1, 0.25, 0.50, 0.75, 1-p1)
6. c(p2, 0.25, 0.50, 0.75, 1-p2)
7. c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)
8. c(p1, p2, p3, 0.25, 0.50, 0.75, 1-p3, 1-p2, 1-p1)
9. c(p1, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p1)
10. c(p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2)
11. c(p1, p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2, 1-p1)
12. c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
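The twelve subsets listed above follow a regular pattern: three choices of central probabilities combined with four choices of tail probabilities. The sketch below reproduces them in the same order. The helper ord_probs is hypothetical, written only for illustration, and is not a function of the package; the values of p1, p2 and p3 are assumed.

```r
# Hypothetical helper: probability subset selected by ord (1 to 12),
# in the order of the list above.
ord_probs <- function(ord, p1, p2, p3) {
  centre <- list(c(0.35, 0.50, 0.65),
                 c(0.25, 0.50, 0.75),
                 c(0.25, 0.35, 0.50, 0.65, 0.75))[[(ord - 1) %/% 4 + 1]]
  tails  <- list(p1, p2, c(p1, p2), c(p1, p2, p3))[[(ord - 1) %% 4 + 1]]
  c(tails, centre, rev(1 - tails))
}

# Example with assumed extreme probabilities p1 < p2 < p3.
ord_probs(5,  p1 = 0.001, p2 = 0.0025, p3 = 0.005)
ord_probs(7,  p1 = 0.001, p2 = 0.0025, p3 = 0.005)
ord_probs(12, p1 = 0.001, p2 = 0.0025, p3 = 0.005)
```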

paramkienerX5 is a simplified version of paramkienerX with predefined values algo = "estim", ord = 5, maxk = 10 and direct access to internal subfunctions. It uses the following probabilities:

p5 = c(p1, 0.25, 0.50, 0.75, 1-p1)

paramkienerX7 is a simplified version of paramkienerX with predefined values algo = "estim", ord = 7, maxk = 10 and direct access to internal subfunctions. It uses the following probabilities:

p7 = c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)

The quantiles corresponding to the above probabilities are then extracted with the function quantile, whose parameter type has been set to type = 6 as it returns the closest values to the true quantiles (according to our experience) for all k > 1.9. (Note: when k < 1.5, algorithm algo = "reg" returns better results). Both probabilities and quantiles are then transferred to estimkiener11 for calculation.
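The quantile extraction step can be tried with base R alone. The sketch below uses simulated returns and an assumed extreme probability p1 = 0.01, and compares type = 6 with R's default type = 7.

```r
# Hypothetical sample of 1000 simulated returns (Student t, 4 degrees of freedom).
set.seed(123)
X <- 0.01 * rt(1000, df = 4)

# Five probabilities with an assumed extreme probability p1 = 0.01.
p1 <- 0.01
p5 <- c(p1, 0.25, 0.50, 0.75, 1 - p1)

# Quantiles with type = 6, compared with R's default type = 7.
q6 <- quantile(X, probs = p5, type = 6, names = FALSE)
q7 <- quantile(X, probs = p5, type = 7, names = FALSE)
rbind(p5, q6, q7)
```

Type 6 places the i-th sorted point at probability i / (n + 1), the same convention as ppoints with a = 0, so its tail quantiles are at least as far from the median as those of type 7.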

probak controls the probabilities at which the model is tested with the parameter estimates. fitkienerX and regkienerLX share the same subroutines. The default for fitkienerX and regkienerLX is pprobs2 = c(0.01, 0.025, 0.05, 0.95, 0.975, 0.99) as those values are usual in finance. Other sets of values are provided at pprobs0.

Rounding the results is useful to display nice results, especially in a matrix or in a data.frame. dgts = 13 is recommended as a, k, w are usually significant at 1 digit.

dgts = NULL does not perform any rounding.
dgts = 0 to 9 rounds all parameters at the same level.
dgts = 10 to 27 rounds the parameters at various levels for nice display. See roundcoefk for the details. (Note: the rounding 10 to 27 currently works with paramkienerX, paramkienerX5, paramkienerX7 but not yet with fitkienerX).

Extracting the most useful parameters from the (quite long) vector/matrix fitk is controlled by parameter exfitk that calls user-defined or predefined parameter subsets like exfit0, ..., exfit7. IMPORTANT: never subset fitk by rank number as new items may be added in the future and rank may vary.
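The difference between subsetting by name and by rank can be seen on any named vector. In the sketch below, fitk is a hand-made stand-in with invented values and my_exfit is a hypothetical user-defined subset; neither comes from the package.

```r
# Hypothetical named vector standing in for part of a fitkienerX() result.
fitk <- c(ret = 0.12, m = 0.001, g = 0.008, a = 3.9, k = 3.5, w = 3.2,
          d = 0.03, e = 0.1)

# Hypothetical user-defined subset of parameter names.
my_exfit <- c("m", "g", "k", "e")

fitk[my_exfit]        # robust: selects by name
fitk[c(2, 3, 5, 8)]   # fragile: breaks if items are added or reordered
```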

Calculation of vectors, matrices and lists is not parallelized. Parallelization of code for arrays was introduced in version 1.5-0 and improved in version 1.5-1. ncores controls the number of cores allowed to the process (through parApply which runs on Unices and Windows and requires about 2 seconds to start). ncores = 1 means no parallelization. ncores = 0 is the recommended option. It uses the maximum number of cores available on the computer, as detected by detectCores, minus 1 core, which gives the best performance in most cases. Although appealing, this automatic selection may sometimes be dangerous. For instance, the instruction f(X, ncores_max) - f(X, ncores_max), a nice way to compute an array of 0, will request twice ncores_max cores and crash R. ncores = 2,..,99 sets the number of cores manually. If the requested value is larger than the maximum number of cores, this value is automatically reduced (with a warning) to this maximum. Hence, this latter option provides one core more than option ncores = 0.
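The rules for ncores can be summarized in a few lines. The helper resolve_ncores below is hypothetical and only mirrors the description above; it is not the package's internal code. The argument max_cores is added so that the logic can be checked independently of the machine.

```r
# Hypothetical helper mirroring the rules above; not part of the package.
resolve_ncores <- function(ncores, max_cores = parallel::detectCores()) {
  if (is.na(max_cores)) max_cores <- 1L   # detectCores() may return NA
  if (ncores == 0) return(max(1L, max_cores - 1L))
  if (ncores > max_cores) {
    warning("ncores reduced to the maximum number of cores: ", max_cores)
    return(max_cores)
  }
  ncores
}

resolve_ncores(1, max_cores = 8)                      # 1: no parallelization
resolve_ncores(0, max_cores = 8)                      # 7: all cores minus one
suppressWarnings(resolve_ncores(99, max_cores = 8))   # 8: capped at the maximum
```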

NOTE: fitkienerLX, regkienerX, estimkiener(X,5,7) were introduced in v1.2-0 and replaced in version v1.4-1 by fitkienerX and paramkiener(X,5,7) to accommodate vectors, matrices, arrays and lists. We apologize to early users who need to rewrite their code.

Value

paramkienerX: a vector (or a matrix) of parameter estimates c(m, g, a, k, w, d, e).

fitkienerX: a vector (or a matrix) made of several parts:

ret : the return over the period calculated with sum(x). Thus, log-returns are assumed.
m, g, a, k, w, d, e : the parameter estimates.
m1, sd, sk, ke : the mean, standard deviation, skewness and excess kurtosis computed from the parameter estimates.
m1x, sdx, skx, kex : the mean, standard deviation, skewness and excess kurtosis computed from the dataset.
lh : the length of the dataset over the period.
q. : quantile estimated with the parameter estimates.
VaR. : Value-at-Risk, positive in most cases.
c. : corrective tail coefficient = (q - m) / (q_logistic_function - m).
ltm. : left tail mean (signed ES on the left tail, usually negative).
rtm. : right tail mean (signed ES on the right tail, usually positive).
dtmq. : (p<=0.5 left, p>0.5 right) tail mean minus quantile.
ES. : expected shortfall, positive in most cases.
h. : corrective ES = (ES - m) / (ES_logistic_function - m).
desv. : ES - VaR, usually positive.
l. : quantile estimated by the tangent logistic function.
dl. : quantile - quantile_logistic_function.
g. : quantile estimated by the Laplace-Gauss function.
dg. : quantile - quantile_Laplace_Gauss_function.
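The item ret listed above is a plain sum, which is why log-returns are assumed: summing log-returns gives the log of the total price ratio, whereas summing simple returns does not compound correctly. A short check with an invented price series:

```r
# Hypothetical price series.
prices <- c(100, 102, 101, 105, 103)

logret <- diff(log(prices))    # log-returns
ret    <- sum(logret)          # return over the period, computed as for 'ret'

all.equal(ret, log(prices[5] / prices[1]))   # TRUE
exp(ret) - 1                                 # simple return over the period, about 0.03
```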

IMPORTANT : if you need to subset fitk, always subset it by parameter names and never subset it by rank number as new items may be added in the future and rank may vary. Use for instance exfit0, ..., exfit7.

References

P. Kiener, Fat tail analysis and package FatTailsR, 9th R/Rmetrics Workshop and Summer School, Zurich, 27 June 2015. https://www.inmodelia.com/exemples/2015-0627-Rmetrics-Kiener-en.pdf

Examples

    

require(minpack.lm)
require(timeSeries)

### Load the datasets and choose j in 1:16
DS     <- getDSdata()
j      <- 5

### and run this block
probak <- c(0.01, 0.05, 0.95, 0.99)
X      <- DS[[j]] ; names(DS)[j]
elevenprobs(X)
fitkienerX(X, algo = "reg", dgts = 3, probak = probak)
fitkienerX(X, algo = "estim", ord = 5, probak = probak, dgts = 3)
paramkienerX(X)
paramkienerX5(X)

### Compare the 12 values of paramkienerX(ord/row = 1:12) and paramkienerX (row 13)
compare <- function(ord, X) { paramkienerX(X, ord, algo = "estim", dgts = 13) }
rbind(t(sapply( 1:12, compare, X)), paramkienerX(X, algo = "reg", dgts = 13))

### Analyze DS in one step
t(sapply(DS, paramkienerX, algo = "reg", dgts = 13))
t(sapply(DS, paramkienerX, algo = "estim", dgts = 13))
paramkienerX(DS, algo = "reg", dgts = 13)
paramkienerX(DS, algo = "estim", dgts = 13)
system.time(fitk_rDS <- fitkienerX(DS, algo = "r", probak = pprobs2, dgts = 3))
system.time(fitk_eDS <- fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3))
fitk_rDS
fitk_eDS

### Subset rDS and eDS with exfit0,..,exfit7
fitk_rDS[,exfit4]
fitk_eDS[,exfit7]
fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3, exfitk = exfit7)

### Array (new example introduced in v1.5-1)
### Increase the number of cores and crash R.
## Not run:
arr <- array(rkiener1(3000), c(4,3,250))
paramkienerX7(arr, ncores = 2)
## paramkienerX7(arr, ncores = 2) - paramkienerX(arr, ncores = 2)
## End(Not run)

### End

[Package FatTailsR version 1.8-5 Index]