R: construction and handling of loss distributions

lossDistribution {HyRiM}

R Documentation

construction and handling of loss distributions

Description

Loss distributions can be constructed from both continuous and categorical data. In either case, the input data must be a vector of at least two numeric values, all >= 1. For discrete data, the function additionally takes the full range of categories, each represented as an integer (with the lowest category having the number 1).

Usage

# construct a loss distribution from data
lossDistribution(
  dat,
  discrete = FALSE,
  dataType = c("raw", "pdf", "cdf"),
  supp = NULL,
  smoothing = c("none", "ongaps", "always"),
  bw = NULL)
# get information about the loss distribution
## S3 method for class 'mosg.lossdistribution'
print(x, ...)
## S3 method for class 'mosg.lossdistribution'
summary(object, ...)
## S3 method for class 'summary.mosg.lossdistribution'
print(x, ...)
## S3 method for class 'mosg.lossdistribution'
plot(x, points = 100, xlab = "", ylab = "",
           main = "", p = 0.999, newPlot = TRUE, cutoff = NULL, ...)
# get quantitative information about the distribution
## S3 method for class 'mosg.lossdistribution'
quantile(x, p, eps = 0.001, ...)
## S3 method for class 'mosg.lossdistribution'
mean(x, ...)
# evaluate the loss density function
## S3 method for class 'mosg.lossdistribution'
density(x, t, ...)
# for the cumulative distribution function, see the function 'cdf'

Arguments

`dat`	a vector of at least two input observations (all `>= 1` required)
`discrete`	defaults to `FALSE`. If set to `TRUE`, the loss distribution is constructed as discrete. In that case, a value for `supp` is required.
`dataType`	applies only if `discrete=TRUE`, and specifies how the values in `dat` are to be interpreted. Defaults to `raw`, by which the data is taken as observations. Given as `pdf`, the values in `dat` are directly interpreted as a probability density (checked for nonnegativity and re-normalized if necessary). If the data type is specified as `cdf`, then the values in `dat` are taken as cumulative distribution function, i.e., checked to be non-decreasing, non-negative and re-normalized to 1 if necessary.
`supp`	if the parameter `discrete` is set to `TRUE`, then this parameter must be set as a vector of two elements, specifying the minimal and maximal category, e.g. `supp=c(1,5)`.
`bw`	the bandwidth parameter (numeric value) for kernel smoothing. Defaults internally to the result of bw.nrd0 if omitted.
`x`	a loss distribution object returned by `lossDistribution` or `mgss`, or (for the `print` method of summaries) a summary object for a loss distribution.
`t`	a value within the support of the loss distribution `x`.
`object`	a loss distribution object
`eps`	the accuracy at which the quantile is approximated (see the details below).
`smoothing`	string; partially matched with "none" (default), "ongaps", and "always". If set to "always", then the function computes a discrete kernel density estimate (using a discretized version of a Gaussian density with a bandwidth as computed by `bw.nrd0` (Silverman's rule)), to assign categories with zero probability a positive likelihood. If set to "ongaps", then the smoothing is applied only if necessary (i.e., if the probability mass is zero on at least one category).
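
To make the `dataType` options more concrete, the following base-R sketch shows one plausible way of turning values given as a density or as a cumulative distribution function into a probability mass function. The helper `as_pmf` is hypothetical and not part of HyRiM; the package's own checks and warnings may differ.

```r
# Hypothetical helper (not part of HyRiM): interpret user-supplied values as
# a probability mass function over the categories.
as_pmf <- function(dat, dataType = c("pdf", "cdf")) {
  dataType <- match.arg(dataType)
  stopifnot(is.numeric(dat), length(dat) >= 2, all(dat >= 0))
  if (dataType == "cdf") {
    stopifnot(!is.unsorted(dat))      # a CDF must be non-decreasing
    dat <- diff(c(0, dat))            # increments of the CDF are the masses
  }
  dat / sum(dat)                      # re-normalize to unit mass
}

x <- 1:10
f <- dpois(x, lambda = 4)             # Poisson masses restricted to 1..10
pmf_from_pdf <- as_pmf(f, "pdf")
pmf_from_cdf <- as_pmf(cumsum(f), "cdf")
```

Both calls describe the same distribution, so the two resulting mass vectors agree up to rounding.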

The function plot.mosg.lossdistribution takes the parameters:

`points`	the number of points at which the loss density is evaluated (numerically) for plotting.
`xlab`	a label for the x-axis in the plot.
`ylab`	a label for the y-axis in the plot.
`main`	a title for the plot
`p`	a quantile that determines the plot range for the loss distribution
`newPlot`	if set to `TRUE`, then a new plot is opened. Otherwise, the plot is added to the current plot window (typically used by `plot.mosg` to visualize game matrices).
`cutoff`	the cutoff point at which all densities shall be truncated before plotting (note that the mass functions are rescaled towards unit mass).
`...`	further arguments passed to or from other methods

Details

The function internally computes a Gaussian kernel density estimator (KDE; using Silverman's rule of thumb for the bandwidth selection) on the continuous data. The distribution is truncated at the maximal observation supplied + 5*the bandwidth of the Gaussian KDE, or equivalently, at the right end of the support in case of discrete distributions.
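
The construction described above can be approximated in base R as follows. This is only a sketch under stated assumptions: the lower integration limit of 1 is inferred from the requirement that all data be >= 1, and the names `kde`, `mass` and `truncated_kde` are illustrative, not HyRiM internals.

```r
# Base-R sketch only; HyRiM's internal code may differ in detail.
obs   <- c(10, 6.4, 9, 7.9, 7.1, 9)   # observations, all >= 1
bw    <- bw.nrd0(obs)                 # Silverman's rule of thumb
upper <- max(obs) + 5 * bw            # truncation point described above

# Gaussian KDE: average of normal densities centred at the observations
kde <- function(t) sapply(t, function(ti) mean(dnorm(ti, mean = obs, sd = bw)))

# Mass of the untruncated KDE on [1, upper]; dividing by it restores unit
# mass (the lower limit 1 is an assumption, see the lead-in).
mass <- integrate(kde, lower = 1, upper = upper)$value
truncated_kde <- function(t) ifelse(t >= 1 & t <= upper, kde(t) / mass, 0)
```

The reciprocal `1 / mass` plays the role of a normalization factor for the truncated density.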

For discrete distributions, missing observations are handled by smoothing the density (by convolution with a discretized Gaussian kernel). As an alternative, a re-definition of categories may be considered.
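
As an illustration of smoothing by convolution, the sketch below spreads the mass of each category over its neighbours with a discretized Gaussian kernel and re-normalizes the result. The bandwidth of 1, the example masses and the helper `smooth_pmf` are assumptions chosen for illustration; the package may discretize the kernel differently.

```r
# Sketch: smooth a probability mass function that has gaps.
supp <- 1:5
pmf  <- c(0.5, 0, 0.3, 0, 0.2)        # categories 2 and 4 were never observed
bw   <- 1                             # illustrative bandwidth (assumption)

# Hypothetical helper, not part of HyRiM
smooth_pmf <- function(pmf, supp, bw) {
  # K[i, j]: weight that the mass of category j contributes to category i
  K <- outer(supp, supp, function(i, j) dnorm(i - j, sd = bw))
  smoothed <- as.vector(K %*% pmf)
  smoothed / sum(smoothed)            # re-normalize on the finite support
}

smoothed <- smooth_pmf(pmf, supp, bw)
```

After smoothing, every category carries positive probability, and the masses still sum to one.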

Degenerate distributions are not supported! The construction of classical games with real-valued payoffs works directly through mosg by supplying a list of values rather than loss distributions. See the example given with mosg.

The generic functions quantile, mean and density all distinguish discrete from continuous distributions in how values are computed.

Quantiles are computed using the direct definition, as an approximation y such that p = Pr(ld <= y). For continuous distributions, a bisection search is performed to approximate the inverse cumulative distribution function. The accuracy parameter eps passed to quantile causes the bisection search to stop once the search interval is shorter than eps, in which case the midpoint of the interval is returned. For discrete distributions, quantile works with cumulative sums of the discrete probability mass function.
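
Both quantile strategies can be sketched in a few lines of base R. The helpers `quantile_bisect` and `quantile_discrete` are hypothetical, and the exponential distribution serves only as a stand-in for a continuous loss distribution with a known cumulative distribution function.

```r
# Hypothetical helper: invert a continuous CDF by bisection on [lower, upper]
quantile_bisect <- function(cdf, p, lower, upper, eps = 0.001) {
  while (upper - lower >= eps) {
    mid <- (lower + upper) / 2
    if (cdf(mid) < p) lower <- mid else upper <- mid
  }
  (lower + upper) / 2                 # middle of the final search interval
}

# Hypothetical helper: smallest category whose cumulative mass reaches p
quantile_discrete <- function(pmf, supp, p) {
  supp[which(cumsum(pmf) >= p)[1]]
}

q_cont <- quantile_bisect(function(y) pexp(y, rate = 1),
                          p = 0.5, lower = 0, upper = 20)
q_disc <- quantile_discrete(c(0.1, 0.2, 0.4, 0.3), supp = 1:4, p = 0.5)
```

A smaller `eps` tightens the continuous approximation at the cost of more bisection steps; the discrete case needs no accuracy parameter.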

mean invokes moment(ld, 1) to compute the first moment.
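
For a discrete distribution, moments reduce to probability-weighted sums over the categories. The sketch below uses a hypothetical helper `moment_discrete`, named differently on purpose so that it is not mistaken for the package's own moment function, and derives the variance from the first two moments.

```r
# Hypothetical helper: k-th moment of a discrete distribution on 'supp'
moment_discrete <- function(pmf, supp, k) sum(supp^k * pmf)

supp <- 1:4
pmf  <- c(0.1, 0.2, 0.4, 0.3)

m1 <- moment_discrete(pmf, supp, 1)            # first moment (mean)
v  <- moment_discrete(pmf, supp, 2) - m1^2     # variance from two moments
```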

density is either a wrapper for the internal representation by the function object lossdistr, or directly accesses the probability mass function as internally stored in the field dpdf (see the 'values' section below).

For visualization, plot produces a bar plot for categorical distributions (over categories as specified by the supp field; see the 'values' section below), and a continuous line plot for continuous distributions, drawn on the range 1...max(range + 5*bw), where range and bw are the fields described below. To ease comparison and a visual inspection of the game matrix, the default plot ranges can be overridden by supplying xlim and ylim for the plot function.

Value

The return value of lossDistribution is an object of class mosg.lossdistribution. The same goes for lossDistribution.mosg. The object carries the following fields:

`observations`	carries over the data vector supplied to construct the distribution.
`range`	the minimal and maximal loss observed, as a 2-element vector. For loss distributions induced by games, the range is the smallest interval covering the ranges of all distributions in the game.
`bw`	the bandwidth used for the kernel density estimate.
`lossdistr`	a `function` embodying the kernel density (probability mass function) as a spline function (for continuous densities only)
`normalizationFactor`	the factor by which `lossdistr` must be multiplied (to normalize under the truncation at max(observations) + 5*bw).
`is.mixedDistribution`	a flag indicating whether or not the distribution was constructed by a call to `lossDistribution` or the generic function `lossDistribution.mosg`.
`is.discrete`	a flag set to `TRUE` if the distribution is over categories
`dpdf`	if `is.discrete` is `TRUE`, then this is a vector of probability masses over the support (field `supp`).
`supp`	if `is.discrete` is `TRUE`, then this is a 2-element vector specifying the minimal and maximal loss category (represented by integers).

summary returns an object of class summary.mosg.lossdistribution, to which the generic print function can be applied, and which carries the following fields:

`range`	the minimal and maximal observation of the underlying data (if available), or the minimal and maximal losses anticipated for this distribution (e.g., in case of discrete distributions the common support).
`mean`	the first moment as computed by `mean`.
`variance`	the variance as computed by `variance`.
`quantiles`	a 2x5-matrix of quantiles at levels 10%, 25%, 50%, 75% and 90%.

Note

If plotting throws an error concerning too large figure margins, then adjusting the plot parameters using par may help, since the plot function does not override any of the current plot settings (e.g., issue par(mar = c(0,0,1,1) + 0.1) before plotting to reduce the margins close towards zero).

In some cases, plots may require careful customization to look good, so playing around with the other settings offered by par can be useful.
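
A common base-R pattern for such adjustments is to save the previous settings, plot, and then restore them. In the sketch below, a kernel density plot of the example data stands in for `plot(ld)`, so that the snippet runs without the package.

```r
# Base-R sketch: shrink the margins, plot, then restore the old settings.
old_par <- par(mar = c(0, 0, 1, 1) + 0.1)   # par() returns the old values
plot(density(c(10, 6.4, 9, 7.9, 7.1, 9)), main = "")  # stand-in for plot(ld)
par(old_par)                                # restore the earlier margins
```

Restoring the settings keeps later plots in the same session unaffected.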

If the distribution has been smoothed, then mean, variance, quantile, density and cdf will refer to the smoothed version of the distribution. In that case, the returned quantities are mere approximations of the analogous values obtained directly from the underlying data.

Author(s)

Stefan Rass

Examples

# construct a loss distribution from observations (raw data)
cvss1base <- c(10,6.4,9,7.9,7.1,9)
ld <- lossDistribution(cvss1base)
summary(ld)
plot(ld)

# construct a loss distribution of given shape
# for example, a Poisson density with lambda = 4
x <- 1:10
f <- dpois(x, lambda = 4)
# construct the loss distribution by declaring the data
# to be a probability density function (pdf)
ld <- lossDistribution(f, dataType = "pdf", discrete = TRUE, supp = range(x))
# note that this call throws a warning since it internally
# truncates the loss distribution to the support 1:10, and
# renormalizes the supplied density for that matter.

# for further examples, see the documentation to 'mosg' and 'mosg.equilibrium'

[Package HyRiM version 2.0.2 Index]