summ_hdr {pdqr} | R Documentation |
Summarize distribution with Highest Density Region
Description
summ_hdr()
computes a Highest Density Region (HDR) of some pdqr-function
for a supplied level
: a union of (closed) intervals total probability of
which is not less than level
and probability/density at any point inside it
is bigger than some threshold (which should be maximum one with a property
of HDR having total probability not less than level
). This also represents
a set of intervals with the lowest total width among all sets with total
probability not less than a level
.
Usage
summ_hdr(f, level = 0.95)
Arguments
f |
A pdqr-function representing distribution. |
level |
A desired lower bound for a total probability of an output set of intervals. |
Details
General algorithm of summ_hdr()
consists from two steps:
-
Find "target height". That is a value of probability/density which divides all support into two sets: the one with probability/density not less than target height (it is a desired HDR) and the other - with strictly less. The first set should also have total probability not less than
level
. -
Form a HDR as a set of closed intervals.
If f
has "discrete" type, target height is computed by looking at "x"
values of "x_tbl" metadata in order of decreasing probability
until their total probability is not less than level
. After that, all "x"
values with probability not less than height are considered to form a HDR.
Output is formed as a set of closed intervals (i.e. both edges included)
inside of which lie all HDR "x" elements and others - don't.
If f
has "continuous" type, target height is estimated as 1-level
quantile of Y = d_f(X)
distribution, where d_f
is d-function
corresponding to f
(as_d(f)
in other words) and X
is a random
variable represented by f
. Essentially, Y
has a distribution of f
's
density values and its 1-level
quantile is a target height. After that, HDR
is formed as a set of intervals with positive width (if level
is more
than 0, see Notes) inside which density is not less than target height.
Notes:
If
level
is 0, output has one interval of zero width at point of global mode.If
level
is 1, output has one interval equal to support.Computation of target height in case of "continuous" type is approximate which in some extreme cases (for example, like winsorized distributions) can lead to HDR having total probability very approximate to and even slightly lower than
level
.If d-function has "plateaus" (consecutive values with equal probability/density) at computed target height, total probability of HDR can be considerably bigger than
level
(see examples). However, this aligns with HDR definition, as density values should be not less than target height and total probability should be not less thanlevel
.
Value
A data frame with one row representing one closed interval of HDR and the following columns:
-
left
<dbl>
: Left end of intervals. -
right
<dbl>
: Right end of intervals.
See Also
region_*()
family of functions for working with output
HDR.
summ_interval()
for computing of single interval summary of distribution.
Other summary functions:
summ_center()
,
summ_classmetric()
,
summ_distance()
,
summ_entropy()
,
summ_interval()
,
summ_moment()
,
summ_order()
,
summ_prob_true()
,
summ_pval()
,
summ_quantile()
,
summ_roc()
,
summ_separation()
,
summ_spread()
Examples
# "discrete" functions
d_dis <- new_d(data.frame(x = 1:4, prob = c(0.4, 0.2, 0.3, 0.1)), "discrete")
summ_hdr(d_dis, 0.3)
summ_hdr(d_dis, 0.5)
summ_hdr(d_dis, 0.9)
## Zero width interval at global mode
summ_hdr(d_dis, 0)
# "continuous" functions
d_norm <- as_d(dnorm)
summ_hdr(d_norm, 0.95)
## Zero width interval at global mode
summ_hdr(d_norm, 0)
# Works well with mixture distributions
d_mix <- form_mix(list(as_d(dnorm), as_d(dnorm, mean = 5)))
summ_hdr(d_mix, 0.95)
# Plateaus
d_unif <- as_d(dunif)
## Returns all support because of density "plateau"
summ_hdr(d_unif, 0.1)
# Draw HDR
plot(d_mix)
region_draw(summ_hdr(d_mix, 0.95))