Essential Histogram {essHist} R Documentation

The Essential Histogram

Description

Compute the essential histogram via (pruned) dynamic programming.

Usage

essHistogram(x, alpha = 0.5, q = NULL, intv = NULL, plot = TRUE, 
              mode = ifelse(anyDuplicated(x),"Gen","Con"), 
              xname = deparse(substitute(x)), ...)

Arguments

x

a numeric vector containing the data.

alpha

significance level; the default is 0.5. Set alpha = 0.1 or smaller if confidence statements have to be made; set alpha = 0.9 if the goal is to explore the data for potential features and some false positives can be tolerated. The default value is a compromise between these two goals.

q

threshold value; by default, q is chosen as the (1-alpha)-quantile of the null distribution of the multiscale statistic via Monte Carlo simulation, see also msQuantile.

intv

a data frame that provides the system of intervals on which the multiscale statistic is defined. The data frame contains the following two columns:

left: left index of an interval

right: right index of an interval

By default, it is set to the sparse interval system proposed by Rivera and Walther (2013), see also Li et al. (2016).

plot

logical. If TRUE (default), a histogram is plotted; otherwise a list of breaks and counts is returned. In the latter case, a warning is issued if arguments (typically graphical ones) are specified that only apply when plot = TRUE.

mode

"Con" for continuous distribution functions

"Gen" for general (possibly discontinuous) distribution functions

By default, "Con" is chosen if there are no tied observations; otherwise, "Gen" is chosen. See Li et al. (2016) for further details.

xname

a character string with the actual x argument name.

...

further arguments and graphical parameters passed to plot.histogram and thence to title and axis (if plot = TRUE).

Details

The essential histogram is defined as the histogram with the fewest blocks among those satisfying the multiscale constraint. If there is more than one such solution, the one with the highest likelihood is picked. The essential histogram involves only one parameter q, the threshold of the multiscale constraint. This parameter can be chosen by means of the significance level alpha, which leads to natural statistical significance statements for the multiscale constraint. The computational complexity is often linear in the sample size, although the worst-case complexity bound is quadratic up to a log-factor in the case of the sparse interval system. See Li et al. (2016) for further details.

Value

An object of class "histogram", the same class as returned by the function hist.

Note

The argument intv is internally adjusted to ensure that it contains no empty intervals, especially in the case of tied observations. The first block of the returned histogram is a closed interval, and the remaining blocks are left-open, right-closed intervals. All printed messages can be disabled by wrapping the call in suppressMessages.

References

Li, H., Munk, A., Sieling, H., and Walther, G. (2016). The essential histogram. arXiv:1612.07216.

Rivera, C., and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Stat. 40, 752–769.

See Also

checkHistogram, genIntv, hist, msQuantile

Examples

# Simulate data
set.seed(123)
type = 'skewed_unimodal'
n = 500
y = rmixnorm(n, type = type)

# Compute the essential histogram
eh = essHistogram(y, plot = FALSE)

# Plot results
#     compute oracle density
x  = sort(y)
od = dmixnorm(x, type = type)
#     compare with oracle density
plot(x, od, type = "l", xlab = NA, ylab = NA, col = "red", main = type)
lines(eh)
legend("topleft", c("Oracle density", "Essential histogram"), 
       lty = c(1,1), col = c("red", "black"))
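
# A further sketch, not part of the original example: rerun with a smaller
# alpha, as suggested under Arguments when confidence statements are needed,
# and silence the printed messages as described under Note. The name eh01 is
# introduced here for illustration only; it reuses y from above.
eh01 = suppressMessages(essHistogram(y, alpha = 0.1, plot = FALSE))
#     inspect the returned breaks and counts
eh01$breaks
eh01$counts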

[Package essHist version 1.2.2 Index]