ahist {Ckmeans.1d.dp} | R Documentation |
Adaptive Histograms
Description
Generate or plot histograms adaptive to patterns in univariate data. The number and widths of histogram bins are automatically calculated based on an optimal univariate clustering of the input data. Thus the bins are unlikely to be of equal width.
Usage
ahist(x, k = c(1,9), breaks=NULL, data=NULL, weight=1,
plot = TRUE, xlab = deparse(substitute(x)),
wlab = deparse(substitute(weight)),
main = NULL, col = "lavender", border = graphics::par("fg"),
lwd = graphics::par("lwd"),
col.stick = "gray", lwd.stick = 1, add.sticks=TRUE,
style = c("discontinuous", "midpoints"),
skip.empty.bin.color=TRUE,
...)
Arguments
x |
a numeric vector of data or an object of class If If |
k |
either an exact integer number of bins/clusters, or a vector of length two specifying the minimum and maximum numbers of bins/clusters to be examined. The default is |
breaks |
This argument is defined in |
data |
a numeric vector. If |
weight |
a value of 1 to specify equal weights or a numeric vector of unequal weights for each element. The default weight is one. It is highly recommended to use positive (instead of zero) weights to account for the influence of every element. The weights have a strong impact on the clustering result. |
plot |
a logical. If |
xlab |
a character string. The x-axis label for the plot. |
wlab |
a character string. The weight-axis label for the plot. It is the vertical axis to the right of the plot. |
main |
a character string. The title for the plot. |
col |
a character string. The fill color of the histogram bars. |
border |
a character string. The color of the histogram bar borders. |
lwd |
a numeric value. The line width of the border of the histogram bars. |
col.stick |
a character string. The color of the sticks above the x-axis. See Details. |
lwd.stick |
a numeric value. The line width of the sticks above the x-axis. See Details. |
add.sticks |
a logical. If |
style |
a character string. The style of the adaptive histogram. See Details. |
skip.empty.bin.color |
a logical. If |
... |
additional arguments to be passed to |
Details
The histogram is by default plotted using the plot.histogram method. The plot can be optionally disabled with the plot=FALSE argument. The original input data are shown as sticks just above the horizontal axis.
If the breaks argument is not specified, the number of histogram bins is the optimal number of clusters estimated using the Bayesian information criterion evaluated on Gaussian mixture models fitted to the input data in x.
If the breaks argument is not provided, breaks in the histogram are derived from clusters identified by optimal univariate clustering (Ckmeans.1d.dp) in one of two styles. With the default "discontinuous" style, the bin width of each bar is determined according to a data-adaptive rule; the "midpoints" style uses the midpoints of cluster border points to determine the bin width. For clustered data, the "midpoints" style generates bins that are too wide to capture the cluster patterns. In contrast, the "discontinuous" style is more adaptive to the data by allowing some bins to be empty, making the histogram bars discontinuous.
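The "midpoints" idea can be illustrated with a minimal base R sketch. The data values and cluster labels below are hypothetical and hard-coded for illustration; in practice the labels would come from a univariate clustering such as Ckmeans.1d.dp. This is not the package's implementation, only one way to place breaks halfway between the border points of adjacent clusters.

```r
# Toy data with three visible groups (hypothetical values).
x <- c(1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0, 9.4)

# Hypothetical cluster labels, assumed here rather than computed.
cluster <- c(1, 1, 1, 2, 2, 2, 3, 3)

# Border points: the smallest and largest element of each cluster.
lower <- tapply(x, cluster, min)
upper <- tapply(x, cluster, max)
k <- length(lower)

# Each interior break sits halfway between the upper border of one
# cluster and the lower border of the next cluster.
interior <- (upper[-k] + lower[-1]) / 2
midpoint_breaks <- unname(c(min(x), interior, max(x)))

# Count the elements falling into each bin defined by the breaks.
bin <- findInterval(x, midpoint_breaks, rightmost.closed = TRUE)
counts <- tabulate(bin, nbins = length(midpoint_breaks) - 1)

print(midpoint_breaks)
print(counts)
```

With these toy values the interior breaks fall at 3.2 and 7.15, so every bin extends halfway across the gap to its neighboring cluster. This is why the style tends to produce wide bins when clusters are well separated.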
Value
An object of class histogram defined in hist. It has an S3 plot method plot.histogram.
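Because the result is an ordinary histogram object, its components can be inspected like those of any value returned by hist. The sketch below builds such an object directly with hist and unequal-width breaks (hypothetical values, not output of ahist) to show the components that are typically of interest.

```r
# Hypothetical data and unequal-width breaks, chosen for illustration.
x <- c(1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0, 9.4)
breaks <- c(1.0, 3.2, 7.15, 9.4)

# Build the histogram object without drawing it.
h <- hist(x, breaks = breaks, plot = FALSE)

print(class(h))    # the S3 class used for plot dispatch
print(h$breaks)    # bin boundaries
print(h$counts)    # number of elements per bin
print(h$density)   # counts scaled by sample size and bin width

# With unequal widths, bar areas (density * width) sum to one.
widths <- diff(h$breaks)
total_area <- sum(h$density * widths)
print(total_area)
```

Calling plot(h) on such an object dispatches to plot.histogram, which draws densities rather than raw counts when the bins have unequal widths.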
Author(s)
Joe Song
See Also
Examples
# Example 1: plot an adaptive histogram from data generated by
# a Gaussian mixture model with three components
x <- c(rnorm(40, mean=-2, sd=0.3),
rnorm(45, mean=1, sd=0.1),
rnorm(70, mean=3, sd=0.2))
ahist(x, col="lightblue", sub=paste("n =", length(x)),
col.stick="salmon", lwd=2,
main=paste("Example 1. Gaussian mixture model with 3 components",
"(one bin per component)", sep="\n"))
# Example 2: plot an adaptive histogram from data generated by
# a Gaussian mixture model with three components using a given
# number of bins
ahist(x, k=9, col="lavender", col.stick="salmon",
sub=paste("n =", length(x)), lwd=2,
main=paste("Example 2. Gaussian mixture model with 3 components",
"(on average 3 bins per component)", sep="\n"))
# Example 3: The DNase data frame has 176 rows and 3 columns of
# data obtained during development of an ELISA assay for the
# recombinant protein DNase in rat serum.
data(DNase)
res <- Ckmeans.1d.dp(DNase$density)
kopt <- length(res$size)
ahist(res, data=DNase$density, col=rainbow(kopt),
col.stick=rainbow(kopt)[res$cluster],
sub=paste("n =", length(DNase$density)), border="transparent",
xlab="Optical density of protein DNase",
main="Example 3. ELISA assay of DNase in rat serum")
# Example 4: Add sticks to histograms with the R provided
# hist() function.
ahist(DNase$density, breaks="Sturges", col="palegreen",
add.sticks=TRUE, col.stick="darkgreen",
main=paste("Example 4. ELISA assay of DNase in rat serum",
"(Equal width bins)", sep="\n"),
xlab="Optical density of protein DNase")
# Example 5: Weighted adaptive histograms
x <- sort(c(rnorm(40, mean=-2, sd=0.3),
rnorm(45, mean=2, sd=0.1),
rnorm(70, mean=4, sd=0.2)))
y <- (1 + sin(0.10 * seq_along(x))) * (x-1)^2
ahist(x, weight=y, sub=paste("n =", length(x)),
col.stick="forestgreen", lwd.stick=0.25, lwd=2,
main="Example 5. Weighted adaptive histogram")
# Example 6: Cluster data with repetitive elements
ahist(c(1,1,1,1, 3,4,4, 6,6,6), k=c(2,4), col="gray",
lwd=2, lwd.stick=6, col.stick="chocolate",
main=paste("Example 6. Adaptive histogram of",
"repetitive elements", sep="\n"))