optbins-methods {rebmix} | R Documentation |
Optimal Numbers of Bins Calculation
Description
Returns the matrix of size n_{\mathrm{D}} \times d
containing optimal numbers of bins v_{1}, \ldots, v_{d}
for all processed datasets.
Usage
## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal",
ymin = numeric(), ymax = numeric(), kmin = numeric(),
kmax = numeric(), ...)
## ... and for other signatures
Arguments
Dataset |
a list of length |
Rule |
a character giving the histogram binning rule. One of |
ymin |
a vector of length |
ymax |
a vector of length |
kmin |
lower limit of the number of bins. The default value is |
kmax |
upper limit of the number of bins. The default value is |
... |
currently not used. |
Methods
signature(x = "list")
a list of data frames.
Author(s)
Branislav Panic, Marko Nagode
References
K. K. Knuth. Optimal data-based binning for histograms and histogram-based probability density models.
Digital Signal Processing, 95:102581, 2019.
doi:10.1016/j.dsp.2019.102581.
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
Examples
# Generate multivariate normal datasets.
n <- c(750, 1000)
Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)
sim2d <- RNGMIX(model = "RNGMVNORM",
Dataset.name = paste("sim2d_", 1:5, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset,
Rule = "Knuth equal",
ymin = sim2d@ymin,
ymax = sim2d@ymax,
kmin = 2,
kmax = 20)
opt.k
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "EM",
acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
sim2dest <- REBMIX(model = "REBMVNORM",
Dataset = a.Dataset(sim2d),
Preprocessing = "h",
cmax = 10,
ymin = a.ymin(sim2d),
ymax = a.ymax(sim2d),
K = opt.k,
Criterion = "BIC",
EMcontrol = EM)
# Plot finite mixture.
plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))
# Estimate number of components, component weights and component
# parameters for well known Iris dataset.
Dataset <- list(iris[, c(1:4)])
# Calculate optimal numbers of bins using non-equal number of bins in each dimension.
opt.k <- optbins(Dataset = Dataset,
Rule = "Knuth unequal",
kmin = 2,
kmax = 20)
opt.k
# Estimate number of components, component weights and component parameters.
irisest <- REBMIX(model = "REBMVNORM",
Dataset = Dataset,
Preprocessing = "h",
cmax = 10,
K = opt.k,
Criterion = "BIC",
EMcontrol = EM)
irisest