REBMIX-class {rebmix}    R Documentation

Class "REBMIX"

Description

Object of class REBMIX.

Objects from the Class

Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0), a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
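As a rough illustration of how such an object might be obtained and inspected, the sketch below assumes that the rebmix package is installed and that its REBMIX() method accepts arguments named after the slots described under Slots. REBMIX() is not described on this page, so treat the call as an assumption. The simulated two-component sample and the dataset name toy are hypothetical.

```r
# Sketch only: the estimation step runs only if rebmix is available.
set.seed(1)

# One toy dataset with n = 200 rows and d = 1 column (hypothetical data).
y <- data.frame(y1 = c(rnorm(100, mean = 0), rnorm(100, mean = 6)))

if (requireNamespace("rebmix", quietly = TRUE)) {
  library(rebmix)

  # Assumed call: argument names mirror the slot names of class REBMIX.
  est <- REBMIX(Dataset = list(toy = y),
                Preprocessing = "histogram",
                cmax = 4,
                Criterion = "BIC",
                pdf = "normal")

  # Accessors listed above; pos selects the desired slot item.
  print(a.summary(est))
  print(a.w(est, pos = 1))
  print(a.Theta(est, pos = 1))
}
```

The accessors are used instead of direct slot access with @ because this page lists them as the interface to the slots.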

Slots

Dataset:

a list of length $n_{\mathrm{D}}$ of data frames or objects of class Histogram. Each data frame should have size $n \times d$ and contain a $d$-dimensional dataset. Each of the $d$ columns represents one random variable. The number of observations $n$ equals the number of rows in the dataset.

Preprocessing:

a character vector giving the preprocessing types. One of "histogram",
"kernel density estimation" or "k-nearest neighbour".

cmax:

maximum number of components $c_{\mathrm{max}} > 0$. The default value is 15.

cmin:

minimum number of components $c_{\mathrm{min}} > 0$. The default value is 1.

Criterion:

a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
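To make the role of the criterion concrete, the following base R sketch evaluates the textbook definitions AIC = -2 log L + 2M and BIC = -2 log L + M log n for made-up values of the log likelihood, the degrees of freedom M and the number of observations n. The helper names and all numbers are hypothetical, and the package's internal formulas may differ in detail.

```r
# Textbook information criteria; logL = log likelihood, M = degrees of freedom,
# n = number of observations. Helper names are hypothetical.
aic <- function(logL, M) -2 * logL + 2 * M
bic <- function(logL, M, n) -2 * logL + M * log(n)

# Made-up candidate models with c = 1, 2, 3 components.
logL <- c(-520.4, -480.1, -478.9)
M <- c(2, 5, 8)
n <- 200

ic <- data.frame(c = 1:3, AIC = aic(logL, M), BIC = bic(logL, M, n))
print(ic)

# The preferred number of components minimizes the chosen criterion.
best_aic <- ic$c[which.min(ic$AIC)]
best_bic <- ic$c[which.min(ic$BIC)]
```

In this toy table the third model has the highest log likelihood, yet both criteria prefer the second one because the penalty on M outweighs the small gain in fit.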

Variables:

a character vector of length $d$ containing types of variables. One of "continuous" or "discrete".

pdf:

a character vector of length $d$ containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".

theta1:

a vector of length $d$ containing initial component parameters. One of $n_{il} = \textrm{number of categories} - 1$ for "binomial" distribution.

theta2:

a vector of length $d$ containing initial component parameters. Currently not used.

theta3:

a vector of length $d$ containing initial component parameters. One of $\xi_{il} \in \{-1, \textrm{NA}, 1\}$ for "Gumbel" distribution.

K:

a character or a vector or a list of vectors containing numbers of bins $v$ for the histogram and the kernel density estimation or numbers of nearest neighbours $k$ for the k-nearest neighbour. There is no genuine rule to identify $v$ or $k$. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule $v = 1 + \log_{2}(n)$, the $\mathrm{Log}_{10}$ rule $v = 10 \log_{10}(n)$ or the RootN rule $v = 2 \sqrt{n}$ can be applied to estimate the limiting numbers of bins, or the rule of thumb $k = \sqrt{n}$ to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, the brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for sequence of bins or nearest neighbours generation. The default value is "auto".
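The rules quoted for K are simple to evaluate by hand. The base R sketch below computes them for a hypothetical n = 500, builds an illustrative candidate set between the smallest and largest limit, and then picks the brackets around a minimum IC as in the K = c(10, 20, 40, 60) example. The way the candidate set is spaced and the IC values are assumptions made for illustration, not the package's procedure.

```r
n <- 500  # hypothetical number of observations

v_sturges <- 1 + log2(n)     # Sturges rule
v_log10   <- 10 * log10(n)   # Log10 rule
v_rootn   <- 2 * sqrt(n)     # RootN rule
k_thumb   <- sqrt(n)         # rule of thumb for nearest neighbours

# Round the limits up to whole bins and spread four candidates between them
# (the even spacing is an assumption made for this sketch).
v_limits <- ceiling(c(v_sturges, v_log10, v_rootn))
K <- unique(round(seq(min(v_limits), max(v_limits), length.out = 4)))
print(v_limits)
print(K)

# Bracket selection around the minimum IC, with made-up IC values.
K_in  <- c(10, 20, 40, 60)
IC_in <- c(1012, 991, 984, 995)
i <- which.min(IC_in)
brackets <- K_in[c(max(i - 1, 1), min(i + 1, length(K_in)))]
print(brackets)
```

The brackets are the neighbours of the best input value, so a golden section search restricted to that interval only has to examine values of K close to the current minimum.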

ymin:

a vector of length $d$ containing minimum observations. The default value is numeric().

ymax:

a vector of length $d$ containing maximum observations. The default value is numeric().

ar:

acceleration rate $0 < a_{\mathrm{r}} \leq 1$. The default value is 0.1 and in most cases does not have to be altered.

Restraints:

a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.

Mode:

a character giving the mode type. One of "all", "outliers" or default "outliersplus". If Mode = "all", the modes are determined in decreasing order of magnitude from all observations. If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.

w:

a list of vectors of length $c$ containing component weights $w_{l}$ summing to 1.

Theta:

a list of lists each containing $c$ parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for $0 \leq y_{i} \leq 2 \pi$. Component parameters theta1.l follow the parametric family types. One of $\mu_{il}$ for normal, lognormal, Gumbel and von Mises distributions, $\theta_{il}$ for Weibull, gamma, binomial, Poisson and Dirac distributions and $a$ for uniform distribution. Component parameters theta2.l follow theta1.l. One of $\sigma_{il}$ for normal, lognormal and Gumbel distributions, $\beta_{il}$ for Weibull and gamma distributions, $p_{il}$ for binomial distribution, $\kappa_{il}$ for von Mises distribution and $b$ for uniform distribution. Component parameters theta3.l follow theta2.l. One of $\xi_{il}$ for Gumbel distribution.
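To show how the weights in w and the component parameters in Theta fit together, the base R sketch below evaluates a one-dimensional, two-component normal mixture as the weighted sum of its component densities. All values are hypothetical, and the plain vectors used here are a simplification that does not reproduce the nested list layout of the Theta slot.

```r
# Hypothetical two-component normal mixture in one dimension.
w     <- c(0.3, 0.7)   # component weights, summing to 1
mu    <- c(0, 6)       # plays the role of theta1.l for the normal family
sigma <- c(1, 1.5)     # plays the role of theta2.l for the normal family

mixture_pdf <- function(y) {
  # Weighted sum of the component densities, evaluated at each y.
  dens <- numeric(length(y))
  for (l in seq_along(w)) {
    dens <- dens + w[l] * dnorm(y, mean = mu[l], sd = sigma[l])
  }
  dens
}

# A valid mixture density integrates to 1 when the weights sum to 1.
total <- integrate(mixture_pdf, lower = -Inf, upper = Inf)$value
print(total)
```

Writing the density as an explicit loop over components keeps the function vectorized in y, which integrate() requires.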

summary:

a data frame with additional information about dataset, preprocessing, $c_{\mathrm{max}}$, $c_{\mathrm{min}}$, information criterion type, $a_{\mathrm{r}}$, restraints type, mode type, optimal $c$, optimal $v$ or $k$, $K$, $y_{i0}$, $y_{i\mathrm{min}}$, $y_{i\mathrm{max}}$, optimal $h_{i}$, information criterion $\mathrm{IC}$, log likelihood $\mathrm{log}\, L$ and degrees of freedom $M$.

summary.EM:

a data frame with additional information about dataset, strategy for the EM algorithm (strategy), variant of the EM algorithm (variant), acceleration type (acceleration), tolerance (tolerance), acceleration multiplier (acceleration.multiplier), maximum allowed number of iterations (maximum.iterations), number of iterations used for obtaining the optimal solution (opt.iterations.nbr) and total number of iterations of the EM algorithm (total.iterations.nbr).

pos:

position in the summary data frame at which log likelihood $\mathrm{log}\, L$ attains its maximum.

opt.c:

a list of vectors containing numbers of components for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.IC:

a list of vectors containing information criteria for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.logL:

a list of vectors containing log likelihoods for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.Dmin:

a list of vectors containing $D_{\mathrm{min}}$ values for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

opt.D:

a list of vectors containing totals of positive relative deviations for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.

all.K:

a list of vectors containing all processed numbers of bins $v$ for the histogram and the kernel density estimation or all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.

all.IC:

a list of vectors containing information criteria for all processed numbers of bins $v$ for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.

Author(s)

Marko Nagode


[Package rebmix version 2.16.0 Index]