| REBMIX-class {rebmix} | R Documentation |
Class "REBMIX"
Description
Object of class REBMIX.
Objects from the Class
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0),
a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL),
a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL),
a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0),
a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL),
a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL),
a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX,
a desired slot item and a desired column name, respectively.
Slots
Dataset:-
a list of length n_{\mathrm{D}} of data frames or objects of class Histogram. Data frames should have size n \times d and contain d-dimensional datasets. Each of the d columns represents one random variable. The number of observations n equals the number of rows in the datasets.
Preprocessing:-
a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax:-
maximum number of components c_{\mathrm{max}} > 0. The default value is 15.
cmin:-
minimum number of components c_{\mathrm{min}} > 0. The default value is 1.
Criterion:-
a character giving the information criterion type. One of Akaike "AIC" (the default), "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables:-
a character vector of length d containing types of variables. One of "continuous" or "discrete".
pdf:-
a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1:-
a vector of length d containing initial component parameters. One of n_{il} = \textrm{number of categories} - 1 for the "binomial" distribution.
theta2:-
a vector of length d containing initial component parameters. Currently not used.
theta3:-
a vector of length d containing initial component parameters. One of \xi_{il} \in \{-1, \textrm{NA}, 1\} for the "Gumbel" distribution.
K:-
a character, a vector or a list of vectors containing numbers of bins v for the histogram and the kernel density estimation, or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \log_{2}(n), the \mathrm{Log}_{10} rule v = 10 \log_{10}(n) or the RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins, and the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, the brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for generating sequences of bins or nearest neighbours. The default value is "auto".
ymin:-
a vector of length d containing minimum observations. The default value is numeric().
ymax:-
a vector of length d containing maximum observations. The default value is numeric().
ar:-
acceleration rate 0 < a_{\mathrm{r}} \leq 1. The default value is 0.1 and in most cases does not have to be altered.
Restraints:-
a character giving the restraints type. One of "rigid" or "loose" (the default). The rigid restraints are obsolete and applicable to well-separated components only.
Mode:-
a character giving the mode type. One of "all", "outliers" or "outliersplus" (the default). If Mode = "all", the modes are determined in decreasing order of magnitude from all observations. If Mode = "outliers", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are likewise determined in decreasing order of magnitude from the outliers only, and some outliers are reclassified as inliers in the meantime. However, once all observations are inliers, they are converted to outliers and the mode determination procedure continues.
w:-
a list of vectors of length c containing component weights w_{l} summing to 1.
Theta:-
a list of lists, each containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for the normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for the Weibull, gamma, binomial, Poisson and Dirac distributions and a for the uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for the normal, lognormal and Gumbel distributions, \beta_{il} for the Weibull and gamma distributions, p_{il} for the binomial distribution, \kappa_{il} for the von Mises distribution and b for the uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} for the Gumbel distribution.
summary:-
a data frame with additional information about the dataset, preprocessing, c_{\mathrm{max}}, c_{\mathrm{min}}, information criterion type, a_{\mathrm{r}}, restraints type, mode type, optimal c, optimal v or k, K, y_{i0}, y_{i\mathrm{min}}, y_{i\mathrm{max}}, optimal h_{i}, information criterion \mathrm{IC}, log likelihood \log L and degrees of freedom M.
summary.EM:-
a data frame with additional information about the dataset, the strategy for the EM algorithm (strategy), the variant of the EM algorithm (variant), the acceleration type (acceleration), the tolerance (tolerance), the acceleration multiplier (acceleration.multiplier), the maximum allowed number of iterations (maximum.iterations), the number of iterations used to obtain the optimal solution (opt.iterations.nbr) and the total number of iterations of the EM algorithm (total.iterations.nbr).
pos:-
position in the summary data frame at which the log likelihood \log L attains its maximum.
opt.c:-
a list of vectors containing numbers of components for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.IC:-
a list of vectors containing information criteria for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.logL:-
a list of vectors containing log likelihoods for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.Dmin:-
a list of vectors containing D_{\mathrm{min}} values for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.D:-
a list of vectors containing totals of positive relative deviations for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
all.K:-
a list of vectors containing all processed numbers of bins v for the histogram and the kernel density estimation or all processed numbers of nearest neighbours k for the k-nearest neighbour.
all.IC:-
a list of vectors containing information criteria for all processed numbers of bins v for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours k for the k-nearest neighbour.
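For orientation, several of the Criterion options can be computed from the log likelihood \log L, the degrees of freedom M and the number of observations n reported in the summary slot. The sketch below uses standard textbook definitions only; the helper ic_sketch and its input values are hypothetical, and the exact variants REBMIX implements (as well as AIC4, MDL2, MDL5, AWE, CLC, ICL, ICL-BIC, PC, D and SSE) are not reproduced here.

```r
# Hypothetical helper (not a rebmix function): textbook forms of several
# information criteria from log likelihood logL, degrees of freedom M and
# number of observations n. Smaller values indicate a preferable model.
ic_sketch <- function(logL, M, n) {
  c(AIC  = -2 * logL + 2 * M,
    AIC3 = -2 * logL + 3 * M,
    AICc = -2 * logL + 2 * M + 2 * M * (M + 1) / (n - M - 1),
    BIC  = -2 * logL + M * log(n),
    CAIC = -2 * logL + M * (log(n) + 1),
    HQC  = -2 * logL + 2 * M * log(log(n)))
}

# Illustrative inputs only, not taken from a real fit.
ic <- ic_sketch(logL = -1234.5, M = 5, n = 500)
print(round(ic, 2))
```

All criteria share the -2 \log L term and differ only in the penalty on M; for example, BIC penalises each degree of freedom by \log n instead of 2, so it is stricter than AIC once n > e^{2}.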
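The rules of thumb quoted for the K slot can be evaluated directly for a given n. In this sketch the hypothetical helper k_rules rounds each value up with ceiling(), which is an assumption; REBMIX may round differently internally.

```r
# Hypothetical helper (not a rebmix function): rules of thumb quoted for K.
# Sturges, Log10 and RootN give limiting numbers of bins v; kNN gives an
# intermediate number of nearest neighbours k. Rounding up is an assumption.
k_rules <- function(n) {
  c(Sturges = ceiling(1 + log2(n)),
    Log10   = ceiling(10 * log10(n)),
    RootN   = ceiling(2 * sqrt(n)),
    kNN     = ceiling(sqrt(n)))
}

rules <- k_rules(400)
print(rules)

# One possible candidate set of bin numbers spanning the Sturges and RootN limits.
K_candidates <- seq(rules[["Sturges"]], rules[["RootN"]])
```

A vector such as K_candidates could then be supplied as K instead of "auto", leaving REBMIX to pick the value that minimises the chosen information criterion.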
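The bracketing step described for K (minimum at 40, brackets 20 and 60, then golden section) can be illustrated with an integer golden-section search. Here golden_min and the toy criterion ic_toy are hypothetical stand-ins: in REBMIX each evaluation would correspond to estimating the mixture for one v or k, and the internal implementation may differ. The search assumes the criterion is unimodal inside the bracket.

```r
# Hypothetical sketch: integer golden-section search for the number of bins v
# (or nearest neighbours k) minimising an information criterion in [lower, upper].
# ic_fun stands in for "estimate the mixture for this v and return its IC".
golden_min <- function(ic_fun, lower, upper) {
  phi <- (sqrt(5) - 1) / 2                 # inverse golden ratio, about 0.618
  a <- lower
  b <- upper
  # For b - a > 4 the two rounded interior points are distinct and strictly
  # inside (a, b), so every iteration shrinks the bracket by at least one.
  while (b - a > 4) {
    x1 <- round(b - phi * (b - a))
    x2 <- round(a + phi * (b - a))
    if (ic_fun(x1) <= ic_fun(x2)) {
      b <- x2                              # minimum lies in [a, x2]
    } else {
      a <- x1                              # minimum lies in [x1, b]
    }
  }
  # Finish with an exhaustive check of the few remaining integers.
  cand <- a:b
  cand[which.min(vapply(cand, ic_fun, numeric(1)))]
}

# Toy criterion with its minimum at v = 37, bracketed by 20 and 60.
ic_toy <- function(v) (v - 37)^2 + 100
v_opt <- golden_min(ic_toy, 20, 60)
print(v_opt)
```

Recomputing both interior points each iteration costs a few extra evaluations compared with the classical scheme, but it keeps the integer version simple and guarantees termination despite rounding.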
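The w and Theta slots together define the mixture density f(y) = \sum_{l=1}^{c} w_{l} f_{l}(y). For c univariate normal components, where theta1.l = \mu_{l} and theta2.l = \sigma_{l}, the density can be sketched as below. Passing the parameters as plain vectors is an assumption made for brevity, since the slots themselves store lists, and mix_density is a hypothetical helper, not a rebmix function.

```r
# Hypothetical helper: density of a univariate normal mixture,
# f(y) = sum_l w_l * dnorm(y, mu_l, sigma_l).
# w plays the role of one element of the w slot; mu and sigma play the roles
# of theta1.l and theta2.l for pdf = "normal".
mix_density <- function(y, w, mu, sigma) {
  stopifnot(length(w) == length(mu), length(mu) == length(sigma),
            isTRUE(all.equal(sum(w), 1)))   # weights must sum to 1
  vapply(y, function(yi) sum(w * dnorm(yi, mean = mu, sd = sigma)), numeric(1))
}

# Illustrative two-component parameters, not estimated from data.
w <- c(0.3, 0.7)
mu <- c(10, 20)
sigma <- c(1, 2)
dens <- mix_density(c(10, 15, 20), w, mu, sigma)
print(dens)
```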
Author(s)
Marko Nagode