REBMIX-class {rebmix} | R Documentation
Class "REBMIX"
Description
Object of class REBMIX.
Objects from the Class
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0), a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
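As a rough illustration of how the accessors might be used, the sketch below fits a mixture to a made-up dataset and reads a few slots back. It assumes that the rebmix package is installed and that its REBMIX() method accepts the slot names documented on this page as arguments; the dataset, the seed and the object names demo and est are hypothetical. The code is skipped when the package is not available.

```r
# Sketch only: needs the rebmix package and is skipped when it is not installed.
has_rebmix <- requireNamespace("rebmix", quietly = TRUE)

if (has_rebmix) {
  library(rebmix)
  set.seed(1)

  # Hypothetical one-dimensional dataset with two normal components.
  demo <- data.frame(y = c(rnorm(200, mean = 0), rnorm(200, mean = 6)))

  # Fit a mixture; argument names are assumed to match the slots on this page.
  est <- REBMIX(Dataset = list(demo = demo), Preprocessing = "histogram",
                cmax = 6, Criterion = "BIC", pdf = "normal")

  # Accessors without pos return the slot value as stored.
  print(a.cmax(est))
  print(a.Criterion(est))

  # pos = 1 selects the first item of the slot.
  print(a.w(est, pos = 1))
  print(a.Theta(est, pos = 1))
}
```

No output is shown here because the estimates depend on the simulated data and on the installed package version.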
Slots
Dataset
:-
a list of length n_{\mathrm{D}} of data frames or objects of class Histogram. Data frames should have size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. The number of observations n equals the number of rows in each dataset.
Preprocessing
:-
a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax
:-
maximum number of components c_{\mathrm{max}} > 0. The default value is 15.
cmin
:-
minimum number of components c_{\mathrm{min}} > 0. The default value is 1.
Criterion
:-
a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables
:-
a character vector of length d containing types of variables. One of "continuous" or "discrete".
pdf
:-
a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1
:-
a vector of length d containing initial component parameters. For the "binomial" distribution, n_{il} = \textrm{number of categories} - 1.
theta2
:-
a vector of length d containing initial component parameters. Currently not used.
theta3
:-
a vector of length d containing initial component parameters. For the "Gumbel" distribution, \xi_{il} \in \{-1, \textrm{NA}, 1\}.
K
:-
a character or a vector or a list of vectors containing numbers of bins v for the histogram and the kernel density estimation or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \mathrm{log_{2}}(n), the \mathrm{Log}_{10} rule v = 10 \mathrm{log_{10}}(n) or the RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins, or the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for generating sequences of bins or nearest neighbours. The default value is "auto".
ymin
:-
a vector of length d containing minimum observations. The default value is numeric().
ymax
:-
a vector of length d containing maximum observations. The default value is numeric().
ar
:-
acceleration rate 0 < a_{\mathrm{r}} \leq 1. The default value is 0.1 and in most cases does not have to be altered.
Restraints
:-
a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.
Mode
:-
a character giving the mode type. One of "all", "outliers" or default "outliersplus". If Mode = "all", the modes are determined in decreasing order of magnitude from all observations. If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.
w
:-
a list of vectors of length c containing component weights w_{l} summing to 1.
Theta
:-
a list of lists each containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions and a for uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for normal, lognormal and Gumbel distributions, \beta_{il} for Weibull and gamma distributions, p_{il} for binomial distribution, \kappa_{il} for von Mises distribution and b for uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} for Gumbel distribution.
summary
:-
a data frame with additional information about dataset, preprocessing, c_{\mathrm{max}}, c_{\mathrm{min}}, information criterion type, a_{\mathrm{r}}, restraints type, mode type, optimal c, optimal v or k, K, y_{i0}, y_{i\mathrm{min}}, y_{i\mathrm{max}}, optimal h_{i}, information criterion \mathrm{IC}, log likelihood \mathrm{log}\, L and degrees of freedom M.
summary.EM
:-
a data frame with additional information about dataset, strategy for the EM algorithm strategy, variant of the EM algorithm variant, acceleration type acceleration, tolerance tolerance, acceleration multiplier acceleration.multiplier, maximum allowed number of iterations maximum.iterations, number of iterations used for obtaining optimal solution opt.iterations.nbr and total number of iterations of the EM algorithm total.iterations.nbr.
pos
:-
position in the summary data frame at which log likelihood \mathrm{log}\, L attains its maximum.
opt.c
:-
a list of vectors containing numbers of components for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.IC
:-
a list of vectors containing information criteria for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.logL
:-
a list of vectors containing log likelihoods for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.Dmin
:-
a list of vectors containing D_{\mathrm{min}} values for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.D
:-
a list of vectors containing totals of positive relative deviations for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
all.K
:-
a list of vectors containing all processed numbers of bins v for the histogram and the kernel density estimation or all processed numbers of nearest neighbours k for the k-nearest neighbour.
all.IC
:-
a list of vectors containing information criteria for all processed numbers of bins v for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours k for the k-nearest neighbour.
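The rules quoted in the description of slot K, and the bracketing step from its K = c(10, 20, 40, 60) example, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal implementation: the sample size n, the rounding to whole numbers, the IC values, the stand-in function ic_stub and the helper golden_section are all hypothetical.

```r
# Rules from the K slot description, evaluated for an assumed sample size.
n <- 1000

v_sturges <- 1 + log2(n)     # Sturges rule
v_log10   <- 10 * log10(n)   # Log10 rule
v_rootn   <- 2 * sqrt(n)     # RootN rule
k_thumb   <- sqrt(n)         # rule of thumb for nearest neighbours

# Rounding to whole bins is an assumption made for this sketch.
v_rules <- round(c(v_sturges, v_log10, v_rootn))
print(v_rules)

# Bracketing as in the example: the minimum IC is attained at K = 40.
K  <- c(10, 20, 40, 60)
IC <- c(520, 480, 455, 470)  # hypothetical information criterion values
i  <- which.min(IC)
bracket <- K[c(max(i - 1, 1), min(i + 1, length(K)))]
print(bracket)

# Plain golden section search for the minimum of f on [lower, upper].
golden_section <- function(f, lower, upper, tol = 0.5) {
  phi <- (sqrt(5) - 1) / 2
  a <- lower
  b <- upper
  x1 <- b - phi * (b - a)
  x2 <- a + phi * (b - a)
  f1 <- f(x1)
  f2 <- f(x2)
  while (b - a > tol) {
    if (f1 < f2) {
      # Minimum lies in [a, x2]; reuse x1 as the new upper probe.
      b <- x2; x2 <- x1; f2 <- f1
      x1 <- b - phi * (b - a); f1 <- f(x1)
    } else {
      # Minimum lies in [x1, b]; reuse x2 as the new lower probe.
      a <- x1; x1 <- x2; f1 <- f2
      x2 <- a + phi * (b - a); f2 <- f(x2)
    }
  }
  (a + b) / 2
}

# Hypothetical smooth stand-in for the information criterion as a function of v.
ic_stub <- function(v) 455 + 0.05 * (v - 45)^2
v_opt <- round(golden_section(ic_stub, bracket[1], bracket[2]))
print(v_opt)
```

In an actual fit the information criterion is only available at integer v or k, and each evaluation requires running the estimation, so the continuous stand-in here serves only to show how the bracket narrows.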
Author(s)
Marko Nagode