REBMIX-class {rebmix} | R Documentation |
Class "REBMIX"
Description
Object of class REBMIX.
Objects from the Class
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0), a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
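A minimal sketch of how the accessors might be used in practice. It assumes that the rebmix package is installed and that a fitted object is obtained through the package's REBMIX() generic, which is not described on this page. The toy data frame, the object name est and the chosen argument values are hypothetical.

```r
library(rebmix)

set.seed(1)

# Hypothetical toy data: two well separated normal clusters in one dimension.
y <- data.frame(y = c(rnorm(100, mean = 0, sd = 1),
                      rnorm(100, mean = 6, sd = 1)))

# Fit a mixture (assumed call; see the package's REBMIX help page for details).
est <- REBMIX(Dataset = list(toy = y),
              Preprocessing = "histogram",
              cmax = 5,
              Criterion = "BIC",
              pdf = "normal")

# Slots holding input arguments are read back with x only.
a.cmax(est)
a.Criterion(est)

# Position in the summary data frame of the selected solution.
a.pos(est)

# Slots holding one item per solution take pos to select the item.
a.w(est, pos = 1)
a.Theta(est, pos = 1)
a.theta1.all(est, pos = 1)
```

Here pos = 1 selects the first item of the corresponding list slot; pos = 0, the default for a.w and a.Theta, is a different value that this sketch does not exercise.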
Slots
Dataset
:-
a list of length n_{\mathrm{D}} of data frames or objects of class Histogram. Data frames should have size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. Numbers of observations n equal the number of rows in the datasets.
Preprocessing
:-
a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax
:-
maximum number of components c_{\mathrm{max}} > 0. The default value is 15.
cmin
:-
minimum number of components c_{\mathrm{min}} > 0. The default value is 1.
Criterion
:-
a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables
:-
a character vector of length d containing types of variables. One of "continuous" or "discrete".
pdf
:-
a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1
:-
a vector of length d containing initial component parameters. One of n_{il} = \textrm{number of categories} - 1 for "binomial" distribution.
theta2
:-
a vector of length d containing initial component parameters. Currently not used.
theta3
:-
a vector of length d containing initial component parameters. One of \xi_{il} \in \{-1, \textrm{NA}, 1\} for "Gumbel" distribution.
K
:-
a character or a vector or a list of vectors containing numbers of bins v for the histogram and the kernel density estimation or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \log_{2}(n), the \mathrm{Log}_{10} rule v = 10 \log_{10}(n) or the RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins, or the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for generating sequences of numbers of bins or nearest neighbours. The default value is "auto".
ymin
:-
a vector of length d containing minimum observations. The default value is numeric().
ymax
:-
a vector of length d containing maximum observations. The default value is numeric().
ar
:-
acceleration rate 0 < a_{\mathrm{r}} \leq 1. The default value is 0.1 and in most cases does not have to be altered.
Restraints
:-
a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.
w
:-
a list of vectors of length c containing component weights w_{l} summing to 1.
Theta
:-
a list of lists each containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions and a for uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for normal, lognormal and Gumbel distributions, \beta_{il} for Weibull and gamma distributions, p_{il} for binomial distribution, \kappa_{il} for von Mises distribution and b for uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} for Gumbel distribution.
summary
:-
a data frame with additional information about dataset, preprocessing, c_{\mathrm{max}}, c_{\mathrm{min}}, information criterion type, a_{\mathrm{r}}, restraints type, optimal c, optimal v or k, K, y_{i0}, y_{i\mathrm{min}}, y_{i\mathrm{max}}, optimal h_{i}, information criterion \mathrm{IC}, log likelihood \mathrm{log}\, L and degrees of freedom M.
summary.EM
:-
a data frame with additional information about dataset, strategy for the EM algorithm strategy, variant of the EM algorithm variant, acceleration type acceleration, tolerance tolerance, acceleration multiplier acceleration.multiplier, maximum allowed number of iterations maximum.iterations, number of iterations used for obtaining optimal solution opt.iterations.nbr and total number of iterations of the EM algorithm total.iterations.nbr.
pos
:-
position in the summary data frame at which log likelihood \mathrm{log}\, L attains its maximum.
opt.c
:-
a list of vectors containing numbers of components for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.IC
:-
a list of vectors containing information criteria for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.logL
:-
a list of vectors containing log likelihoods for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.D
:-
a list of vectors containing totals of positive relative deviations for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
all.K
:-
a list of vectors containing all processed numbers of bins v for the histogram and the kernel density estimation or all processed numbers of nearest neighbours k for the k-nearest neighbour.
all.IC
:-
a list of vectors containing information criteria for all processed numbers of bins v for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours k for the k-nearest neighbour.
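The rules quoted under slot K can be evaluated directly in base R. The sketch below is illustrative only: the sample size n and the IC values are made up, and the bracketing step merely mimics the description given for K = c(10, 20, 40, 60). It is not the package's internal code.

```r
# Hypothetical sample size.
n <- 1000

# Limiting numbers of bins v suggested by the three rules.
v.sturges <- 1 + log2(n)   # Sturges rule
v.log10 <- 10 * log10(n)   # Log10 rule
v.rootn <- 2 * sqrt(n)     # RootN rule

# Rule of thumb for an intermediate number of nearest neighbours k.
k.thumb <- sqrt(n)

# Candidate set K and made-up information criterion values, one per candidate.
K <- c(10, 20, 40, 60)
IC <- c(520, 480, 455, 470)

# The neighbours of the candidate with minimum IC become the brackets
# within which a golden section search would refine the minimum.
i <- which.min(IC)
brackets <- K[c(max(i - 1, 1), min(i + 1, length(K)))]

print(round(c(v.sturges, v.log10, v.rootn, k.thumb), 2))
print(brackets)
```

In practice the rules would only be used to choose sensible end points for K; the default "auto" leaves that choice to the package.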
Author(s)
Marko Nagode