REBMIX-class {rebmix} | R Documentation
Class "REBMIX"
Description
Object of class REBMIX.
Objects from the Class
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are
a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL),
a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL),
a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0),
a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL),
a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL),
a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
Slots
Dataset
:-
a list of length $n_{\mathrm{D}}$ of data frames or objects of class Histogram. Data frames should have size $n \times d$ containing $d$-dimensional datasets. Each of the $d$ columns represents one random variable. Numbers of observations $n$ equal the number of rows in the datasets.
Preprocessing
:-
a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax
:-
maximum number of components $c_{\max} > 0$. The default value is 15.
cmin
:-
minimum number of components $c_{\min} > 0$. The default value is 1.
Criterion
:-
a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables
:-
a character vector of length $d$ containing types of variables. One of "continuous" or "discrete".
pdf
:-
a character vector of length $d$ containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1
:-
a vector of length $d$ containing initial component parameters. One of $n_{il}$ (number of categories minus 1) for "binomial" distribution.
theta2
:-
a vector of length $d$ containing initial component parameters. Currently not used.
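Several slots above (Dataset, Variables, pdf, theta1, theta2) must agree on the dimension $d$: each data frame in Dataset has $d$ columns, and the per-variable vectors have length $d$. The following is a minimal base-R sketch of inputs with that shape; the data values and the helper check_dims() are hypothetical illustrations, not part of rebmix.

```r
# Hypothetical d = 2 input: one continuous and one discrete variable.
set.seed(1)
n <- 100  # number of observations = number of rows
Dataset <- list(
  data.frame(y1 = rnorm(n), y2 = rpois(n, lambda = 3))  # one n x d data frame
)
Variables <- c("continuous", "discrete")  # one variable type per column
pdf <- c("normal", "Poisson")             # one parametric family per column

# Hypothetical helper (not part of rebmix): TRUE when Variables and pdf both
# have length d and every data frame in Dataset has d columns.
check_dims <- function(Dataset, Variables, pdf) {
  d <- length(Variables)
  length(pdf) == d && all(vapply(Dataset, ncol, integer(1)) == d)
}

check_dims(Dataset, Variables, pdf)
```

A mismatch, such as a single-element pdf for a two-column data frame, makes the check return FALSE.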
theta3
:-
a vector of length $d$ containing initial component parameters. One of $\xi_{il} \in \{-1, 0, 1\}$ for "Gumbel" distribution.
K
:-
a character or a vector or a list of vectors containing numbers of bins $v$ for the histogram and the kernel density estimation or numbers of nearest neighbours $k$ for the k-nearest neighbour. There is no genuine rule to identify $v$ or $k$. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule $v = 1 + \log_{2}(n)$, $\mathrm{Log}_{10}$ rule $v = 10 \log_{10}(n)$ or RootN rule $v = 2\sqrt{n}$ can be applied to estimate the limiting numbers of bins, or the rule of thumb $k = \sqrt{n}$ to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for sequence of bins or nearest neighbours generation. The default value is "auto".
ymin
:-
a vector of length $d$ containing minimum observations. The default value is numeric().
ymax
:-
a vector of length $d$ containing maximum observations. The default value is numeric().
ar
:-
acceleration rate $0 < a_{\mathrm{r}} \le 1$. The default value is 0.1 and in most cases does not have to be altered.
Restraints
:-
a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable to well-separated components only.
Mode
:-
a character giving the mode type. One of "all", "outliers" or default "outliersplus". The modes are determined in decreasing order of magnitude from all observations if Mode = "all". If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.
w
:-
a list of vectors of length $c$ containing component weights $w_{l}$ summing to 1.
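The bin-number rules mentioned for the K slot can be evaluated directly from the number of observations $n$. The following is a minimal base-R sketch, assuming the forms Sturges $v = 1 + \log_2 n$, Log10 $v = 10 \log_{10} n$, RootN $v = 2\sqrt{n}$ and the rule of thumb $k = \sqrt{n}$. The helper bin_rules() is hypothetical, and rounding to the nearest whole number is our own assumption (kseq may round or space values differently).

```r
# Hypothetical helper (not part of rebmix): bin-number rules for v and the
# rule of thumb for k, rounded to the nearest whole number (an assumption).
bin_rules <- function(n) {
  c(Sturges = round(1 + log2(n)),
    Log10   = round(10 * log10(n)),
    RootN   = round(2 * sqrt(n)),
    kNN     = round(sqrt(n)))
}

r <- bin_rules(1000)
r
# A candidate K spanning the Sturges and RootN limits, for illustration only.
K <- unique(round(seq(r[["Sturges"]], r[["RootN"]], length.out = 5)))
K
```

The equally spaced K above is only one way to fill the range between the limiting rules; the search over K then picks the value with the smallest information criterion.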
Theta
:-
a list of lists each containing $c$ parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for $0 \le y_{i} \le 2\pi$. Component parameters theta1.l follow the parametric family types. One of $\mu_{il}$ for normal, lognormal, Gumbel and von Mises distributions, $\theta_{il}$ for Weibull, gamma, binomial, Poisson and Dirac distributions and $a_{il}$ for uniform distribution. Component parameters theta2.l follow theta1.l. One of $\sigma_{il}$ for normal, lognormal and Gumbel distributions, $\beta_{il}$ for Weibull and gamma distributions, $p_{il}$ for binomial distribution, $\kappa_{il}$ for von Mises distribution and $b_{il}$ for uniform distribution. Component parameters theta3.l follow theta2.l. One of $\xi_{il}$ for Gumbel distribution.
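The slots w and Theta together define the mixture density $f(y) = \sum_{l=1}^{c} w_l f_l(y \mid \theta_{1l}, \theta_{2l})$. The following is a minimal base-R sketch for a univariate normal mixture, assuming theta1.l is the mean $\mu_l$ and theta2.l the standard deviation $\sigma_l$. The list below only mimics the layout of Theta with hypothetical values and is not read from a REBMIX object; mixture_density() is a hypothetical helper.

```r
# Hypothetical c = 2 component estimate in the spirit of the w and Theta slots.
w <- c(0.3, 0.7)  # component weights, summing to 1
Theta <- list(pdf1 = "normal", theta1.1 = 0, theta2.1 = 1,
              pdf2 = "normal", theta1.2 = 5, theta2.2 = 2)

# Evaluate f(y) = sum_l w_l * dnorm(y, mu_l, sigma_l) for normal components.
mixture_density <- function(y, w, Theta) {
  f <- numeric(length(y))
  for (l in seq_along(w)) {
    mu    <- Theta[[paste0("theta1.", l)]]
    sigma <- Theta[[paste0("theta2.", l)]]
    f <- f + w[l] * dnorm(y, mean = mu, sd = sigma)
  }
  f
}

mixture_density(c(0, 5), w, Theta)
# Because the weights sum to 1, the density integrates to 1 (numerically).
integrate(mixture_density, -Inf, Inf, w = w, Theta = Theta)$value
```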
summary
:-
a data frame with additional information about dataset, preprocessing, $c_{\max}$, $c_{\min}$, information criterion type, $a_{\mathrm{r}}$, restraints type, mode type, optimal $c$, optimal $v$ or $k$, $K$, $y_{i0}$, $y_{i,\min}$, $y_{i,\max}$, optimal $h_{i}$, information criterion $\mathrm{IC}$, log likelihood $\log L$ and degrees of freedom $M$.
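The summary data frame reports the log likelihood $\log L$ and the degrees of freedom $M$. The following is a minimal base-R sketch of both quantities for a univariate normal mixture with hypothetical parameters and simulated data. It assumes $M$ counts $c - 1$ free weights plus two parameters (mean and standard deviation) per component; this parameter count is our assumption, not taken from rebmix.

```r
# Hypothetical univariate normal mixture with c = 2 components.
w     <- c(0.3, 0.7)
mu    <- c(0, 5)
sigma <- c(1, 2)

set.seed(42)
y <- c(rnorm(30, mu[1], sigma[1]), rnorm(70, mu[2], sigma[2]))  # n = 100

# Log likelihood log L = sum_j log( sum_l w_l * dnorm(y_j; mu_l, sigma_l) ).
logL <- sum(log(sapply(y, function(yj) sum(w * dnorm(yj, mu, sigma)))))

# Degrees of freedom, assuming c - 1 free weights plus 2 parameters
# per univariate normal component.
c_comp <- length(w)
M <- (c_comp - 1) + 2 * c_comp
c(logL = logL, M = M)
```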
summary.EM
:-
a data frame with additional information about dataset, strategy for the EM algorithm (strategy), variant of the EM algorithm (variant), acceleration type (acceleration), tolerance (tolerance), acceleration multiplier (acceleration.multiplier), maximum allowed number of iterations (maximum.iterations), number of iterations used for obtaining the optimal solution (opt.iterations.nbr) and total number of iterations of the EM algorithm (total.iterations.nbr).
pos
:-
position in the summary data frame at which log likelihood $\log L$ attains its maximum.
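The pos slot is the row index of summary with the largest log likelihood. In base R that lookup is which.max() on the log-likelihood column. The data frame below, its values and its column names are hypothetical stand-ins, not the actual columns of a REBMIX summary.

```r
# Hypothetical summary-like data frame: one row per processed dataset.
summary_df <- data.frame(
  Dataset = c("set_1", "set_2", "set_3"),
  IC      = c(412.8, 398.1, 405.6),
  logL    = c(-200.4, -190.2, -195.9)
)

# Row at which the log likelihood attains its maximum.
pos <- which.max(summary_df$logL)
summary_df[pos, ]
```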
opt.c
:-
a list of vectors containing numbers of components for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.
opt.IC
:-
a list of vectors containing information criteria for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.
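Most criteria named in the Criterion slot penalize the log likelihood $\log L$ by the degrees of freedom $M$ and, for some, the number of observations $n$; smaller values are preferred. The following is a minimal base-R sketch of six textbook forms (AIC, AIC3, AICc, BIC, CAIC, HQC). Whether rebmix evaluates exactly these expressions is an assumption, and ic() is a hypothetical helper. The log-likelihood values and $M$ below are illustrative only.

```r
# Hypothetical helper: textbook information criteria for log likelihood logL,
# M degrees of freedom and n observations. Smaller values are better.
ic <- function(logL, M, n) {
  aic <- -2 * logL + 2 * M
  c(AIC  = aic,
    AIC3 = -2 * logL + 3 * M,
    AICc = aic + 2 * M * (M + 1) / (n - M - 1),
    BIC  = -2 * logL + M * log(n),
    CAIC = -2 * logL + M * (log(n) + 1),
    HQC  = -2 * logL + 2 * M * log(log(n)))
}

crit <- ic(logL = -190.2, M = 5, n = 100)
crit

# Compare two hypothetical fits by BIC: c = 2 (M = 5) versus c = 3 (M = 8).
bic <- c(c2 = ic(-190.2, 5, 100)[["BIC"]], c3 = ic(-188.9, 8, 100)[["BIC"]])
names(which.min(bic))
```

Here the three-component fit raises $\log L$ only slightly, so its larger penalty makes the two-component fit preferable under BIC.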
opt.logL
:-
a list of vectors containing log likelihoods for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.
opt.Dmin
:-
a list of vectors containing $D_{\min}$ values for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.
opt.D
:-
a list of vectors containing totals of positive relative deviations for optimal $v$ for the histogram and the kernel density estimation or for optimal number of nearest neighbours $k$ for the k-nearest neighbour.
all.K
:-
a list of vectors containing all processed numbers of bins $v$ for the histogram and the kernel density estimation or all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.
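The K slot describes refining the information-criterion minimum with a golden section between brackets around the best input value (e.g. 20 and 60 around 40); the visited values are what all.K and all.IC record. The following is a minimal base-R sketch of a generic integer golden-section search, not rebmix's own implementation. The IC curve toy_ic() and the helper golden_section_int() are hypothetical, and the search assumes the IC is unimodal within the brackets.

```r
# Hypothetical IC curve over the number of bins v; its minimum is at v = 33.
toy_ic <- function(v) (v - 33)^2 + 100

# Golden-section search for the integer minimiser of a unimodal f on [a, b].
# IC values are cached so that all visited v (cf. all.K) and their IC values
# (cf. all.IC) can be returned alongside the optimum.
golden_section_int <- function(f, a, b) {
  phi <- (sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
  cache <- numeric(0)
  eval_f <- function(v) {
    key <- as.character(v)
    if (is.na(cache[key])) cache[key] <<- f(v)
    cache[[key]]
  }
  while (b - a > 3) {
    x1 <- round(b - phi * (b - a))
    x2 <- round(a + phi * (b - a))
    if (x2 <= x1) x2 <- x1 + 1  # keep the two interior points distinct
    if (eval_f(x1) <= eval_f(x2)) b <- x2 else a <- x1
  }
  for (v in a:b) eval_f(v)      # finish by checking the short bracket
  v_all <- as.numeric(names(cache))
  best <- v_all[which.min(cache)]
  list(K = best, all.K = sort(v_all), all.IC = unname(cache[order(v_all)]))
}

# Brackets 20 and 60 around the best input value 40, as in the K example.
res <- golden_section_int(toy_ic, 20, 60)
res$K
```

Caching keeps each IC evaluation to one call per visited v, which matters when every evaluation means a full REBMIX run.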
all.IC
:-
a list of vectors containing information criteria for all processed numbers of bins $v$ for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours $k$ for the k-nearest neighbour.
Author(s)
Marko Nagode