movMF {movMF} | R Documentation |
Fit Mixtures of von Mises-Fisher Distributions
Description
Fit mixtures of von Mises-Fisher distributions.
Usage
movMF(x, k, control = list(), ...)
Arguments
x |
a numeric data matrix, with rows corresponding to observations. Standardized to unit row lengths if necessary. Can be a dense matrix, a simple triplet matrix (package slam), or a dgTMatrix (package Matrix). |
k |
an integer giving the desired number of mixture components (classes). |
control |
a list of control parameters. See Details. |
... |
a list of control parameters (overriding those specified
in |
Details
movMF
returns an object of class "movMF"
representing
the fitted mixture of von Mises-Fisher distributions model. Available
methods for such objects include coef
,
logLik
, print
and predict
.
predict
has an extra type
argument with possible
values "class_ids"
(default) and "memberships"
for
indicating hard or soft prediction, respectively.
The mixture of von Mises-Fisher distributions is fitted using EM
variants as specified by control option E
(specifying the
E-step employed), with possible values "softmax"
(default),
"hardmax"
or "stochmax"
where the first two implement
algorithms soft-moVMF and hard-moVMF of Banerjee et al (2005). For
"stochmax"
, class assignments are drawn from the posteriors for
each observation in the E-step as outlined as SEM in Celeux and
Govaert (1992). The stopping criterion for this algorithm is by
default changed to not check for convergence (logical control option
converge
), but to return the parameters with the maximum
likelihood encountered. E
may be abbreviated.
In the M-step, the parameters \theta
of the respective component
distributions are estimated via maximum likelihood, which is
accomplished by taking \theta
proportional to suitable weighted
sample means \bar{x}
, with length \kappa
solving the
equation A_d(\kappa) = \|\bar{x}\|
, where
A_d(\kappa) = I_{d/2}(\kappa) / I_{d/2-1}(\kappa)
with I
the modified Bessel function of the first kind. Via control argument
kappa
, one can specify how to (approximately) solve these
equations, and whether a common (possibly given) length \kappa
should be employed. If kappa
is a number, it gives a common
length to be employed. If it is a character string, it specifies the
method to be used for solving the \kappa
equation. The possible
methods are:
"Banerjee_et_al_2005"
uses the approximation of Banerjee et al (2005).
"Tanabe_et_al_2007"
uses the fixed-point iteration of Tanabe et al (2007) with starting point for
\kappa
in the interval established by Tanabe et al (2007) implied by a givenc
with values in [0, 2]. The default isc
= 1, the mid-point of the interval."Sra_2012"
uses two Newton steps as suggested in Sra (2012) starting in the approximation of Banerjee et al (2005).
"Song_et_al_2012"
uses two Halley steps as suggested in Song et al (2012) starting in the approximation of Banerjee et al (2005).
"uniroot"
uses a straightforward call to
uniroot
with the bounds established in Hornik and Grün (2014)."Newton"
uses a full Newton algorithm started in the approximation of Hornik and Grün (2014).
"Halley"
uses a full Halley algorithm started in the approximation of Hornik and Grün (2014).
"hybrid"
implements a combination of a derivative-based step (Newton or Halley) and a bisection step as outlined in Press et al. (2002). The derivative-based step can be specified via the argument
step
which expects a function performing this step. Currentlystep_Newton
andstep_Halley
(default) are available."Newton_Fourier"
(default)uses a variant of the Newton-Fourier method for strictly increasing concave functions as for example given in Atkinson (1989, pp. 62–64). Concavity can be established using Hornik and Grün (2013).
The lower-cased version of the given kappa
specification is
matched against the lower-cased names of the available methods using
pmatch
. Finally, to indicate using a common (but not
given) \kappa
for all component distributions, kappa
should be a list with element common = TRUE
(and optionally a
character string giving the estimation method).
Additional control parameters are as follows.
maxiter
an integer giving the maximal number of EM iterations to be performed. Default: 100.
reltol
the minimum relative improvement of the objective function per iteration. If improvement is less, the EM algorithm will stop under the assumption that no further significant improvement can be made. Defaults to
sqrt(.Machine$double.eps)
.ids
either a vector of class memberships or
TRUE
which implies that the class memberships are obtained from the attribute named"z"
of the input data; these class memberships are used for initializing the EM algorithm and the algorithm is stopped after the first iteration.start
a specification of the starting values to be employed. Can be a list of matrices giving the memberships of objects to components, or of vectors giving component ids (numbers from 1 to the given
k
). Can also be a character vector with elements"i"
(randomly pick component ids for the observations), or one of"p"
,"S"
or"s"
. The latter first determine component “prototypes”, and then determine an optimal “fuzzy” membership matrix from the implied cosine dissimilarities between observations and prototypes. Prototypes are obtained as follows: for"p"
, observations are randomly picked. For"S"
, one takes the first prototype to minimize total cosine dissimilarity to the observations, and then successively picks observations farthest away from the already picked prototypes. For"s"
, one takes a randomly chosen observation as the first prototype, and then proceeds as for"S"
.By default, initialization method
"p"
is used.If several starting values are specified, the EM algorithm is performed individually to each starting value, and the best solution found is returned.
nruns
an integer giving the number of EM runs to be performed. Default: 1. Only used if
start
is not given.minalpha
a numeric indicating the minimum prior probability. Components falling below this threshold are removed during the iteration. If
\ge 1
, this is taken as the minimal number of observations in a component. Default: 0.converge
a logical, if
TRUE
the EM algorithm is stopped if thereltol
criterion is met and the current parameter estimate is returned. IfFALSE
the EM algorithm is run formaxiter
iterations and the parametrizations with the maximum likelihood encountered during the EM algorithm is returned. Default:TRUE
, changed toFALSE
ifE="stochmax"
.verbose
a logical indicating whether to provide some output on algorithmic progress. Defaults to
getOption("verbose")
.
One popular application context of mixtures of von Mises-Fisher
distributions is text mining, where the data matrices are typically
very large and sparse. The provided implementation should be able to
handle such large corpora with reasonable efficiency by employing
suitable sparse matrix representations and computations. In addition,
straightforward computations of the normalizing constants in the von
Mises-Fisher densities (see movMF_distribution) by
directly employing the modified Bessel functions of the first kind are
computationally infeasible for large d
(dimension of the
observations) and/or values of the parameter lengths \kappa
.
Instead, we use suitably scaled hypergeometric-type power series for
computing (the logarithms of) the normalizing constants.
Value
An object of class "movMF"
representing the fitted mixture of
von Mises-Fisher distributions, which is a list containing at least
the following components:
theta |
a matrix with rows giving the fitted parameters of the mixture components. |
alpha |
a numeric vector with the fitted mixture probabilities. |
See vMF for the employed parametrization of the von Mises-Fisher distribution.
References
K. E. Atkinson (1989). An Introduction to Numerical Analysis. 2nd edition. John Wiley & Sons.
A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6, 1345–1382. https://jmlr.csail.mit.edu/papers/v6/banerjee05a.html.
G. Celeux, and G. Govaert (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14, 315–332. doi:10.1016/0167-9473(92)90042-E.
K. Hornik, and B. Grün (2013). Amos-type bounds for modified Bessel function ratios. Journal of Mathematical Analysis and Applications, 408(1), 91–101. doi:10.1016/j.jmaa.2013.05.070.
K. Hornik, and B. Grün (2014). On maximum likelihood estimation of the concentration parameter of von Mises-Fisher distributions. Computational Statistics, 29, 945–957. doi:10.1007/s00180-013-0471-0.
W. H. Press, S. A. Teukolsky, W. T. Vetterling and Brian P. Flannery (2002). Numerical Recipes in C: The Art of Scientific Computing. 2nd edition. Cambridge University Press.
H. Song, J. Liu, and G. Wang. High-order parameter approximation for von Mises-Fisher distributions. Applied Mathematics and Computation, 218, 11880–11890. doi:10.1016/j.amc.2012.05.050.
S. Sra (2012).
A short note on parameter approximation for von Mises-Fisher
distributions: and a fast implementation of I_s(x)
.
Computational Statistics, 27, 177–190.
doi:10.1007/s00180-011-0232-x.
A. Tanabe, K. Fukumizu, S. Oba, T. Takenouchi, and S. Ishii. Parameter estimation for von Mises-Fisher distributions. Computational Statistics, 22, 145–157. doi:10.1007/s00180-007-0030-7.
Examples
## Generate and fit a "small-mix" data set a la Banerjee et al.
mu <- rbind(c(-0.251, -0.968),
c(0.399, 0.917))
kappa <- c(4, 4)
theta <- kappa * mu
theta
alpha <- c(0.48, 0.52)
## Generate a sample of size n = 50 from the von Mises-Fisher mixture
## with the above parameters.
set.seed(123)
x <- rmovMF(50, theta, alpha)
## Fit a von Mises-Fisher mixture with the "right" number of components,
## using 10 EM runs.
set.seed(123)
y2 <- movMF(x, 2, nruns = 10)
## Inspect the fitted parameters:
y2
## Compare the fitted classes to the true ones:
table(True = attr(x, "z"), Fitted = predict(y2))
## To use a common kappa:
y2cv <- movMF(x, 2, nruns = 10, kappa = list(common = TRUE))
## To use a common kappa fixed to the true value of 4:
y2cf <- movMF(x, 2, nruns = 10, kappa = 4)
## Comparing solutions via BIC:
sapply(list(y2, y2cf, y2cv), BIC)
## Use a different kappa solver:
set.seed(123)
y2a <- movMF(x, 2, nruns = 10, kappa = "uniroot")
y2a