fit.full.GMCM {GMCM} | R Documentation |
Estimate GMCM parameters of the general model
Description
Estimates the parameters of general Gaussian mixture copula models (GMCM). The function finds the maximum likelihood estimate of a general GMCM with various optimization procedures. Note that all methods except PEM provide the maximum likelihood estimate.
Usage
fit.full.GMCM(u, m, theta = choose.theta(u, m), method = c("NM",
"SANN", "L-BFGS", "L-BFGS-B", "PEM"), max.ite = 1000, verbose = TRUE,
...)
fit.general.GMCM(u, m, theta = choose.theta(u, m), method = c("NM",
"SANN", "L-BFGS", "L-BFGS-B", "PEM"), max.ite = 1000, verbose = TRUE,
...)
Arguments
u |
An |
m |
The number of components to be fitted. |
theta |
A list of parameters as defined in |
method |
A character vector of length |
max.ite |
The maximum number of iterations. If the |
verbose |
Logical. If |
... |
Arguments passed to the |
Details
The "L-BFGS-B" method does not perform a transformation of the parameters and uses box constraints as implemented in optim.
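To illustrate what box constraints in optim look like, here is a minimal base R sketch. The objective function and the bounds are hypothetical stand-ins chosen for illustration; they are not the GMCM likelihood or the bounds used internally by fit.full.GMCM.

```r
# Toy objective (hypothetical, NOT the GMCM likelihood): a quadratic whose
# unconstrained minimum lies at (2, -1), outside the box used below.
objective <- function(p) (p[1] - 2)^2 + (p[2] + 1)^2

# The bounds 0 <= p[i] <= 1 are passed to optim directly, so the
# parameters are optimized on their original scale without transformation.
fit <- optim(par = c(0.5, 0.5), fn = objective, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(1, 1))

fit$par          # constrained minimizer; it lies on the boundary of the box
fit$convergence  # 0 indicates successful convergence
```

Because the unconstrained minimum is outside the box, the optimizer stops on the boundary. The other methods listed above avoid this situation by transforming the parameters instead.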
Note that many parameter configurations are poorly estimable or directly unidentifiable.
fit.general.GMCM is simply an alias of fit.full.GMCM.
Value
A list of parameters formatted as described in rtheta.
When method equals "PEM", a list of extra information (log-likelihood trace, the matrix of group probabilities, theta trace) is added as an attribute called "extra".
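The "extra" attribute can be retrieved with base R's attr. The following is a small sketch, assuming the GMCM package is installed. The simulation settings (n = 300, m = 2, max.ite = 50) are arbitrary illustrative choices, and the element names inside the attribute are not listed on this page, so the sketch only inspects its structure.

```r
library(GMCM)

set.seed(1)
sim  <- SimulateGMCMData(n = 300, m = 2, d = 2)
uhat <- Uhat(sim$u)

# Fit with the pseudo EM algorithm; settings are illustrative only
fit <- fit.full.GMCM(u = uhat, m = 2, method = "PEM",
                     max.ite = 50, verbose = FALSE)

# The extra information is stored as an attribute on the returned list
extra <- attr(fit, "extra")
str(extra, max.level = 1)  # inspect what is available without assuming names
```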
Note
All the optimization procedures are strongly dependent on the initial values and other parameters (such as the cooling scheme for method SANN). Therefore it is advisable to apply multiple different initial parameters (and optimization routines) and select the best fit.
choose.theta itself chooses a random initialization. Hence, the output can vary when theta is not directly supplied.
See optim for further details.
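The advice about multiple starting values can be sketched as a simple multi-start loop. This is one possible approach rather than a prescribed procedure. It assumes the GMCM package is installed and that dgmcm.loglik(theta, u) returns the GMCM log-likelihood of u under theta. That function is not described on this page, so check its help page before relying on it. The number of restarts and the simulation settings are arbitrary.

```r
library(GMCM)

set.seed(17)
sim  <- SimulateGMCMData(n = 500, m = 3, d = 2)
uhat <- Uhat(sim$u)

n.starts <- 5                       # arbitrary number of random restarts
fits     <- vector("list", n.starts)
loglik   <- numeric(n.starts)

for (i in seq_len(n.starts)) {
  start.theta <- choose.theta(uhat, m = 3)  # new random start each time
  fits[[i]] <- fit.full.GMCM(u = uhat, theta = start.theta, method = "NM",
                             max.ite = 2000, verbose = FALSE)
  # ASSUMPTION: dgmcm.loglik gives the log-likelihood of the fitted model
  loglik[i] <- dgmcm.loglik(theta = fits[[i]], u = uhat)
}

# Keep the fit with the largest log-likelihood
best <- fits[[which.max(loglik)]]
```

The spread of the values in loglik gives a rough indication of how sensitive the fit is to the starting point.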
Author(s)
Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>
References
Li, Q., Brown, J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466
Tewari, A., Giering, M. J., & Raghunathan, A. (2011). Parametric Characterization of Multimodal Distributions with Non-gaussian Modes. 2011 IEEE 11th International Conference on Data Mining Workshops, 286-292. doi:10.1109/ICDMW.2011.135
See Also
Examples
set.seed(17)
sim <- SimulateGMCMData(n = 1000, m = 3, d = 2)
# Plotting simulated data
par(mfrow = c(1,2))
plot(sim$z, col = rainbow(3)[sim$K], main = "Latent process")
plot(sim$u, col = rainbow(3)[sim$K], main = "GMCM process")
# Observed data
uhat <- Uhat(sim$u)
# The model should be fitted multiple times using different starting estimates
start.theta <- choose.theta(uhat, m = 3) # Random starting estimate
res <- fit.full.GMCM(u = uhat, theta = start.theta,
                     method = "NM", max.ite = 3000,
                     reltol = 1e-2, trace = TRUE) # Note, 1e-2 is too big
# Confusion matrix
Khat <- apply(get.prob(uhat, theta = res), 1, which.max)
table("Khat" = Khat, "K" = sim$K) # Note, some components have been swapped
# Simulation from GMCM with the fitted parameters
simfit <- SimulateGMCMData(n = 1000, theta = res)
# As seen, the underlying latent process is hard to estimate.
# The clustering, however, is very good.
par(mfrow = c(2,2))
plot(simfit$z, col = simfit$K, main = "Model check 1\nSimulated GMM")
plot(simfit$u, col = simfit$K, main = "Model check 2\nSimulated GMCM")
plot(sim$u, col = Khat, main = "MAP clustering")