mix2normal {VGAM} | R Documentation |
Mixture of Two Univariate Normal Distributions
Description
Estimates the five parameters of a mixture of two univariate normal distributions by maximum likelihood estimation.
Usage
mix2normal(lphi = "logitlink", lmu = "identitylink", lsd =
"loglink", iphi = 0.5, imu1 = NULL, imu2 = NULL, isd1 =
NULL, isd2 = NULL, qmu = c(0.2, 0.8), eq.sd = TRUE,
nsimEIM = 100, zero = "phi")
Arguments
lphi , lmu , lsd |
Link functions for the parameters |
iphi |
Initial value for |
imu1 , imu2 |
Optional initial value for |
isd1 , isd2 |
Optional initial value for |
qmu |
Vector with two values giving the probabilities relating
to the sample quantiles for obtaining initial values for
|
eq.sd |
Logical indicating whether the two standard deviations should
be constrained to be equal. If |
nsimEIM |
|
zero |
May be an integer vector
specifying which linear/additive predictors are modelled as
intercept-only. If given, the value or values can be from the
set |
Details
The probability density function can be loosely written as
where is the probability an observation belongs
to the first group.
The parameters
and
are the
means, and
and
are the
standard deviations. The parameter
satisfies
.
The mean of
is
and this is returned as the fitted values.
By default, the five linear/additive predictors are
.
If
eq.sd = TRUE
then
is enforced.
Value
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions
such as vglm
,
and vgam
.
Warning
Numerical problems can occur and
half-stepping is not uncommon.
If failure to converge occurs, try inputting better initial
values,
e.g., by using iphi
,
qmu
,
imu1
,
imu2
,
isd1
,
isd2
,
etc.
This VGAM family function is experimental and should be used with care.
Note
Fitting this model successfully to data can be difficult due
to numerical problems and ill-conditioned data. It pays to
fit the model several times with different initial values and
check that the best fit looks reasonable. Plotting the results
is recommended. This function works better as
and
become more different.
Convergence can be slow, especially when the two component
distributions are not well separated.
The default control argument trace = TRUE
is to encourage
monitoring convergence.
Having eq.sd = TRUE
often makes the overall optimization
problem easier.
Author(s)
T. W. Yee
References
McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: Wiley.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. London: Chapman & Hall.
See Also
uninormal
,
Normal
,
mix2poisson
.
Examples
## Not run: mu1 <- 99; mu2 <- 150; nn <- 1000
sd1 <- sd2 <- exp(3)
(phi <- logitlink(-1, inverse = TRUE))
rrn <- runif(nn)
mdata <- data.frame(y = ifelse(rrn < phi, rnorm(nn, mu1, sd1),
rnorm(nn, mu2, sd2)))
fit <- vglm(y ~ 1, mix2normal(eq.sd = TRUE), data = mdata)
# Compare the results
cfit <- coef(fit)
round(rbind('Estimated' = c(logitlink(cfit[1], inverse = TRUE),
cfit[2], exp(cfit[3]), cfit[4]),
'Truth' = c(phi, mu1, sd1, mu2)), digits = 2)
# Plot the results
xx <- with(mdata, seq(min(y), max(y), len = 200))
plot(xx, (1-phi) * dnorm(xx, mu2, sd2), type = "l", xlab = "y",
main = "red = estimate, blue = truth",
col = "blue", ylab = "Density")
phi.est <- logitlink(coef(fit)[1], inverse = TRUE)
sd.est <- exp(coef(fit)[3])
lines(xx, phi*dnorm(xx, mu1, sd1), col = "blue")
lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col = "red")
lines(xx, (1-phi.est)*dnorm(xx, Coef(fit)[4], sd.est), col="red")
abline(v = Coef(fit)[c(2,4)], lty = 2, col = "red")
abline(v = c(mu1, mu2), lty = 2, col = "blue")
## End(Not run)