mix2normal {VGAM} | R Documentation |
Mixture of Two Univariate Normal Distributions
Description
Estimates the five parameters of a mixture of two univariate normal distributions by maximum likelihood estimation.
Usage
mix2normal(lphi = "logitlink", lmu = "identitylink", lsd =
"loglink", iphi = 0.5, imu1 = NULL, imu2 = NULL, isd1 =
NULL, isd2 = NULL, qmu = c(0.2, 0.8), eq.sd = TRUE,
nsimEIM = 100, zero = "phi")
Arguments
lphi , lmu , lsd |
Link functions for the parameters |
iphi |
Initial value for |
imu1 , imu2 |
Optional initial value for |
isd1 , isd2 |
Optional initial value for |
qmu |
Vector with two values giving the probabilities relating
to the sample quantiles for obtaining initial values for
|
eq.sd |
Logical indicating whether the two standard deviations should
be constrained to be equal. If |
nsimEIM |
|
zero |
May be an integer vector
specifying which linear/additive predictors are modelled as
intercept-only. If given, the value or values can be from the
set |
Details
The probability density function can be loosely written as
f(y) = \phi \, N(\mu_1,\sigma_1) +
(1-\phi) \, N(\mu_2, \sigma_2)
where \phi
is the probability an observation belongs
to the first group.
The parameters \mu_1
and \mu_2
are the
means, and \sigma_1
and \sigma_2
are the
standard deviations. The parameter \phi
satisfies
0 < \phi < 1
.
The mean of Y
is
\phi \mu_1 + (1-\phi) \mu_2
and this is returned as the fitted values.
By default, the five linear/additive predictors are
(logit(\phi),\mu_1,\log(\sigma_1),\mu_2,\log(\sigma_2))^T
.
If eq.sd = TRUE
then \sigma_1 = \sigma_2
is enforced.
Value
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions
such as vglm
,
and vgam
.
Warning
Numerical problems can occur and
half-stepping is not uncommon.
If failure to converge occurs, try inputting better initial
values,
e.g., by using iphi
,
qmu
,
imu1
,
imu2
,
isd1
,
isd2
,
etc.
This VGAM family function is experimental and should be used with care.
Note
Fitting this model successfully to data can be difficult due
to numerical problems and ill-conditioned data. It pays to
fit the model several times with different initial values and
check that the best fit looks reasonable. Plotting the results
is recommended. This function works better as \mu_1
and \mu_2
become more different.
Convergence can be slow, especially when the two component
distributions are not well separated.
The default control argument trace = TRUE
is to encourage
monitoring convergence.
Having eq.sd = TRUE
often makes the overall optimization
problem easier.
Author(s)
T. W. Yee
References
McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: Wiley.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. London: Chapman & Hall.
See Also
uninormal
,
Normal
,
mix2poisson
.
Examples
## Not run: mu1 <- 99; mu2 <- 150; nn <- 1000
sd1 <- sd2 <- exp(3)
(phi <- logitlink(-1, inverse = TRUE))
rrn <- runif(nn)
mdata <- data.frame(y = ifelse(rrn < phi, rnorm(nn, mu1, sd1),
rnorm(nn, mu2, sd2)))
fit <- vglm(y ~ 1, mix2normal(eq.sd = TRUE), data = mdata)
# Compare the results
cfit <- coef(fit)
round(rbind('Estimated' = c(logitlink(cfit[1], inverse = TRUE),
cfit[2], exp(cfit[3]), cfit[4]),
'Truth' = c(phi, mu1, sd1, mu2)), digits = 2)
# Plot the results
xx <- with(mdata, seq(min(y), max(y), len = 200))
plot(xx, (1-phi) * dnorm(xx, mu2, sd2), type = "l", xlab = "y",
main = "red = estimate, blue = truth",
col = "blue", ylab = "Density")
phi.est <- logitlink(coef(fit)[1], inverse = TRUE)
sd.est <- exp(coef(fit)[3])
lines(xx, phi*dnorm(xx, mu1, sd1), col = "blue")
lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col = "red")
lines(xx, (1-phi.est)*dnorm(xx, Coef(fit)[4], sd.est), col="red")
abline(v = Coef(fit)[c(2,4)], lty = 2, col = "red")
abline(v = c(mu1, mu2), lty = 2, col = "blue")
## End(Not run)