R: Univariate Normal Mixture (UVNM) Model with Unknown Number of...

uvnm.rjmcmc {miscF}

R Documentation

Univariate Normal Mixture (UVNM) Model with Unknown Number of Components

Description

Estimate the parameters of an univariate normal mixture model including the number of components using the Reversible Jump MCMC method. It can be used for density estimation and/or classification.

Usage

  uvnm.rjmcmc(y, nsweep, kmax, k, w, mu, sigma2, Z,
              delta=1, xi=NULL, kappa=NULL, alpha=2,
              beta=NULL, g=0.2, h=NULL, verbose=TRUE)

Arguments

`y`	a vector of observations.
`nsweep`	number of sweeps. One sweep has six moves to update parameters including the number of components.
`kmax`	the maximum number of components.
`k`	initial value of number of components.
`w`	initial values of weights of different components.
`mu`	initial values of means of different components.
`sigma2`	initial values of variances of different components.
`Z`	initial values of allocations of all observations.
`delta`	the parameter of the prior distribution of `w`, the symmetric Dirichlet distribution. The default value is one.
`xi`	the mean of the prior distribution of means of components. By taking the default value `NULL`, it is set as the median of the data `y` internally.
`kappa`	the precision of the prior distribution of means of components. By taking the default value `NULL`, it is set as 1/R^2 internally, where R is the range of `y`.
`alpha`	the parameter shape of the prior distribution of precision of components. The default values is 2.
`beta`	the parameter rate of the prior distribution of precision of components. By taking the default value `NULL`, it is set as a number generated randomly from a gamma distribution with shape `g` and rate `h`.
`g`	the parameter shape of the gamma distribution for `beta`. The default value is 0.2.
`h`	the parameter rate of the gamma distribution for `beta`. By taking the default value `NULL`, it is set as 10/R^2 internally, where R is the range of `y`.
`verbose`	logical. If `TRUE`, then indicate the level of output after every 1000 sweeps. The default is `TRUE`.

Details

Estimate the parameters of a univariate normal mixture model with flexible number of components using the Reversible Jump MCMC method in Richardson and Green (1997).

Value

`k.save`	a vector of number of components.
`w.save`	weights of the UVNM, one component of the list per sweep.
`mu.save`	means of the UVNM, one component of the list per sweep.
`sigma2.save`	variances of the UVNM, one component of the list per sweep.
`Z.save`	a matrix of allocation, one column per sweep.

Note

The error in equation (12) of Richardson and Green (1997) was corrected based on Richardson and Green (1998).

References

Sylvia Richardson and Peter J. Green (1997) On Bayesian Analysis of Mixtures with an Unknown Number of Components JRSSB vol. 59, no. 4 731-792

Sylvia Richardson and Peter J. Green (1998) Corrigendum: On Bayesian Analysis of Mixtures with an Unknown Number of Components JRSSB vol. 60, no. 3 661

Examples


## Not run: 
    require(mixAK)
    data(Acidity)
    y <- Acidity
    w <- c(0.50, 0.17, 0.33)
    mu <- c(4, 5, 6)
    sigma2 <- c(0.08, 0.10, 0.14)
    Z <- do.call(cbind, lapply(1:3, function(i)
                                    w[i]*dnorm(y, mu[i], sqrt(sigma2[i]))))
    Z <- apply(Z, 1, function(x) which(x==max(x))[1])
     
    result <- uvnm.rjmcmc(y, nsweep=200000, kmax=30, k=3,
                          w, mu, sigma2, Z)

    ksave <- result$k.save
    round(table(ksave[-(1:100000)])/100000,4)

    #conditional density estimation
    focus.k <- 3
    pick.k <- which(ksave==focus.k)
    w <- unlist(result$w.save[pick.k])
    mu <- unlist(result$mu.save[pick.k])
    sigma2 <- unlist(result$sigma2.save[pick.k])
    den.estimate <- rep(w, each=length(y)) *
                    dnorm(rep(y, length(w)), mean=rep(mu, each=length(y)),
                          sd=rep(sqrt(sigma2), each=length(y)))
    den.estimate <- rowMeans(matrix(den.estimate, nrow=length(y)))*focus.k

    #within-sample classification
    class <- apply(result$Z.save[,pick.k], 1,
                                function(x) c(sum(x==1), sum(x==2), sum(x==3)))
    class <- max.col(t(class))

    #visualize the results
    hist(y, freq=FALSE, breaks=20, axes=FALSE, ylim=c(-0.3, 1),
         main="Density Estimation and Classification", ylab="")
    axis(2, at=c(-(3:1)/10, seq(0,10,2)/10), labels=c(3:1, seq(0,10,2)/10),
         font=2)
    lines(sort(y), den.estimate[order(y)], col="red")
    for(i in 1:3){
        points(y[class==i], rep(-i/10, length(y[class==i])), col=i, pch=i)
    }
    mtext("Density", 2, at=0.5, line=2)
    mtext("Classification", 2, at=-0.15, line=2)

## End(Not run)

[Package miscF version 0.1-5 Index]