
blca.vb {BayesLCA}

R Documentation

Bayesian Latent Class Analysis via a variational Bayes algorithm

Description

Latent class analysis (LCA) attempts to find G hidden classes in binary data X. blca.vb uses a variational EM algorithm to find the distribution that best approximates the true posterior distribution of the parameters.
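As a reading aid, the generative model that LCA assumes can be sketched in base R. Each observation first draws a hidden class from the class probabilities, and each binary item is then an independent Bernoulli draw given that class. The helper `simulate_lca` is hypothetical and not part of BayesLCA (the package's own simulator, `rlca`, appears in the Examples); this is a minimal sketch under those assumptions.

```r
# Hypothetical helper, not part of BayesLCA: draw N binary rows from a
# latent class model. itemprob is a G x M matrix of P(item m = 1 | class g);
# classprob is a length-G vector of class probabilities.
simulate_lca <- function(N, itemprob, classprob) {
  itemprob <- as.matrix(itemprob)
  G <- nrow(itemprob)
  M <- ncol(itemprob)
  stopifnot(length(classprob) == G, abs(sum(classprob) - 1) < 1e-8)
  # Step 1: a hidden class label for every observation.
  z <- sample.int(G, N, replace = TRUE, prob = classprob)
  # Step 2: items are independent Bernoulli draws given the class.
  p <- itemprob[z, , drop = FALSE]  # N x M success probabilities
  X <- matrix(as.integer(runif(N * M) < p), nrow = N, ncol = M)
  list(X = X, z = z)
}

set.seed(1)
sim <- simulate_lca(1000, rbind(c(0.8, 0.8, 0.2, 0.2),
                                c(0.2, 0.2, 0.8, 0.8)), c(0.6, 0.4))
colMeans(sim$X)  # near 0.6 * 0.8 + 0.4 * 0.2 = 0.56 for the first two items
```

Only the matrix `X` would be passed to a fitting function; the labels `z` are what LCA tries to recover.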

Usage

blca.vb(X, G, alpha = 1, beta = 1, delta = 1,
        start.vals = c("single", "across"), counts.n = NULL,
        iter = 500, restarts = 1, verbose = TRUE, conv = 1e-06,
        small = 1e-100)

Arguments

`X`	The data matrix. This may take one of several forms, see `data.blca`.
`G`	The number of classes to fit.
`alpha`, `beta`	Prior parameters of the beta distributions for the item probabilities, conditional on class membership. These may take several forms: a single value, recycled across all classes and columns; a vector of length G or M (the number of columns in the data); or a G x M matrix specifying each prior value separately. Defaults to 1, i.e., a uniform prior, for each value.
`delta`	Dirichlet prior parameters for the class (mixture) probabilities. Defaults to 1, i.e., a uniform prior. May be a single value or a vector of length G.
`start.vals`	Determines how class membership is assigned at the initial step of the algorithm. Two character values may be chosen: "single", which randomly assigns each data point exclusively to one class, or "across", which assigns class membership via `runif`. Alternatively, class membership may be pre-specified, either as a vector of class labels or as a matrix of membership probabilities. Defaults to "single".
`counts.n`	If data patterns have already been counted, a data matrix consisting of each unique data pattern can be supplied to the function, together with a vector `counts.n` giving the number of times each pattern occurs in the data.
`iter`	The maximum number of iterations for which the algorithm runs. The algorithm stops earlier if it converges.
`restarts`	The number of times the algorithm is run with different starting values. Parameter estimates from the run that achieved the highest log-posterior are returned. If starting values are supplied, they are used for the first run; random starting values are used thereafter. Defaults to 1.
`verbose`	Logical valued. If TRUE, the log-posterior from each run is printed.
`conv`	Convergence criterion: the algorithm is deemed to have converged once the increase in the log-posterior between successive iterations falls below this value. Set relative to the size of the data matrix.
`small`	To ensure numerical stability a small constant is added to certain parameter estimates. Defaults to 1e-100.
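The argument forms above can be illustrated with three small base-R helpers: expanding `alpha` or `beta` to a G x M matrix, building an initial membership matrix in the spirit of `start.vals = "single"` or `"across"`, and collapsing raw data to unique patterns with counts of the kind `counts.n` expects. All three names are hypothetical, not BayesLCA functions, and the tie-break used when G equals M is an assumption.

```r
# Hypothetical helpers mirroring the documented argument forms; they are
# not BayesLCA internals.

# alpha / beta: expand a scalar, a length-G vector (one value per class),
# a length-M vector (one value per item) or a G x M matrix to G x M.
# When G == M, a vector is treated as per-class here (an assumption).
expand_prior <- function(prior, G, M) {
  if (is.matrix(prior)) {
    stopifnot(nrow(prior) == G, ncol(prior) == M)
    return(prior)
  }
  if (length(prior) == 1 || length(prior) == G) {
    return(matrix(prior, G, M))                # row g filled with prior[g]
  }
  if (length(prior) == M) {
    return(matrix(prior, G, M, byrow = TRUE))  # column m filled with prior[m]
  }
  stop("prior must have length 1, G or M, or be a G x M matrix")
}

# start.vals: an N x G membership matrix for the first iteration.
init_membership <- function(N, G, type = c("single", "across")) {
  type <- match.arg(type)
  if (type == "single") {
    # each row belongs wholly to one randomly chosen class
    Z <- matrix(0, N, G)
    Z[cbind(seq_len(N), sample.int(G, N, replace = TRUE))] <- 1
  } else {
    # soft start: uniform draws normalised so each row sums to 1
    Z <- matrix(runif(N * G), N, G)
    Z <- Z / rowSums(Z)
  }
  Z
}

# counts.n: collapse a binary matrix to its unique rows plus their counts.
count_patterns <- function(X) {
  key <- apply(X, 1, paste, collapse = "")
  tab <- table(key)
  first <- match(names(tab), key)  # first row showing each pattern
  list(patterns = X[first, , drop = FALSE], counts.n = as.vector(tab))
}
```

For instance, `expand_prior(c(1, 2), G = 2, M = 3)` returns a matrix whose first row is all 1 and whose second row is all 2.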

Details

The variational Bayes method approximates the posterior as a product of independent distributions. The parameters of this approximating distribution are then estimated using a variational EM algorithm. This method tends to underestimate parameter variances; as such, the standard deviation and density estimates should be interpreted with caution.

Although variational algorithms are less prone to converging to saddle points or sub-optimal local maxima, it is still worth running the algorithm from multiple starting points.
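The variational EM described above can be sketched in base R for the model implied by the priors: item probabilities with beta priors, class probabilities with a Dirichlet prior, and a mean-field approximation q(z) q(theta) q(tau). This is a minimal sketch, not the BayesLCA implementation: `vb_lca` is a hypothetical name, it uses a single random soft start, no restarts, no `small` constant, and treats `conv` as an absolute tolerance. Its output names mirror the Value section only for readability.

```r
# Minimal sketch, not the BayesLCA implementation: coordinate-ascent
# variational Bayes for a latent class model with
#   X[i, m] | z_i = g ~ Bernoulli(theta[g, m]),  theta[g, m] ~ Beta(alpha, beta),
#   z_i ~ Categorical(tau),                       tau ~ Dirichlet(delta).
# w holds pattern counts (all 1 for raw data).
vb_lca <- function(X, G, alpha = 1, beta = 1, delta = 1,
                   w = rep(1, nrow(X)), iter = 500, conv = 1e-6) {
  X <- as.matrix(X)
  N <- nrow(X); M <- ncol(X)
  a0 <- matrix(alpha, G, M); b0 <- matrix(beta, G, M)
  d0 <- rep(delta, length.out = G)
  R <- matrix(runif(N * G), N, G)  # random soft start (cf. "across")
  R <- R / rowSums(R)
  lbstore <- numeric(0)
  for (it in seq_len(iter)) {
    # q(theta) and q(tau): conjugate Beta / Dirichlet updates
    WR <- R * w
    a <- a0 + t(WR) %*% X
    b <- b0 + t(WR) %*% (1 - X)
    d <- d0 + colSums(WR)
    # expected log-parameters under q
    Elog_tau <- digamma(d) - digamma(sum(d))
    Elog_th1 <- digamma(a) - digamma(a + b)  # E[log theta]
    Elog_th0 <- digamma(b) - digamma(a + b)  # E[log(1 - theta)]
    # q(z): responsibilities via a row-wise log-sum-exp
    S <- X %*% t(Elog_th1) + (1 - X) %*% t(Elog_th0)
    S <- sweep(S, 2, Elog_tau, "+")
    mx <- apply(S, 1, max)
    logZ <- mx + log(rowSums(exp(S - mx)))
    R <- exp(S - logZ)
    # lower bound: with R optimal, E[log p(X, z)] - E[log q(z)] = sum(w * logZ)
    kl_beta <- sum(lbeta(a0, b0) - lbeta(a, b) + (a - a0) * digamma(a) +
                   (b - b0) * digamma(b) - (a - a0 + b - b0) * digamma(a + b))
    kl_dir <- lgamma(sum(d)) - sum(lgamma(d)) - lgamma(sum(d0)) +
              sum(lgamma(d0)) + sum((d - d0) * Elog_tau)
    lbstore[it] <- sum(w * logZ) - kl_beta - kl_dir
    # absolute tolerance in this sketch; the documented conv is data-size relative
    if (it > 1 && lbstore[it] - lbstore[it - 1] < conv) break
  }
  list(itemprob = a / (a + b), classprob = d / sum(d), Z = R,
       LB = lbstore[length(lbstore)], lbstore = lbstore, iter = length(lbstore),
       parameters = list(alpha = a, beta = b, delta = d))
}

set.seed(42)
z <- sample.int(2, 1000, replace = TRUE, prob = c(0.6, 0.4))
P <- rbind(c(0.8, 0.8, 0.2, 0.2), c(0.2, 0.2, 0.8, 0.8))
X <- matrix(as.integer(runif(4000) < P[z, ]), 1000, 4)
fit <- vb_lca(X, 2)
round(fit$itemprob, 2)  # class labels may come out in either order
```

Because each coordinate update is exact, the stored lower bound should never decrease. Since the fit depends on the random start, the advice above translates here to calling `vb_lca` under several seeds and keeping the run with the largest `LB`, which is what `restarts` does within blca.vb.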

Value

A list of class "blca.vb" is returned, containing:

`call`	The initial call passed to the function.
`itemprob`	The item probabilities, conditional on class membership.
`classprob`	The class probabilities.
`itemprob.sd`	Posterior standard deviation estimates of the item probabilities.
`classprob.sd`	Posterior standard deviation estimates of the class probabilities.
`parameters`	A list containing posterior parameter values for item and class probabilities, which are assumed to follow beta and Dirichlet distributions respectively.
`Z`	Estimate of class membership for each unique data point.
`LB`	The lower bound estimate of the log-posterior of the estimated model.
`lbstore`	The value of the lower bound estimate for each iteration.
`iter`	The number of iterations required before convergence.
`eps`	The amount that the lower bound increased at the final iteration of the algorithm's run.
`counts`	The number of times each unique data point occurred.
`prior`	A list containing the prior values specified for the model.
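The `itemprob.sd` and `classprob.sd` entries relate to the beta and Dirichlet posteriors described under `parameters`. Assuming those distributional forms, standard moment formulas give each posterior mean and standard deviation, as sketched below. The helper `beta_dirichlet_summary` is hypothetical, and since the element names inside `parameters` are not documented here it takes the parameter values directly; this is not a claim about how blca.vb computes its estimates.

```r
# Hypothetical helper: posterior means and standard deviations implied by
# Beta(a, b) item-probability and Dirichlet(d) class-probability posteriors.
beta_dirichlet_summary <- function(a, b, d) {
  ab <- a + b
  d0 <- sum(d)
  list(
    itemprob     = a / ab,
    itemprob.sd  = sqrt(a * b / (ab^2 * (ab + 1))),        # Beta standard deviation
    classprob    = d / d0,
    classprob.sd = sqrt(d * (d0 - d) / (d0^2 * (d0 + 1)))  # Dirichlet marginal sd
  )
}

s <- beta_dirichlet_summary(a = matrix(c(3, 1), 1, 2),
                            b = matrix(c(1, 3), 1, 2), d = c(2, 2))
s$itemprob      # 0.75 0.25
s$classprob.sd  # sqrt(0.05), about 0.224
```

Any standard deviation computed this way inherits the variational approximation's tendency to understate uncertainty.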

Note

Variational Bayes approximations are known to often underestimate the standard errors of the parameters under investigation, so caution is advised when checking their values.

Earlier versions of this function erroneously referred to posterior standard deviations as standard errors. This also extended to arguments supplied to and returned by the function, some of which are now returned with the corrected suffix .sd (for standard deviation). For backwards compatibility, the earlier suffix .se has been retained for the returned values.

Author(s)

Arthur White

References

Ormerod J, Wand M (2010). “Explaining Variational Approximations.” The American Statistician, 64(2), 140-153.

Examples

type1 <- c(0.8, 0.8, 0.2, 0.2)
type2 <- c(0.2, 0.2, 0.8, 0.8)
x <- rlca(1000, rbind(type1, type2), c(0.6, 0.4))

fit <- blca.vb(x, 2)
print(fit)
summary(fit)
par(mfrow = c(3, 2))
plot(fit)
par(mfrow = c(1, 1))

data(Alzheimer)
sj <- blca.vb(Alzheimer, 10, delta=1/10)
sj$classprob    ##Empty Groups

[Package BayesLCA version 1.9 Index]