blca.em {BayesLCA}    R Documentation

Bayesian Latent Class Analysis via an EM Algorithm

Description

Latent class analysis (LCA) attempts to find G hidden classes in binary data X. blca.em uses an expectation-maximisation (EM) algorithm to find maximum a posteriori (MAP) estimates of the model parameters.
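To make the idea concrete, the following is a minimal base-R sketch of an EM loop for MAP estimation in a latent class model. It is not the BayesLCA source code: the function name em_lca_map and the clamping constant eps are hypothetical, and the sketch assumes Beta(alpha, beta) priors on the item probabilities and a Dirichlet(delta) prior on the class probabilities, with all prior values at least 1.

```r
# Sketch only: a base-R EM loop for MAP estimation in a latent class model.
# NOT the BayesLCA implementation; em_lca_map and eps are hypothetical names.
em_lca_map <- function(X, G, alpha = 1, beta = 1, delta = 1,
                       iter = 500, conv = 1e-6, eps = 1e-10) {
  N <- nrow(X)
  # Start from random soft class memberships (each row sums to 1).
  Z <- matrix(runif(N * G), N, G)
  Z <- Z / rowSums(Z)
  poststore <- numeric(0)
  for (it in seq_len(iter)) {
    # M-step: posterior modes under the assumed Dirichlet and Beta priors.
    nk <- colSums(Z)
    tau <- nk + delta - 1
    tau <- tau / sum(tau)
    theta <- (crossprod(Z, X) + alpha - 1) / (nk + alpha + beta - 2)
    theta[] <- pmin(pmax(theta, eps), 1 - eps)  # keep the logs finite

    # E-step: class membership probabilities, computed on the log scale.
    ll <- X %*% t(log(theta)) + (1 - X) %*% t(log(1 - theta))
    ll <- sweep(ll, 2, log(tau), "+")
    mx <- apply(ll, 1, max)
    lse <- mx + log(rowSums(exp(ll - mx)))
    Z <- exp(ll - lse)

    # Log-posterior, up to an additive constant.
    lp <- sum(lse) + sum((delta - 1) * log(tau)) +
      sum((alpha - 1) * log(theta) + (beta - 1) * log(1 - theta))
    poststore <- c(poststore, lp)
    if (it > 1 && lp - poststore[it - 1] < conv) break
  }
  list(classprob = tau, itemprob = theta, Z = Z, logpost = lp,
       iter = it, poststore = poststore)
}

# Simulated two-class binary data (made-up parameters).
set.seed(42)
cls <- sample(1:2, 500, replace = TRUE, prob = c(0.6, 0.4))
p <- rbind(c(0.8, 0.8, 0.2, 0.2), c(0.2, 0.2, 0.8, 0.8))
X <- matrix(rbinom(500 * 4, 1, p[cls, ]), ncol = 4)
fit <- em_lca_map(X, 2)
```

With the default prior values of 1 the priors are flat, so in this sketch the updates coincide with ordinary maximum likelihood EM.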

Usage

blca.em(X, G, alpha = 1, beta = 1, delta = 1, 
	start.vals = c("single", "across"), counts.n = NULL,
	iter = 500, restarts = 5, verbose = TRUE, 
	sd = FALSE, se=sd, conv = 1e-06, small = 1e-100)

Arguments

X

The data matrix. This may take one of several forms; see data.blca.

G

The number of latent classes to fit.

alpha, beta

The prior values for the item probabilities, conditional on group membership. These may take several forms: a single value, recycled across all groups and columns; a vector of length G or M (the number of columns in the data); or a G x M matrix specifying each prior value separately. Defaults to 1, i.e., a uniform prior, for each value.

delta

Prior values for the mixture components in the model. Defaults to 1, i.e., a uniform prior. May be single or vector valued (of length G).

start.vals

Denotes how class membership is to be assigned during the initial step of the algorithm. Two character values may be chosen: "single", which randomly assigns each data point exclusively to one class, or "across", which assigns class membership via runif. Alternatively, class membership may be pre-specified, either as a vector of class membership or as a matrix of probabilities. Defaults to "single".

counts.n

If data patterns have already been counted, a data matrix consisting of each unique data pattern can be supplied to the function, in addition to a vector counts.n, which supplies the corresponding number of times each pattern occurs in the data.

iter

The maximum number of iterations for which the algorithm runs. It stops early if it is deemed to have converged.

restarts

restarts determines how many times the algorithm is run with different starting values. Parameter estimates from the run which achieved the highest log-posterior are returned. If starting values are supplied, these are used for the first run, after which random starting points are used. Defaults to 5.

verbose

Logical valued. If TRUE, the log-posterior from each run is printed.

sd

Specifies whether posterior standard deviation estimates should also be returned. If TRUE, blca.em.sd is called.

se

As with sd, specifies whether posterior standard deviation estimates should also be returned; its use is discouraged. Should always agree with sd. Retained for backwards compatibility. See ‘Note’.

conv

Convergence criterion, i.e., how small the increase in the log-posterior must become before the algorithm is deemed to have converged. Set relative to the size of the data matrix.

small

To ensure numerical stability a small constant is added to certain parameter estimates. Defaults to 1e-100.
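As an illustration of the counts.n argument, the sketch below collapses a small made-up binary matrix into its unique patterns and their counts using base R only. The object names patterns and key are illustrative, and the final call is left commented out because it needs the BayesLCA package.

```r
# Toy binary data: 6 observations on 3 items (made-up values).
X <- matrix(c(1, 0, 1,
              1, 0, 1,
              0, 1, 0,
              1, 1, 1,
              0, 1, 0,
              1, 0, 1), ncol = 3, byrow = TRUE)

# Encode each row as a string so identical patterns can be tabulated.
key <- apply(X, 1, paste, collapse = "")
tab <- table(key)

# One row per unique pattern, in the same order as the table names.
patterns <- X[match(names(tab), key), , drop = FALSE]
counts.n <- as.vector(tab)

# With BayesLCA loaded, the pair could then be supplied as:
# fit <- blca.em(patterns, G = 2, counts.n = counts.n)
```

The counts sum to the number of rows in the original data, which is a quick check that no observation was dropped.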

Details

Regardless of the form of the data supplied to blca.em, it is internally converted to the form produced by data.blca. In particular, this should be noted when supplying starting values: the supplied object must have a length (for a vector) or a number of rows (for a matrix) equal to the number of unique observations in the dataset, not the total number of observations.
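The sizing rule for starting values can be sketched in base R as follows. The example assumes that the number of unique rows of a raw binary matrix matches the number of unique observations after conversion; the names z.vec and z.mat are illustrative, and the blca.em calls are commented out because they need the BayesLCA package.

```r
set.seed(1)
G <- 2
# Made-up binary data: 200 observations on 4 items.
X <- matrix(rbinom(200 * 4, 1, 0.5), ncol = 4)

# Starting values are sized by unique patterns, not by nrow(X).
n.unique <- nrow(unique(X))

# Option 1: a vector of hard class labels, one per unique pattern.
z.vec <- sample(1:G, n.unique, replace = TRUE)

# Option 2: a matrix of membership probabilities, one row per unique pattern.
z.mat <- matrix(runif(n.unique * G), ncol = G)
z.mat <- z.mat / rowSums(z.mat)

# fit <- blca.em(X, G, start.vals = z.vec)
# fit <- blca.em(X, G, start.vals = z.mat)
```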

Posterior standard deviations and convergence checks are calculated using blca.em.sd.

Value

A list of class "blca.em" is returned, containing:

call

The initial call passed to the function.

itemprob

The item probabilities, conditional on class membership.

classprob

The class probabilities.

Z

Estimate of class membership for each unique data point.

itemprob.sd

If returned, posterior standard deviation estimates of the item probabilities.

classprob.sd

If returned, posterior standard deviation estimates of the class probabilities.

logpost

The log-posterior of the estimated model.

BIC

The Bayesian Information Criterion for the estimated model.

AIC

Akaike's Information Criterion for the estimated model.

iter

The number of iterations required before convergence.

poststore

The value of the log-posterior for each iteration.

eps

The value at which the algorithm was deemed to have converged.

counts

The number of times each unique data point occurred.

lpstarts

The log-posterior achieved after each of the multiple starts of the algorithm.

convergence

If posterior standard deviations are calculated, then the Hessian of the model is also checked to determine whether the algorithm has converged to at least a local maximum. The convergence status is reported as an integer value: 1 denotes acceptable convergence, 2 denotes that the algorithm converged at a saddle point, 3 that the algorithm ended before it had satisfactorily converged, and 4 that at least one parameter value converged at a boundary value (i.e., 0 or 1). 0 denotes that the algorithm converged satisfactorily but that the Hessian has not been checked.

prior

A list containing the prior values specified for the model.

sd

A logical value indicating whether posterior standard deviation estimates were returned.
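The integer codes stored in convergence can be awkward to read at a glance. The helper below is a small, hypothetical convenience function (it is not part of BayesLCA) that maps each code listed above to a short label.

```r
# Hypothetical helper (not part of BayesLCA): translate the integer
# convergence code described above into a readable label.
convergence_label <- function(code) {
  labels <- c(
    "0" = "converged; Hessian not checked",
    "1" = "acceptable convergence",
    "2" = "converged at a saddle point",
    "3" = "ended before satisfactory convergence",
    "4" = "at least one parameter at a boundary value (0 or 1)"
  )
  out <- labels[as.character(code)]
  if (anyNA(out)) stop("unknown convergence code")
  unname(out)
}

convergence_label(1)
# With a fitted model this might be used as: convergence_label(fit$convergence)
```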

Note

Earlier versions of this function erroneously referred to posterior standard deviations as standard errors. This also extended to arguments supplied to, and values returned by, the function, some of which are now returned with the corrected suffix .sd (for standard deviation). For backwards compatibility, the earlier suffix .se has been retained among the returned values.

Author(s)

Arthur White

References

Dempster AP, Laird NM, Rubin DB (1977). “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological), 39(1), pp. 1–38. ISSN 00359246. doi:10.2307/2984875. URL http://dx.doi.org/10.2307/2984875.

See Also

blca, blca.em.sd, blca.boot, blca.vb

Examples

library(BayesLCA)

## Simulate 1000 observations from a two-class model
type1 <- c(0.8, 0.8, 0.2, 0.2)
type2 <- c(0.2, 0.2, 0.8, 0.8)
x <- rlca(1000, rbind(type1,type2), c(0.6,0.4))

fit <- blca.em(x, 2)
print(fit)
fit <- blca.em(x, 2, sd=TRUE) ##Returns posterior standard deviations
summary(fit)
plot(fit)

## Different starting values
fit <- blca.em(x, 2, start.vals="across")
xx <- data.blca(x)
fit <- blca.em(xx, 2, start.vals=sample(1:2, length(xx$counts) , replace=TRUE))
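
The alpha and beta arguments may also be supplied as G x M matrices. The sketch below builds such a pair in base R with made-up values. It assumes that alpha and beta act as the two shape parameters of a Beta prior on each item probability, which is consistent with the default of 1 giving a uniform prior but is not stated explicitly on this page. The call itself is commented out because it needs the BayesLCA package and the object x from the examples above.

```r
G <- 2
M <- 4

# Start from uniform priors for every class and item.
alpha <- matrix(1, nrow = G, ncol = M)
beta <- matrix(1, nrow = G, ncol = M)

# Made-up informative prior: class 1 is expected to endorse items 1 and 2.
alpha[1, 1:2] <- 4
beta[1, 1:2] <- 2

# Under the Beta assumption, the prior mode for a, b > 1 is (a - 1) / (a + b - 2).
prior.mode <- (alpha[1, 1] - 1) / (alpha[1, 1] + beta[1, 1] - 2)

# fit <- blca.em(x, G, alpha = alpha, beta = beta)
```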

[Package BayesLCA version 1.9 Index]