R: Latent Trait Model - Latent Variable Model for Binary Data

ltm {ltm}

R Documentation

Latent Trait Model - Latent Variable Model for Binary Data

Description

Fit a latent trait model under the Item Response Theory (IRT) approach.

Usage

ltm(formula, constraint = NULL, IRT.param, start.val, 
    na.action = NULL, control = list())

Arguments

`formula`	a two-sided formula providing the response data matrix and describing the latent structure. On the left-hand side of `formula`, either a `data.frame` (which will be converted to a numeric matrix using `data.matrix()`) or a numeric `matrix` of manifest variables must be supplied. On the right-hand side of `formula`, at most two latent variables are allowed, with codenames `z1` and `z2`. Interaction and quadratic terms can also be used (see Details and Examples for more info).
`constraint`	a three-column numeric matrix with at most `pq - 1` rows (where `p` is the number of items and `q` the number of latent components plus the intercept), specifying fixed-value constraints. The first column represents the item (i.e., `1` denotes the first item, `2` the second, etc.), the second column represents the component of the latent structure (i.e., `1` denotes the intercept `\beta_{0i}`, `2` the loadings of the first factor `\beta_{1i}`, etc.) and the third column denotes the value at which the corresponding parameter should be fixed. See Details and Examples for more info.
`IRT.param`	logical; if `TRUE` then the coefficient estimates for the two-parameter logistic model are reported under the usual IRT parameterization. See Details for more info.
`start.val`	the character string `"random"` or a numeric matrix supplying starting values with `p` rows and `q` columns, with `p` denoting the number of items, and `q` denoting the number of terms in the right-hand side of `formula`. If `NULL`, starting values are automatically computed. If `"random"`, random starting values are used. If a matrix, then depending on the latent structure specified in `formula`, the first column should contain `\beta_{0i}`, the second `\beta_{1i}`, the third `\beta_{2i}`, and the remaining columns `\beta_{nl,i}` (see Details).

`na.action`	the `na.action` to be used on the data frame in the left side of `formula`. In case of missing data, if `na.action = NULL` the model uses the available cases, i.e., it takes into account the observed part of sample units with missing values (valid under MAR mechanisms if the model is correctly specified). If you want to apply a complete case analysis then use `na.action = na.exclude`.
`control`	a list of control values with the following components:
	`iter.em`: the number of EM iterations. Default 40.
	`iter.qN`: the number of quasi-Newton iterations. Default 150.
	`GHk`: the number of Gauss-Hermite quadrature points. Default 15.
	`method`: the optimization method to be used in `optim()`. Default `"BFGS"`.
	`verbose`: logical; if `TRUE`, information about the optimization procedure is printed.
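
The argument descriptions above can be illustrated with a short base-R sketch that builds a `constraint` matrix, a `start.val` matrix and a `control` list for a one-factor model. The item count, the starting values and the object names (`p`, `q`, `constr`, `start`, `ctrl`) are assumptions chosen for illustration only; the final call is left commented out because it needs the ltm package and its WIRS data.

```r
p <- 6   # hypothetical number of items
q <- 2   # intercept plus one latent variable (formula of the form items ~ z1)

# Each row: item index, component index (1 = intercept, 2 = loading on z1),
# and the value at which that parameter is fixed.
constr <- rbind(c(1, 1, 1),      # fix the intercept of item 1 at 1
                c(6, 2, -0.5))   # fix the z1 loading of item 6 at -0.5

# The documented shape: three columns and at most p*q - 1 rows.
stopifnot(ncol(constr) == 3, nrow(constr) <= p * q - 1)

# Starting values: one row per item, one column per term (intercept, then z1).
# The values 0 and 1 are arbitrary illustrative choices.
start <- cbind(rep(0, p), rep(1, p))

# Control values, using the component names and defaults listed above.
ctrl <- list(iter.em = 40, iter.qN = 150, GHk = 15,
             method = "BFGS", verbose = FALSE)

# Not run: requires the ltm package and its WIRS data set.
# library(ltm)
# fit <- ltm(WIRS ~ z1, constraint = constr, start.val = start, control = ctrl)
```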

Details

The latent trait model is the analogue of the factor analysis model for binary observed data. The model assumes that the dependencies between the observed response variables (known as items) can be interpreted by a small number of latent variables. The model formulation is under the IRT approach; in particular,

\log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0i} + \beta_{1i}z_1 + \beta_{2i}z_2,

where \pi_i is the probability of a positive response in the ith item, \beta_{0i} is the easiness parameter, \beta_{ji} (j = 1, 2) are the discrimination parameters and z_1, z_2 denote the two latent variables.

The usual form of the latent trait model assumes linear latent variable effects (Bartholomew and Knott, 1999; Moustaki and Knott, 2000). ltm() fits the linear one- and two-factor models but also provides extensions described by Rizopoulos and Moustaki (2008) to include nonlinear latent variable effects. These are incorporated in the linear predictor of the model, i.e.,

\log\left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0i} + \beta_{1i}z_1 + \beta_{2i}z_2 + \beta_{nl}^tf(z_1, z_2),

where f(z_1, z_2) is a function of z_1 and z_2 (e.g., f(z_1, z_2) = z_1z_2, f(z_1, z_2) = z_1^2, etc.) and \beta_{nl} is a matrix of parameters for the nonlinear terms (see also the Examples).
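
As a rough illustration of the two predictors above, the following base-R sketch evaluates the response probability of a single item at given latent values, with and without a nonlinear term. The function `response_prob()`, its argument names and all coefficient values are hypothetical; this is not how ltm() evaluates the model internally.

```r
# Probability of a positive response for one item at scalar latent values z1, z2.
# beta:    c(beta0, beta1, beta2), i.e. easiness and the two discriminations
# beta_nl: coefficients of the nonlinear terms (empty for the linear model)
# f:       function returning the nonlinear terms, same length as beta_nl
response_prob <- function(z1, z2, beta, beta_nl = numeric(0),
                          f = function(z1, z2) numeric(0)) {
  eta <- beta[1] + beta[2] * z1 + beta[3] * z2 + sum(beta_nl * f(z1, z2))
  plogis(eta)  # inverse of the logit link
}

beta <- c(0.5, 1.2, -0.8)  # hypothetical parameter values

# Linear two-factor model evaluated at z1 = z2 = 0: reduces to plogis(beta0).
p_linear <- response_prob(0, 0, beta)

# Same item with an interaction term f(z1, z2) = z1 * z2 and coefficient 0.3.
p_inter <- response_prob(1, -1, beta, beta_nl = 0.3,
                         f = function(z1, z2) z1 * z2)
```

Passing `f = function(z1, z2) z1^2` instead gives the quadratic variant mentioned in the text.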

If IRT.param = TRUE, then the parameter estimates for the two-parameter logistic model (i.e., the model with one factor) are reported under the usual IRT parameterization, i.e.,

\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_{1i} (z - \beta_{0i}^*).
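
Equating this form with the one-factor linear predictor \beta_{0i} + \beta_{1i}z gives \beta_{0i}^* = -\beta_{0i}/\beta_{1i}, which is conventionally read as the item difficulty. A minimal base-R sketch of that conversion follows; the helper `to_irt()`, its column labels and the numeric values are assumptions made for illustration.

```r
# Convert (beta0, beta1) from the additive form beta0 + beta1 * z
# to the IRT form beta1 * (z - difficulty).
to_irt <- function(beta0, beta1) {
  cbind(difficulty = -beta0 / beta1, discrimination = beta1)
}

beta0 <- c(1.0, -0.5, 2.0)  # hypothetical easiness parameters
beta1 <- c(2.0,  1.0, 0.5)  # hypothetical discrimination parameters
irt <- to_irt(beta0, beta1)

# Both parameterizations give the same linear predictor at any z.
z <- 0.7
eta_additive <- beta0 + beta1 * z
eta_irt <- irt[, "discrimination"] * (z - irt[, "difficulty"])
same <- isTRUE(all.equal(eta_additive, unname(eta_irt)))
```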

The linear two-factor model is not identified under orthogonal rotations of the factor space. To achieve identifiability you can fix the value of one loading using the constraint argument.

The parameters are estimated by maximizing the approximate marginal log-likelihood under the conditional independence assumption, i.e., conditionally on the latent structure the items are independent Bernoulli variates under the logit link. The required integrals are approximated using the Gauss-Hermite rule. The optimization procedure used is a hybrid algorithm. The procedure initially uses a moderate number of EM iterations (see control argument iter.em) and then switches to quasi-Newton (see control arguments method and iter.qN) iterations until convergence.
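
The quadrature step described above can be sketched in base R for the one-factor model. This is an illustrative approximation of the marginal log-likelihood only, assuming complete binary data and a standard normal latent variable; it does not reproduce ltm's internal code or the EM and quasi-Newton steps. The helper names and the node construction (eigenvalues of the Jacobi matrix, i.e. the Golub-Welsch approach) are choices made here.

```r
# Gauss-Hermite nodes and weights for integrating against a N(0, 1) density,
# obtained from the eigen-decomposition of the Jacobi matrix (requires k >= 2).
gauss_hermite_normal <- function(k) {
  off <- sqrt(seq_len(k - 1))
  J <- matrix(0, k, k)
  J[cbind(1:(k - 1), 2:k)] <- off
  J[cbind(2:k, 1:(k - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(z = e$values, w = e$vectors[1, ]^2)  # weights sum to 1
}

# Log marginal probability of each row of the 0/1 matrix X (n x p).
# betas: p x 2 matrix with columns (beta0, beta1).
pattern_loglik <- function(betas, X, k = 15) {
  gh <- gauss_hermite_normal(k)
  eta <- outer(betas[, 1], rep(1, k)) + outer(betas[, 2], gh$z)  # p x k
  log_p <- plogis(eta, log.p = TRUE)    # log P(x = 1 | z)
  log_q <- plogis(-eta, log.p = TRUE)   # log P(x = 0 | z)
  # Conditional independence: sum item log-probabilities within each node.
  log_cond <- X %*% log_p + (1 - X) %*% log_q  # n x k
  log(drop(exp(log_cond) %*% gh$w))
}

marginal_loglik <- function(betas, X, k = 15) sum(pattern_loglik(betas, X, k))

# Hypothetical parameters for three items and a tiny data set.
betas <- cbind(c(0.5, -1.0, 1.2), c(1.0, 0.8, 1.5))
X <- rbind(c(1, 0, 1), c(0, 0, 1), c(1, 1, 1))
ll <- marginal_loglik(betas, X)
```

A useful sanity check is that the approximated probabilities of all 2^p response patterns sum to one, which holds here because the weights sum to one.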

Value

An object of class ltm with components,

`coefficients`	a matrix with the parameter values at convergence. These are always the estimates of `\beta_{li}, l = 0, 1, \ldots` parameters, even if `IRT.param = TRUE`.
`log.Lik`	the log-likelihood value at convergence.
`convergence`	the convergence identifier returned by `optim()`.
`hessian`	the approximate Hessian matrix at convergence returned by `optim()`.
`counts`	the number of function and gradient evaluations used by the quasi-Newton algorithm.
`patterns`	a list with two components: (i) `X`: a numeric matrix that contains the observed response patterns, and (ii) `obs`: a numeric vector that contains the observed frequencies for each observed response pattern.
`GH`	a list with two components used in the Gauss-Hermite rule: (i) `Z`: a numeric matrix that contains the abscissas, and (ii) `GHw`: a numeric vector that contains the corresponding weights.
`max.sc`	the maximum absolute value of the score vector at convergence.
`ltst`	a list describing the latent structure.
`X`	a copy of the response data matrix.
`control`	the values used in the `control` argument.
`IRT.param`	the value of the `IRT.param` argument.
`constraint`	if the `constraint` argument is not `NULL`, its value.
`call`	the matched call.

Warning

In case the Hessian matrix at convergence is not positive definite, try to re-fit the model; ltm() will use new random starting values.

The inclusion of nonlinear latent variable effects produces more complex likelihood surfaces which might possess a number of local maxima. To ensure that the maximum likelihood value has been reached, re-fit the model a number of times (simulations showed that 10 re-fits are usually adequate to ensure global convergence).
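
One way to organize such repeated fits is sketched below: a small helper that calls a fitting function several times and keeps the fit with the largest `log.Lik` component. The helper `refit_best()` is hypothetical and not part of ltm; it only assumes that each fit is a list-like object with a `log.Lik` component, as described under Value.

```r
# Call fit_fun() `times` times and return the fit with the highest log-likelihood.
refit_best <- function(fit_fun, times = 10) {
  fits <- lapply(seq_len(times), function(i) fit_fun())
  logliks <- vapply(fits, function(f) f$log.Lik, numeric(1))
  fits[[which.max(logliks)]]
}

# Not run: requires the ltm package and its WIRS data set.
# library(ltm)
# best <- refit_best(function() ltm(WIRS ~ z1 * z2, start.val = "random"))
# best$log.Lik
```

Using `start.val = "random"` in the commented call makes each re-fit start from different values, which is the point of repeating the fit.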

Conversion of the parameter estimates to the usual IRT parameterization works only for the two-parameter logistic model.

Note

In the case of the one-factor model, the optimization algorithm works under the constraint that the discrimination parameter of the first item \beta_{11} is always positive. If you wish to change its sign, then in the fitted model, say m, use m$coefficients[, 2] <- -m$coefficients[, 2] (the full component name is needed because assignment with $ does not use partial matching).

When the coefficient estimates are reported under the usual IRT parameterization (i.e., IRT.param = TRUE), their standard errors are calculated using the Delta method.
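
For intuition, a first-order Delta method calculation for the transformed parameter -\beta_{0i}/\beta_{1i} can be sketched as follows. The helper `delta_se_difficulty()` and the covariance matrix `V` are hypothetical, and whether ltm performs exactly this computation internally is an assumption.

```r
# Standard error of g(beta0, beta1) = -beta0 / beta1 via the Delta method:
# se = sqrt(grad' V grad), where V is the covariance matrix of (beta0, beta1).
delta_se_difficulty <- function(beta0, beta1, V) {
  grad <- c(-1 / beta1, beta0 / beta1^2)  # partial derivatives of g
  sqrt(drop(t(grad) %*% V %*% grad))
}

# Hypothetical estimates and covariance matrix for one item.
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)
se_diff <- delta_se_difficulty(beta0 = 1.0, beta1 = 2.0, V = V)
```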

Author(s)

Dimitris Rizopoulos d.rizopoulos@erasmusmc.nl

References

Baker, F. and Kim, S-H. (2004) Item Response Theory, 2nd ed. New York: Marcel Dekker.

Bartholomew, D. and Knott, M. (1999) Latent Variable Models and Factor Analysis, 2nd ed. London: Arnold.

Bartholomew, D., Steel, F., Moustaki, I. and Galbraith, J. (2002) The Analysis and Interpretation of Multivariate Data for Social Scientists. London: Chapman and Hall.

Moustaki, I. and Knott, M. (2000) Generalized latent trait models. Psychometrika, 65, 391–411.

Rizopoulos, D. (2006) ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25. doi: 10.18637/jss.v017.i05

Rizopoulos, D. and Moustaki, I. (2008) Generalized latent variable models with nonlinear effects. British Journal of Mathematical and Statistical Psychology, 61, 415–438.

Examples

## The two-parameter logistic model for the WIRS data
## with the constraint that (i) the easiness parameter 
## for the 1st item equals 1 and (ii) the discrimination
## parameter for the 6th item equals -0.5

ltm(WIRS ~ z1, constraint = rbind(c(1, 1, 1), c(6, 2, -0.5)))


## One-factor and a quadratic term
## using the Mobility data
ltm(Mobility ~ z1 + I(z1^2))

## Two-factor model with an interaction term
## using the WIRS data
ltm(WIRS ~ z1 * z2)


## The two-parameter logistic model for the Abortion data 
## with 20 quadrature points and 20 EM iterations;
## report results under the usual IRT parameterization
ltm(Abortion ~ z1, IRT.param = TRUE, control = list(GHk = 20, iter.em = 20))

[Package ltm version 1.2-0 Index]