tam.mml.3pl {TAM} | R Documentation |
3PL Structured Item Response Model in TAM
Description
This estimates a 3PL model with design matrices for item slopes and item intercepts. Discrete distributions of the latent variables are allowed which can be log-linearly smoothed.
Usage
tam.mml.3pl(resp, Y=NULL, group=NULL, formulaY=NULL, dataY=NULL, ndim=1,
pid=NULL, xsi.fixed=NULL, xsi.inits=NULL, xsi.prior=NULL,
beta.fixed=NULL, beta.inits=NULL, variance.fixed=NULL, variance.inits=NULL,
est.variance=TRUE, A=NULL, notA=FALSE, Q=NULL, Q.fixed=NULL, E=NULL,
gammaslope.des="2PL", gammaslope=NULL, gammaslope.fixed=NULL,
est.some.slopes=TRUE, gammaslope.max=9.99, gammaslope.constr.V=NULL,
gammaslope.constr.c=NULL, gammaslope.center.index=NULL, gammaslope.center.value=NULL,
gammaslope.prior=NULL, userfct.gammaslope=NULL, gammaslope.constr.Npars=0,
est.guess=NULL, guess=rep(0, ncol(resp)),
guess.prior=NULL, max.guess=0.5, skillspace="normal", theta.k=NULL,
delta.designmatrix=NULL, delta.fixed=NULL, delta.inits=NULL, pweights=NULL,
item.elim=TRUE, verbose=TRUE, control=list(), Edes=NULL )
## S3 method for class 'tam.mml.3pl'
summary(object,file=NULL,...)
## S3 method for class 'tam.mml.3pl'
print(x,...)
Arguments
resp |
Data frame with polytomous item responses |
Y |
A matrix of covariates in latent regression. Note that the intercept is automatically included as the first predictor. |
group |
An optional vector of group identifiers |
formulaY |
An R formula for latent regression. Transformations of predictors
in |
dataY |
An optional data frame with possible covariates |
ndim |
Number of dimensions (is not needed to determined by the user) |
pid |
An optional vector of person identifiers |
xsi.fixed |
A matrix with two columns for fixing |
xsi.inits |
A matrix with two columns (in the same way defined as in
|
xsi.prior |
Item-specific prior distribution for |
beta.fixed |
A matrix with three columns for fixing regression coefficients.
1st column: Index of |
beta.inits |
A matrix (same format as in |
variance.fixed |
An optional matrix. In case of a single group it is a matrix with three columns for fixing
entries in covariance matrix:
1st column: dimension 1, 2nd column: dimension 2,
3rd column: fixed value of covariance/variance.
In case of multiple groups, it is a matrix with four columns
1st column: group index (from |
variance.inits |
Initial covariance matrix in estimation. All matrix entries have to be
specified and this matrix is NOT in the same format like
|
est.variance |
Should the covariance matrix be estimated? This argument
applies to estimated item slopes in |
A |
An optional array of dimension |
notA |
An optional logical indicating whether all entries in
the |
Q |
An optional |
Q.fixed |
Optional |
E |
Optional design array for item slopes |
gammaslope.des |
Optional string indicating type of item slope parameter to be estimated.
|
gammaslope |
Initial or fixed vector of |
gammaslope.fixed |
An optional matrix containing fixed values in the |
est.some.slopes |
An optional logical indicating whether some item slopes should be estimated. |
gammaslope.max |
Value for absolute entries in |
gammaslope.constr.V |
An optional constraint matrix |
gammaslope.constr.c |
An optional constraint vector |
gammaslope.center.index |
Indices of |
gammaslope.center.value |
Specified values of sum of
subset of |
gammaslope.prior |
Item-specific prior distribution for |
userfct.gammaslope |
A user specified function for
constraints or transformations of the |
gammaslope.constr.Npars |
Number of constrained
|
est.guess |
An optional vector of integers indicating for which items a guessing parameter should be estimated. Same integers correspond to same estimated guessing parameters. A value of 0 denotes an item for which no guessing parameter is estimated. |
guess |
Fixed or initial guessing parameters |
guess.prior |
Item-specific prior distribution for guessing parameters |
max.guess |
Upper bound for guessing parameters |
skillspace |
Skill space: normal distribution ( |
theta.k |
A matrix of the |
delta.designmatrix |
A design matrix of the log-linear model for the skill space in case of a discrete
distribution ( |
delta.fixed |
Fixed |
delta.inits |
Optional initial matrix of |
pweights |
Optional vector of person weights. |
item.elim |
Optional logical indicating whether an item with has
only zero entries should be removed from the analysis. The default
is |
verbose |
Logical indicating whether output should
be printed during iterations. This argument replaces |
control |
See |
Edes |
Compact form of design matrix. This argument is only defined for convenience for models with random starting values to avoid recalculations. |
object |
Object of class |
file |
A file name in which the summary output will be written |
x |
Object of class |
... |
Further arguments to be passed |
Details
The item response model for item i
and category h
and no guessing
parameters can be written as
P( X_{i}=h | \bold{\theta} ) \propto \exp( \sum_d b_{ihd} \theta_d +
\sum_k a_{ih} \xi_k )
For item slopes, a linear decomposition is allowed
b_{ihd}=\sum_k e_{ihdk} \gamma_k
In case of a guessing parameter, the item response function is
P( X_{i}=h | \bold{\theta} )=c_i + ( 1 - c_i ) \cdot
( 1 + \exp( - \sum_d b_{ihd} \theta_d - \sum_k a_{ih} \xi_k ) )^{-1}
For possibilities of definitions of the design matrix E=(e_{ihdk})
see Formann (2007), for the strongly related linear logistic latent
class model see Formann (1992). Linear constraints on \gamma
can be specified by equations V \gamma=c
and using the arguments
gammaslope.constr.V
and gammaslope.constr.c
.
Like in tam.mml
, a multivariate linear regression
\bold{\theta}=Y \beta + \bold{\epsilon}
assuming a multivariate normal distribution on the residuals \bold{\epsilon}
can be specified (skillspace="normal"
).
Alternatively, a log-linear distribution of the skill classes P(\theta)
(skillspace="discrete"
)
is performed
\log P(\theta )=D_{ \delta } \delta
See Xu and
von Davier (2008). The design matrix D_{\delta}
can be specified in
delta.designmatrix
. The theta grid \bold{\theta}
of the skill space
can be defined in theta.k
.
Value
The same output as in tam.mml
and additional entries
delta |
Parameter vector |
gammaslope |
Estimated |
se.gammaslope |
Standard errors |
E |
Used design matrix |
Edes |
Used design matrix |
guess |
Estimated |
se.guess |
Standard errors |
Note
The implementation of the model builds on pieces work of Anton Formann. See http://www.antonformann.at/ for more information.
References
Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486. doi:10.2307/2290280
Formann, A. K. (2007). (Almost) Equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 177-189). New York: Springer. doi:10.1007/978-0-387-49839-3_11
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS. doi:10.1002/j.2333-8504.2008.tb02113.x
See Also
See also tam.mml
.
See the CDM::slca
function in the CDM package
for a similar method.
Examples
## Not run:
#############################################################################
# EXAMPLE 1: Dichotomous data | data.sim.rasch
#############################################################################
data(data.sim.rasch)
dat <- data.sim.rasch
# some control arguments
ctl.list <- list(maxiter=100) # increase the number of iterations in applications!
#*** Model 1: Rasch model, normal trait distribution
mod1 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", est.some.slopes=FALSE,
control=ctl.list)
summary(mod1)
#*** Model 2: Rasch model, discrete trait distribution
# choose theta grid
theta.k <- seq( -3, 3, len=7 ) # discrete theta grid distribution
# define symmetric trait distribution
delta.designmatrix <- matrix( 0, nrow=7, ncol=4 )
delta.designmatrix[4,1] <- 1
delta.designmatrix[c(3,5),2] <- 1
delta.designmatrix[c(2,6),3] <- 1
delta.designmatrix[c(1,7),4] <- 1
mod2 <- TAM::tam.mml.3pl(resp=dat, skillspace="discrete", est.some.slopes=FALSE,
theta.k=theta.k, delta.designmatrix=delta.designmatrix, control=ctl.list)
summary(mod2)
#*** Model 3: 2PL model
mod3 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
control=ctl.list, est.variance=FALSE )
summary(mod3)
#*** Model 4: 3PL model
# estimate guessing parameters for items 3,7,9 and 12
I <- ncol(dat)
est.guess <- rep(0, I )
# set parameters 9 and 12 equal -> same integers
est.guess[ c(3,7,9,12) ] <- c( 1, 3, 2, 2 )
# starting values guessing parameter
guess <- .2*(est.guess > 0)
# estimate model
mod4 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
control=ctl.list, est.guess=est.guess, guess=guess, est.variance=FALSE)
summary(mod4)
#--- specification in tamaan
tammodel <- "
LAVAAN MODEL:
F1=~ I1__I40
F1 ~~ 1*F1
I3 + I7 ?=g1
I9 + I12 ?=g912 * g1
"
mod4a <- TAM::tamaan( tammodel, resp=dat, control=list(maxiter=20))
summary(mod4a)
#*** Model 5: 3PL model, add some prior Beta distribution
guess.prior <- matrix( 0, nrow=I, ncol=2 )
guess.prior[ est.guess > 0, 1] <- 5
guess.prior[ est.guess > 0, 2] <- 17
mod5 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
control=ctl.list, est.guess=est.guess, guess=guess, guess.prior=guess.prior)
summary(mod5)
#--- specification in tamaan
tammodel <- "
LAVAAN MODEL:
F1=~ I1__I40
F1 ~~ 1*F1
I3 + I7 ?=g1
I9 + I12 ?=g912 * g1
MODEL PRIOR:
g912 ~ Beta(5,17)
I3_guess ~ Beta(5,17)
I7_guess ~ Beta(5,17)
"
mod5a <- TAM::tamaan( tammodel, resp=dat, control=list(maxiter=20))
#*** Model 6: 2PL model with design matrix for item slopes
I <- 40 # number of items
D <- 1 # dimensions
maxK <- 2 # maximum number of categories
Ngam <- 13 # number of different slope parameters
E <- array( 0, dim=c(I,maxK,D,Ngam) )
# joint slope parameters for items 1 to 10, 11 to 20, 21 to 30
E[ 1:10, 2, 1, 2 ] <- 1
E[ 11:20, 2, 1, 1 ] <- 1
E[ 21:30, 2, 1, 3 ] <- 1
for (ii in 31:40){ E[ii,2,1,ii - 27 ] <- 1 }
# estimate model
mod6 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list, E=E, est.variance=FALSE )
summary(mod6)
#*** Model 6b: Truncated normal prior distribution for slope parameters
gammaslope.prior <- matrix( 0, nrow=Ngam, ncol=4 )
gammaslope.prior[,1] <- 2 # mean
gammaslope.prior[,2] <- 10 # standard deviation
gammaslope.prior[,3] <- -Inf # lower bound
gammaslope.prior[ 4:13,3] <- 1.2
gammaslope.prior[,4] <- Inf # upper bound
# estimate model
mod6b <- TAM::tam.mml.3pl(resp=dat, E=E, est.variance=FALSE,
gammaslope.prior=gammaslope.prior, control=ctl.list )
summary(mod6b)
#*** Model 7: 2PL model with design matrix of slopes and slope constraints
Ngam <- dim(E)[4] # number of gamma parameters
# define two constraint equations
gammaslope.constr.V <- matrix( 0, nrow=Ngam, ncol=2 )
gammaslope.constr.c <- rep(0,2)
# set sum of first two xlambda entries to 1.8
gammaslope.constr.V[1:2,1] <- 1
gammaslope.constr.c[1] <- 1.8
# set sum of entries 4, 5 and 6 to 3
gammaslope.constr.V[4:6,2] <- 1
gammaslope.constr.c[2] <- 3
mod7 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list, E=E, est.variance=FALSE,
gammaslope.constr.V=gammaslope.constr.V, gammaslope.constr.c=gammaslope.constr.c)
summary(mod7)
#**** Model 8: Located latent class Rasch model with estimated three skill points
# three classes of theta's are estimated
TP <- 3
theta.k <- diag(TP)
# because item difficulties are unrestricted, we define the sum of the estimated
# theta points equal to zero
Ngam <- TP # estimate three gamma loading parameters which are discrete theta points
E <- array( 0, dim=c(I,2,TP,Ngam) )
E[,2,1,1] <- E[,2,2,2] <- E[,2,3,3] <- 1
gammaslope.constr.V <- matrix( 1, nrow=3, ncol=1 )
gammaslope.constr.c <- c(0)
# initial gamma values
gammaslope <- c( -2, 0, 2 )
# estimate model
mod8 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list, E=E, skillspace="discrete",
theta.k=theta.k, gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
gammaslope.constr.c=gammaslope.constr.c )
summary(mod8)
#*** Model 9: Multidimensional multiple group model
N <- nrow(dat)
I <- ncol(dat)
group <- c( rep(1,N/4), rep(2,N/4), rep(3,N/2) )
Q <- matrix(0,nrow=I,ncol=2)
Q[ 1:(I/2), 1] <- Q[ seq(I/2+1,I), 2] <- 1
# estimate model
mod9 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", est.some.slopes=FALSE,
group=group, Q=Q)
summary(mod9)
#############################################################################
# EXAMPLE 2: Polytomous data
#############################################################################
data( data.mg, package="CDM")
dat <- data.mg[1:1000, paste0("I",1:11)]
#*******************************************************
#*** Model 1: 1-dimensional 1PL estimation, normal skill distribution
mod1 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal",
gammaslope.des="2PL", est.some.slopes=FALSE, est.variance=TRUE )
summary(mod1)
#*******************************************************
#*** Model 2: 1-dimensional 2PL estimation, discrete skill distribution
# define skill space
theta.k <- matrix( seq(-5,5,len=21) )
# allow skew skill distribution
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
# fix 13th xsi item parameter to zero
xsi.fixed <- cbind( 13, 0 )
# fix 10th slope paremeter to one
gammaslope.fixed <- cbind( 10, 1 )
# estimate model
mod2 <- TAM::tam.mml.3pl(resp=dat, skillspace="discrete", theta.k=theta.k,
delta.designmatrix=delta.designmatrix, gammaslope.des="2PL", xsi.fixed=xsi.fixed,
gammaslope.fixed=gammaslope.fixed)
summary(mod2)
#*******************************************************
#*** Model 3: 2-dimensional 2PL estimation, normal skill distribution
# define loading matrix
Q <- matrix(0,11,2)
Q[1:6,1] <- 1 # items 1 to 6 load on dimension 1
Q[7:11,2] <- 1 # items 7 to 11 load on dimension 2
# estimate model
mod3 <- TAM::tam.mml.3pl(resp=dat, gammaslope.des="2PL", Q=Q )
summary(mod3)
#############################################################################
# EXAMPLE 3: Dichotomous data with guessing
#############################################################################
#*** simulate data
set.seed(9765)
N <- 4000 # number of persons
I <- 20 # number of items
b <- seq( -1.5, 1.5, len=I )
guess <- rep(0, I )
guess.items <- c(6,11,16)
guess[ guess.items ] <- .33
library(sirt)
dat <- sirt::sim.raschtype( stats::rnorm(N), b=b, fixed.c=guess )
#*******************************************************
#*** Model 1: Difficulty + guessing model, i.e. fix slopes to 1
est.guess <- rep(0,I)
est.guess[ guess.items ] <- seq(1, length(guess.items) )
# define prior distribution
guess.prior <- matrix( cbind( 5, 17 ), I, 2, byrow=TRUE )
guess.prior[ ! est.guess, ] <- 0
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, guess=guess, est.guess=est.guess,
guess.prior=guess.prior, control=ctl.list,est.variance=TRUE,
est.some.slopes=FALSE )
summary(mod1)
#*******************************************************
#*** Model 2: estimate a joint guessing parameter
est.guess <- rep(0,I)
est.guess[ guess.items ] <- 1
# estimate model
mod2 <- TAM::tam.mml.3pl(resp=dat, guess=guess, est.guess=est.guess,
guess.prior=guess.prior, control=ctl.list,est.variance=TRUE,
est.some.slopes=FALSE )
summary(mod2)
#############################################################################
# EXAMPLE 4: Latent class model with two classes
# See slca Simulated Example 2 in the CDM package
#############################################################################
#*******************************************************
#*** simulate data
set.seed(9876)
I <- 7 # number of items
# simulate response probabilities
a1 <- round( stats::runif(I,0, .4 ),4)
a2 <- round( stats::runif(I, .6, 1 ),4)
N <- 1000 # sample size
# simulate data in two classes of proportions .3 and .7
N1 <- round(.3*N)
dat1 <- 1 * ( matrix(a1,N1,I,byrow=TRUE) > matrix( stats::runif( N1 * I), N1, I ) )
N2 <- round(.7*N)
dat2 <- 1 * ( matrix(a2,N2,I,byrow=TRUE) > matrix( stats::runif( N2 * I), N2, I ) )
dat <- rbind( dat1, dat2 )
colnames(dat) <- paste0("I", 1:I)
#*******************************************************
# estimation using tam.mml.3pl function
# define design matrices
TP <- 2 # two classes
theta.k <- diag(TP) # there are theta dimensions -> two classes
# The idea is that latent classes refer to two different "dimensions".
# Items load on latent class indicators 1 and 2, see below.
E <- array(0, dim=c(I,2,2,2*I) )
items <- colnames(dat)
dimnames(E)[[4]] <- c(paste0( colnames(dat), "Class", 1),
paste0( colnames(dat), "Class", 2) )
# items, categories, classes, parameters
# probabilities for correct solution
for (ii in 1:I){
E[ ii, 2, 1, ii ] <- 1 # probabilities class 1
E[ ii, 2, 2, ii+I ] <- 1 # probabilities class 2
}
# estimation command
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxit=20), skillspace="discrete",
theta.k=theta.k, notA=TRUE)
summary(mod1)
# compare simulated and estimated data
cbind( mod1$rprobs[,2,1], a2 ) # Simulated class 2
cbind( mod1$rprobs[,2,2], a1 ) # Simulated class 1
#*******************************************************
#** specification with tamaan
tammodel <- "
ANALYSIS:
TYPE=LCA;
NCLASSES(2); # 2 classes
NSTARTS(5,20); # 5 random starts with 20 iterations
LAVAAN MODEL:
F=~ I1__I7
"
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)
# compare with mod1
logLik(mod1)
logLik(mod1b)
#############################################################################
# EXAMPLE 5: Located latent class model, Rasch model
# See slca Simulated Example 4 in the CDM package
#############################################################################
#*** simulate data
set.seed(487)
I <- 15 # I items
b1 <- seq( -2, 2, len=I) # item difficulties
N <- 2000 # number of persons
# simulate 4 theta classes
theta0 <- c( -2.5, -1, 0.3, 1.3 ) # skill classes
probs0 <- c( .1, .4, .2, .3 ) # skill class probabilities
TP <- length(theta0)
theta <- theta0[ rep(1:TP, round(probs0*N) ) ]
library(sirt)
dat <- sirt::sim.raschtype( theta, b1 )
colnames(dat) <- paste0("I",1:I)
#*******************************************************
#*** Model 1: Located latent class model with 4 classes
maxK <- 2
Ngam <- TP
E <- array( 0, dim=c(I, maxK, TP, Ngam ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(E)[[3]] <- paste0("Class", 1:TP)
dimnames(E)[[4]] <- paste0("theta", 1:TP)
# theta design
for (tt in 1:TP){ E[1:I, 2, tt, tt] <- 1 }
theta.k <- diag(TP)
# set eighth item difficulty to zero
xsi.fixed <- cbind( 8, 0 )
# initial gamma parameter
gammaslope <- seq( -1.5, 1.5, len=TP)
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, xsi.fixed=xsi.fixed,
control=list(maxiter=100), skillspace="discrete",
theta.k=theta.k, gammaslope=gammaslope)
summary(mod1)
# compare estimated and simulated theta class locations
cbind( mod1$gammaslope, theta0 )
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, probs0 )
#----- specification using tamaan
tammodel <- "
ANALYSIS:
TYPE=LOCLCA;
NCLASSES(4)
LAVAAN MODEL:
F=~ I1__I15
I8 | 0*t1
"
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)
#############################################################################
# EXAMPLE 6: DINA model with two skills
# See slca Simulated Example 5 in the CDM package
#############################################################################
#*** simulate data
set.seed(487)
N <- 3000 # number of persons
# define Q-matrix
I <- 9 # 9 items
NS <- 2 # 2 skills
TP <- 4 # number of skill classes
Q <- scan(nlines=3, text=
"1 0 1 0 1 0
0 1 0 1 0 1
1 1 1 1 1 1",
)
Q <- matrix(Q, I, ncol=NS, byrow=TRUE)
# define skill distribution
alpha0 <- matrix( c(0,0,1,0,0,1,1,1), nrow=4,ncol=2,byrow=TRUE)
prob0 <- c( .2, .4, .1, .3 )
alpha <- alpha0[ rep( 1:TP, prob0*N),]
# define guessing and slipping parameters
guess <- round( stats::runif(I, 0, .4 ), 2 )
slip <- round( stats::runif(I, 0, .3 ), 2 )
# simulate data according to the DINA model
dat <- CDM::sim.din( q.matrix=Q, alpha=alpha, slip=slip, guess=guess )$dat
#*** Model 1: Estimate DINA model
# define E matrix which contains the anti-slipping parameters
maxK <- 2
Ngam <- I
E <- array( 0, dim=c(I, maxK, TP, Ngam ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(E)[[3]] <- c("S00","S10","S01","S11")
dimnames(E)[[4]] <- paste0( "antislip", 1:I )
# define anti-slipping parameters in E
for (ii in 1:I){
# define latent responses
latresp <- 1*( alpha0 %*% Q[ii,]==sum(Q[ii,]) )[,1]
# model slipping parameters
E[ii, 2, latresp==1, ii ] <- 1
}
# skill space definition
theta.k <- diag(TP)
gammaslope <- rep( qlogis( .8 ), I )
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxiter=100), skillspace="discrete",
theta.k=theta.k, gammaslope=gammaslope)
summary(mod1)
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, probs0 )
# compare estimated and simulated guessing parameters
cbind( mod1$rprobs[,2,1], guess )
# compare estimated and simulated slipping parameters
cbind( 1 - mod1$rprobs[,2,4], slip )
#############################################################################
# EXAMPLE 7: Mixed Rasch model with two classes
# See slca Simulated Example 3 in the CDM package
#############################################################################
#*** simulate data
set.seed(987)
library(sirt)
# simulate two latent classes of Rasch populations
I <- 15 # 6 items
b1 <- seq( -1.5, 1.5, len=I) # difficulties latent class 1
b2 <- b1 # difficulties latent class 2
b2[ c(4,7, 9, 11, 12, 13) ] <- c(1, -.5, -.5, .33, .33, -.66 )
b2 <- b2 - mean(b2)
N <- 3000 # number of persons
wgt <- .25 # class probability for class 1
# class 1
dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), - b1 )
# class 2
dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.7), - b2 )
dat <- rbind( dat1, dat2 )
# The idea is that each grid point class x theta is defined as new
# dimension. If we approximate the trait distribution by 7 theta points
# and are interested in estimating 2 latent classes, then we need
# 7*2=14 dimensions.
#*** Model 1: Rasch model
# theta grid
theta.k1 <- seq( -5, 5, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(TT)
#-- delta designmatrix
delta.designmatrix <- matrix( 0, TT, ncol=3 )
delta.designmatrix[, 1] <- 1
delta.designmatrix[, 2:3] <- cbind( theta.k1, theta.k1^2 )
#-- define loading matrix E
E <- array( 0, dim=c(I,2,TT,I + 1) ) # last parameter is constant 1
for (ii in 1:I){
E[ ii, 2, 1:TT, ii ] <- -1 # '-b' in '1*theta - b'
E[ ii, 2, 1:TT, I+1] <- theta.k1 # '1*theta' in '1*theta - b'
}
# initial gammaslope parameters
par1 <- stats::qlogis( colMeans( dat ) )
gammaslope <- c( par1, 1 )
# sum constraint of zero on item difficulties
gammaslope.constr.V <- matrix( 0, I+1, 1 )
gammaslope.constr.V[ 1:I, 1] <- 1 # Class 1
gammaslope.constr.c <- c(0)
# fixed gammaslope parameter
gammaslope.fixed <- cbind( I+1, 1 )
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat1, E=E, skillspace="discrete",
theta.k=theta.k, delta.designmatrix=delta.designmatrix,
gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
gammaslope.constr.c=gammaslope.constr.c, gammaslope.fixed=gammaslope.fixed,
notA=TRUE, est.variance=FALSE)
summary(mod1)
#*** Model 2: Mixed Rasch model with two latent classes
# theta grid
theta.k1 <- seq( -4, 4, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(2*TT) # 2*7=14 classes
#-- delta designmatrix
delta.designmatrix <- matrix( 0, 2*TT, ncol=6 )
# Class 1
delta.designmatrix[1:TT, 1] <- 1
delta.designmatrix[1:TT, 2:3] <- cbind( theta.k1, theta.k1^2 )
# Class 2
delta.designmatrix[TT+1:TT, 4] <- 1
delta.designmatrix[TT+1:TT, 5:6] <- cbind( theta.k1, theta.k1^2 )
#-- define loading matrix E
E <- array( 0, dim=c(I,2,2*TT,2*I + 1) ) # last parameter is constant 1
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- c("Cat0","Cat1")
dimnames(E)[[3]] <- c( paste0("Class1_theta", 1:TT), paste0("Class2_theta", 1:TT) )
dimnames(E)[[4]] <- c( paste0("b_Class1_", colnames(dat)),
paste0("b_Class2_", colnames(dat)), "One")
for (ii in 1:I){
# Class 1 item parameters
E[ ii, 2, 1:TT, ii ] <- -1 # '-b' in '1*theta - b'
E[ ii, 2, 1:TT, 2*I+1] <- theta.k1 # '1*theta' in '1*theta - b'
# Class 2 item parameters
E[ ii, 2, TT + 1:TT, I + ii ] <- -1
E[ ii, 2, TT + 1:TT, 2*I+1] <- theta.k1
}
# initial gammaslope parameters
par1 <- qlogis( colMeans( dat ) )
gammaslope <- c( par1, par1 + stats::runif(I, -2,2 ), 1 )
# sum constraint of zero on item difficulties within a class
gammaslope.center.index <- c( rep( 1, I ), rep(2,I), 0 )
gammaslope.center.value <- c(0,0)
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, skillspace="discrete",
theta.k=theta.k, delta.designmatrix=delta.designmatrix,
gammaslope=gammaslope, gammaslope.center.index=gammaslope.center.index,
gammaslope.center.value=gammaslope.center.value, gammaslope.fixed=gammaslope.fixed,
notA=TRUE)
summary(mod1)
# latent class proportions
stats::aggregate( mod1$pi.k, list( rep(1:2, each=TT)), sum )
# compare simulated and estimated item parameters
cbind( b1, b2, - mod1$gammaslope[1:I], - mod1$gammaslope[I + 1:I ] )
#--- specification in tamaan
tammodel <- "
ANALYSIS:
TYPE=MIXTURE;
NCLASSES(2)
NSTARTS(5,30)
LAVAAN MODEL:
F=~ I0001__I0015
ITEM TYPE:
ALL(Rasch);
"
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)
#############################################################################
# EXAMPLE 8: 2PL mixture distribution model
#############################################################################
#*** simulate data
set.seed(9187)
library(sirt)
# simulate two latent classes of Rasch populations
I <- 20
b1 <- seq( -1.5, 1.5, len=I) # difficulties latent class 1
b2 <- b1 # difficulties latent class 2
b2[ c(4,7, 9, 11, 12, 13, 16, 18) ] <- c(1, -.5, -.5, .33, .33, -.66, -1, .3)
# b2 <- scale( b2, scale=FALSE)
b2 <- b2 - mean(b2)
N <- 4000 # number of persons
wgt <- .75 # class probability for class 1
# item slopes
a1 <- rep( 1, I ) # first class
a2 <- rep( c(.5,1.5), I/2 )
# class 1
dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), - b1, fixed.a=a1)
# class 2
dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.4), - b2, fixed.a=a2)
dat <- rbind( dat1, dat2 )
#*** Model 1: Mixed 2PL model with two latent classes
theta.k1 <- seq( -4, 4, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(2*TT) # 2*7=14 classes
#-- delta designmatrix
delta.designmatrix <- matrix( 0, 2*TT, ncol=6 )
# Class 1
delta.designmatrix[1:TT, 1] <- 1
delta.designmatrix[1:TT, 2:3] <- cbind( theta.k1, theta.k1^2 )
# Class 2
delta.designmatrix[TT+1:TT, 4] <- 1
delta.designmatrix[TT+1:TT, 5:6] <- cbind( theta.k1, theta.k1^2 )
#-- define loading matrix E
E <- array( 0, dim=c(I,2,2*TT,4*I ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- c("Cat0","Cat1")
dimnames(E)[[3]] <- c( paste0("Class1_theta", 1:TT), paste0("Class2_theta", 1:TT) )
dimnames(E)[[4]] <- c( paste0("b_Class1_", colnames(dat)),
paste0("a_Class1_", colnames(dat)),
paste0("b_Class2_", colnames(dat)),
paste0("a_Class2_", colnames(dat)) )
for (ii in 1:I){
# Class 1 item parameters
E[ ii, 2, 1:TT, ii ] <- -1 # '-b' in 'a*theta - b'
E[ ii, 2, 1:TT, I + ii] <- theta.k1 # 'a*theta' in 'a*theta - b'
# Class 2 item parameters
E[ ii, 2, TT + 1:TT, 2*I + ii ] <- -1
E[ ii, 2, TT + 1:TT, 3*I + ii ] <- theta.k1
}
# initial gammaslope parameters
par1 <- scale( - stats::qlogis( colMeans( dat ) ), scale=FALSE )
gammaslope <- c( par1, rep(1,I), scale( par1 + runif(I, - 1.4, 1.4 ),
scale=FALSE), stats::runif( I,.6,1.4) )
# constraint matrix
gammaslope.constr.V <- matrix( 0, 4*I, 4 )
# sum of item intercepts equals zero
gammaslope.constr.V[ 1:I, 1] <- 1 # Class 1 (b)
gammaslope.constr.V[ 2*I + 1:I, 2] <- 1 # Class 2 (b)
# sum of item slopes equals number of items -> mean slope of 1
gammaslope.constr.V[ I + 1:I, 3] <- 1 # Class 1 (a)
gammaslope.constr.V[ 3*I + 1:I, 4] <- 1 # Class 2 (a)
gammaslope.constr.c <- c(0,0,I,I)
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxiter=80), skillspace="discrete",
theta.k=theta.k, delta.designmatrix=delta.designmatrix,
gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
gammaslope.constr.c=gammaslope.constr.c, gammaslope.fixed=gammaslope.fixed,
notA=TRUE)
# estimated item parameters
mod1$gammaslope
# summary
summary(mod1)
# latent class proportions
round( stats::aggregate( mod1$pi.k, list( rep(1:2, each=TT)), sum ), 3 )
# compare simulated and estimated item intercepts
int <- cbind( b1*a1, b2 * a2, - mod1$gammaslope[1:I], - mod1$gammaslope[2*I + 1:I ] )
round( int, 3 )
# simulated and estimated item slopes
slo <- cbind( a1, a2, mod1$gammaslope[I+1:I], mod1$gammaslope[3*I + 1:I ] )
round(slo,3)
#--- specification in tamaan
tammodel <- "
ANALYSIS:
TYPE=MIXTURE;
NCLASSES(2)
NSTARTS(10,25)
LAVAAN MODEL:
F=~ I0001__I0020
"
mod1t <- TAM::tamaan( tammodel, resp=dat )
summary(mod1t)
#############################################################################
# EXAMPLE 9: Toy example: Exact representation of an item by a factor
#############################################################################
data(data.gpcm)
dat <- data.gpcm[,1,drop=FALSE ] # choose first item
# some descriptives
( t1 <- table(dat) )
# The idea is that we define an IRT model with one latent variable
# which extactly corresponds to the manifest item.
I <- 1 # 1 item
K <- 4 # 4 categories
TP <- 4 # 4 discrete theta points
# define skill space
theta.k <- diag(TP)
# define loading matrix E
E <- array( -99, dim=c(I,K,TP,1 ) )
for (vv in 1:K){
E[ 1, vv, vv, 1 ] <- 9
}
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, skillspace="discrete",
theta.k=theta.k, notA=TRUE)
summary(mod1)
# -> the latent distribution corresponds to the manifest distribution, because ...
round( mod1$pi.k, 3 )
round( t1 / sum(t1), 3 )
#############################################################################
# EXAMPLE 10: Some fixed item loadings
#############################################################################
data(data.Students,package="CDM")
dat <- data.Students
# select variables
vars <- scan( nlines=1, what="character")
act1 act2 act3 act4 act5 sc1 sc2 sc3 sc4
dat <- data.Students[, vars ]
# define loading matrix: two-dimensional model
Q <- matrix( 0, nrow=9, ncol=2 )
Q[1:5,1] <- 1
Q[6:9,2] <- 1
# define some fixed item loadings
Q.fixed <- NA*Q
Q.fixed[ c(1,4), 1] <- .5
Q.fixed[ 6:7, 2 ] <- 1
# estimate model
mod3 <- TAM::tam.mml.3pl( resp=dat, gammaslope.des="2PL", Q=Q, Q.fixed=Q.fixed,
control=list( maxiter=10, nodes=seq(-4,4,len=10) ) )
summary(mod3)
#############################################################################
# EXAMPLE 11: Mixed response formats - Multiple choice and partial credit items
#############################################################################
data(data.timssAusTwn.scored)
dat <- data.timssAusTwn.scored
# select columns with item responses
dat <- dat[, grep("M0", colnames(dat) ) ]
I <- ncol(dat) # number of items
# The idea is to start with partial credit modelling
# and then to include the guessing parameters
#*** Model 0: Partial Credit Model
mod0 <- TAM::tam.mml(dat)
summary(mod0)
#*** Model 1 and Model 2: include guessing parameters
# multiple choice items
guess_items <- which( apply( dat, 2, max, na.rm=TRUE )==1 )
# define guessing parameters
guess0 <- rep(0,I)
guess0[ guess_items ] <- .25 # set guessing probability to .25
# define which guessing parameters should be estimated
est.guess1 <- rep(0,I) # all parameters are fixed
est.guess2 <- 1 * ( guess0==.25 ) # joint guessing parameter
# use design matrix from partial credit model
A0 <- mod0$A
#--- Model 1: fixed guessing parameters of .25 and item slopes of 1
mod1 <- TAM::tam.mml.3pl( dat, guess=guess0, est.guess=est.guess1,
A=A0, est.some.slopes=FALSE, control=list(maxiter=50) )
summary(mod1)
#--- Model 2: estimate joint guessing parameters and item slopes of 1
mod2 <- TAM::tam.mml.3pl( dat, guess=guess0, est.guess=est.guess2,
A=A0, est.some.slopes=FALSE, control=list(maxiter=50) )
summary(mod2)
# model comparison
IRT.compareModels(mod0,mod1,mod2)
## End(Not run)