gdm {CDM} | R Documentation |
General Diagnostic Model
Description
This function estimates the general diagnostic model (von Davier, 2008; Xu & von Davier, 2008) which handles multidimensional item response models with ordered discrete or continuous latent variables for polytomous item responses.
Usage
gdm( data, theta.k, irtmodel="2PL", group=NULL, weights=rep(1, nrow(data)),
Qmatrix=NULL, thetaDes=NULL, skillspace="loglinear",
b.constraint=NULL, a.constraint=NULL,
mean.constraint=NULL, Sigma.constraint=NULL, delta.designmatrix=NULL,
standardized.latent=FALSE, centered.latent=FALSE,
centerintercepts=FALSE, centerslopes=FALSE,
maxiter=1000, conv=1e-5, globconv=1e-5, msteps=4, convM=.0005,
decrease.increments=FALSE, use.freqpatt=FALSE, progress=TRUE,
PEM=FALSE, PEM_itermax=maxiter, ...)
## S3 method for class 'gdm'
summary(object, file=NULL, ...)
## S3 method for class 'gdm'
print(x, ...)
## S3 method for class 'gdm'
plot(x, perstype="EAP", group=1, barwidth=.1, histcol=1,
cexcor=3, pchpers=16, cexpers=.7, ... )
Arguments
data |
An |
theta.k |
In the one-dimensional case it must be a vector.
For multidimensional models it has to be a list
of skill vectors if the theta grid differs between
dimensions. If not, a vector input can be supplied.
If an estimated skillspace ( |
irtmodel |
The default |
group |
An optional vector of group identifiers for
multiple group estimation.
For |
weights |
An optional vector of sample weights |
Qmatrix |
An optional array of dimension |
thetaDes |
A design matrix for specifying nonlinear item response functions (see Example 1, Models 4 and 5) |
skillspace |
The parametric assumption of the skillspace.
If |
b.constraint |
In this optional matrix with |
a.constraint |
In this optional matrix with |
mean.constraint |
A |
Sigma.constraint |
A |
delta.designmatrix |
The design matrix of |
standardized.latent |
A logical indicating whether in a uni- or multidimensional
model all latent variables of the first group should be normally distributed
and standardized. The default is |
centered.latent |
A logical indicating whether in a uni- or multidimensional
model all latent variables of the first group should be normally
distributed and do have zero means? The default is |
centerintercepts |
A logical indicating whether intercepts should be centered to have a mean of 0 for all dimensions. This argument does not (yet) work properly for varying numbers of item categories. |
centerslopes |
A logical indicating whether item slopes should be centered to have
a mean of 1 for all dimensions. This argument only works for
|
maxiter |
Maximum number of iterations |
conv |
Convergence criterion for item parameters and distribution parameters |
globconv |
Global deviance convergence criterion |
msteps |
Maximum number of M steps in estimating |
convM |
Convergence criterion in M step |
decrease.increments |
Should in the M step the increments
of |
use.freqpatt |
A logical indicating whether frequencies of unique item response patterns
should be used. In case of large data set |
progress |
An optional logical indicating whether the function should print the progress of iteration in the estimation process. |
PEM |
Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012). |
PEM_itermax |
Number of iterations in which the P-EM method should be applied. |
object |
A required object of class |
file |
Optional file name for a file in which |
x |
A required object of class |
perstype |
Person parameter estimate type. Can be either
|
barwidth |
Bar width in |
histcol |
Color of histogram bars in |
cexcor |
Font size for print of correlation in |
pchpers |
Point type for scatter plot of person
parameters in |
cexpers |
Point size for scatter plot of person
parameters in |
... |
Optional parameters to be passed to or from other methods will be ignored. |
Details
Case irtmodel="1PL"
:
Equal item slopes of 1 are assumed in this model. Therefore,
it corresponds to a generalized multidimensional Rasch model.
logit P( X_{nj}=k | \theta_n )=b_{j0} +
\sum_d q_{jdk} \theta_{nd}
The Q-matrix entries q_{jdk}
are pre-specified by the user.
Case irtmodel="2PL"
:
For each item and each dimension, different item slopes a_{jd}
are estimated:
logit P( X_{nj}=k | \theta_n )=b_{j0} +
\sum_d a_{jd} q_{jdk} \theta_{nd}
Case irtmodel="2PLcat"
:
For each item, each dimension and each category,
different item slopes a_{jdk}
are estimated:
logit P( X_{nj}=k | \theta_n )=b_{j0} +
\sum_d a_{jdk} q_{jdk} \theta_{nd}
Note that this model can be generalized to include terms of
any transformation t_h
of the \theta_n
vector (e.g. quadratic terms,
step functions or interaction) such that the model can be formulated as
logit P( X_{nj}=k | \theta_n )=b_{j0} +
\sum_h a_{jhk} q_{jhk} t_h( \theta_{n} )
In general, the number of functions t_1, ..., t_H
will be
larger than the \theta
dimension of D
.
The estimation follows an EM algorithm as described in von Davier and Yamamoto (2004) and von Davier (2008).
In case of skillspace="est"
, the \bold{\theta}
vectors
(the grid of the theta distribution) are estimated (Bartolucci, 2007;
Bacci, Bartolucci & Gnaldi, 2012). This model is called a multidimensional
latent class item response model.
Value
An object of class gdm
. The list contains the
following entries:
item |
Data frame with item parameters |
person |
Data frame with person parameters:
|
EAP.rel |
Reliability of the EAP |
deviance |
Deviance |
ic |
Information criteria, number of estimated parameters |
b |
Item intercepts |
se.b |
Standard error of item intercepts |
a |
Item slopes |
se.a |
Standard error of item slopes |
itemfit.rmsea |
The RMSEA item fit index (see |
mean.rmsea |
Mean of RMSEA item fit indexes. |
Qmatrix |
Used Q-matrix |
pi.k |
Trait distribution |
mean.trait |
Means of trait distribution |
sd.trait |
Standard deviations of trait distribution |
skewness.trait |
Skewnesses of trait distribution |
correlation.trait |
List of correlation matrices of trait distribution corresponding to each group |
pjk |
Item response probabilities evaluated at grid |
n.ik |
An array of expected counts |
G |
Number of groups |
D |
Number of dimension of |
I |
Number of items |
N |
Number of persons |
delta |
Parameter estimates for skillspace representation |
covdelta |
Covariance matrix of parameter estimates for skillspace representation |
data |
Original data frame |
group.stat |
Group statistics (sample sizes, group labels) |
p.xi.aj |
Individual likelihood |
posterior |
Individual posterior distribution |
skill.levels |
Number of skill levels per dimension |
K.item |
Maximal category per item |
theta.k |
Used theta design or estimated theta trait distribution
in case of |
thetaDes |
Used theta design for item responses |
se.theta.k |
Estimated standard errors of |
time |
Info about computation time |
skillspace |
Used skillspace parametrization |
iter |
Number of iterations |
converged |
Logical indicating whether convergence was achieved. |
object |
Object of class |
x |
Object of class |
perstype |
Person paramter estimate type. Can be either
|
group |
Group which should be used for |
barwidth |
Bar width in |
histcol |
Color of histogram bars in |
cexcor |
Font size for print of correlation in |
pchpers |
Point type for scatter plot of person
parameters in |
cexpers |
Point size for scatter plot of person
parameters in |
... |
Optional parameters to be passed to or from other methods will be ignored. |
References
Bacci, S., Bartolucci, F., & Gnaldi, M. (2012). A class of multidimensional latent class IRT models for ordinal polytomous item responses. arXiv preprint, arXiv:1201.4667.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141-157.
Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287-307.
von Davier, M., & Yamamoto, K. (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological Measurement, 28, 389-406.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.
See Also
Cognitive diagnostic models for dichotomous data can be estimated
with din
(DINA or DINO model) or gdina
(GDINA model, which contains many CDMs as special cases).
For assessment of model fit see modelfit.cor.din
and
anova.gdm
.
See itemfit.sx2
for item fit statistics.
For the estimation of the multidimensional
latent class item response model see the MultiLCIRT package
and sirt package (function sirt::rasch.mirtlc
).
Examples
#############################################################################
# EXAMPLE 1: Fraction Dataset 1
# Unidimensional Models for dichotomous data
#############################################################################
data(data.fraction1, package="CDM")
dat <- data.fraction1$data
theta.k <- seq( -6, 6, len=15 ) # discretized ability
#***
# Model 1: Rasch model (normal distribution)
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
centered.latent=TRUE)
summary(mod1)
plot(mod1)
#***
# Model 2: Rasch model (log-linear smoothing)
# set the item difficulty of the 8th item to zero
b.constraint <- matrix( c(8,1,0), 1, 3 )
mod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
skillspace="loglinear", b.constraint=b.constraint )
summary(mod2)
#***
# Model 3: 2PL model
mod3 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
skillspace="normal", standardized.latent=TRUE )
summary(mod3)
## Not run:
#***
# Model 4: include quadratic term in item response function
# using the argument decrease.increments=TRUE leads to a more
# stable estimate
thetaDes <- cbind( theta.k, theta.k^2 )
colnames(thetaDes) <- c( "F1", "F1q" )
mod4 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
thetaDes=thetaDes, skillspace="normal",
standardized.latent=TRUE, decrease.increments=TRUE)
summary(mod4)
#***
# Model 5: step function for ICC
# two different probabilities theta < 0 and theta > 0
thetaDes <- matrix( 1*(theta.k>0), ncol=1 )
colnames(thetaDes) <- c( "Fgrm1" )
mod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
thetaDes=thetaDes, skillspace="normal" )
summary(mod5)
#***
# Model 6: DINA model with din function
mod6 <- CDM::din( dat, q.matrix=matrix( 1, nrow=ncol(dat),ncol=1 ) )
summary(mod6)
#***
# Model 7: Estimating a version of the DINA model with gdm
theta.k <- c(-.5,.5)
mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear" )
summary(mod7)
#############################################################################
# EXAMPLE 2: Cultural Activities - data.Students
# Unidimensional Models for polytomous data
#############################################################################
data(data.Students, package="CDM")
dat <- data.Students
dat <- dat[, grep( "act", colnames(dat) ) ]
theta.k <- seq( -4, 4, len=11 ) # discretized ability
#***
# Model 1: Partial Credit Model (PCM)
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
centered.latent=TRUE)
summary(mod1)
plot(mod1)
#***
# Model 1b: PCM using frequency patterns
mod1b <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
centered.latent=TRUE, use.freqpatt=TRUE)
summary(mod1b)
#***
# Model 2: PCM with two groups
mod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
group=CDM::data.Students$urban + 1, skillspace="normal",
centered.latent=TRUE)
summary(mod2)
#***
# Model 3: PCM with loglinear smoothing
b.constraint <- matrix( c(1,2,0), ncol=3 )
mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
skillspace="loglinear", b.constraint=b.constraint )
summary(mod3)
#***
# Model 4: Model with pre-specified item weights in Q-matrix
Qmatrix <- array( 1, dim=c(5,1,2) )
Qmatrix[,1,2] <- 2 # default is score 2 for category 2
# now change the scoring of category 2:
Qmatrix[c(2,4),1,1] <- .74
Qmatrix[c(2,4),1,2] <- 2.3
# for items 2 and 4 the score for category 1 is .74 and for category 2 it is 2.3
mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
skillspace="normal", centered.latent=TRUE)
summary(mod4)
#***
# Model 5: Generalized partial credit model
mod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
skillspace="normal", standardized.latent=TRUE )
summary(mod5)
#***
# Model 6: Item-category slope estimation
mod6 <- CDM::gdm( dat, irtmodel="2PLcat", theta.k=theta.k, skillspace="normal",
standardized.latent=TRUE, decrease.increments=TRUE)
summary(mod6)
#***
# Models 7: items with different number of categories
dat0 <- dat
dat0[ paste(dat0[,1])==2, 1 ] <- 1 # 1st item has only two categories
dat0[ paste(dat0[,3])==2, 3 ] <- 1 # 3rd item has only two categories
# Model 7a: PCM
mod7a <- CDM::gdm( dat0, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE )
summary(mod7a)
# Model 7b: Item category slopes
mod7b <- CDM::gdm( dat0, irtmodel="2PLcat", theta.k=theta.k,
standardized.latent=TRUE, decrease.increments=TRUE )
summary(mod7b)
#############################################################################
# EXAMPLE 3: Fraction Dataset 2
# Multidimensional Models for dichotomous data
#############################################################################
data(data.fraction2, package="CDM")
dat <- data.fraction2$data
Qmatrix <- data.fraction2$q.matrix3
#***
# Model 1: One-dimensional Rasch model
theta.k <- seq( -4, 4, len=11 ) # discretized ability
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE)
summary(mod1)
plot(mod1)
#***
# Model 2: One-dimensional 2PL model
mod2 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, standardized.latent=TRUE)
summary(mod2)
plot(mod2)
#***
# Model 3: 3-dimensional Rasch Model (normal distribution)
mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
centered.latent=TRUE, globconv=5*1E-3, conv=1E-4 )
summary(mod3)
#***
# Model 4: 3-dimensional Rasch model (loglinear smoothing)
# set some item parameters of items 4,1 and 2 to zero
b.constraint <- cbind( c(4,1,2), 1, 0 )
mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
b.constraint=b.constraint, skillspace="loglinear" )
summary(mod4)
#***
# Model 5: define a different theta grid for each dimension
theta.k <- list( "Dim1"=seq( -5, 5, len=11 ),
"Dim2"=seq(-5,5,len=8),
"Dim3"=seq( -3,3,len=6) )
mod5 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
b.constraint=b.constraint, skillspace="loglinear")
summary(mod5)
#***
# Model 6: multdimensional 2PL model (normal distribution)
theta.k <- seq( -5, 5, len=13 )
a.constraint <- cbind( c(8,1,3), 1:3, 1, 1 ) # fix some slopes to 1
mod6 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix,
centered.latent=TRUE, a.constraint=a.constraint, decrease.increments=TRUE,
skillspace="normal")
summary(mod6)
#***
# Model 7: multdimensional 2PL model (loglinear distribution)
a.constraint <- cbind( c(8,1,3), 1:3, 1, 1 )
b.constraint <- cbind( c(8,1,3), 1, 0 )
mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix,
b.constraint=b.constraint, a.constraint=a.constraint,
decrease.increments=FALSE, skillspace="loglinear")
summary(mod7)
#############################################################################
# EXAMPLE 4: Unidimensional latent class 1PL IRT model
#############################################################################
# simulate data
set.seed(754)
I <- 20 # number of items
N <- 2000 # number of persons
theta <- c( -2, 0, 1, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
b <- seq(-2,2,len=I)
library(sirt) # use function sim.raschtype from sirt package
dat <- sirt::sim.raschtype( theta=theta, b=b )
theta.k <- seq(-1, 1, len=4) # initial vector of theta
# estimate model
mod1 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
centerintercepts=TRUE, maxiter=200)
summary(mod1)
## Estimated Skill Distribution
## F1 pi.k
## 1 -1.988 0.24813
## 2 -0.055 0.23313
## 3 0.940 0.40059
## 4 2.000 0.11816
#############################################################################
# EXAMPLE 5: Multidimensional latent class IRT model
#############################################################################
# We simulate a two-dimensional IRT model in which theta vectors
# are observed at a fixed discrete grid (see below).
# simulate data
set.seed(754)
I <- 13 # number of items
N <- 2400 # number of persons
# simulate Dimension 1 at 4 discrete theta points
theta <- c( -2, 0, 1, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
b <- seq(-2,2,len=I)
library(sirt) # use simulation function from sirt package
dat1 <- sirt::sim.raschtype( theta=theta, b=b )
# simulate Dimension 2 at 4 discrete theta points
theta <- c( -3, 0, 1.5, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
dat2 <- sirt::sim.raschtype( theta=theta, b=b )
colnames(dat2) <- gsub( "I", "U", colnames(dat2))
dat <- cbind( dat1, dat2 )
# define Q-matrix
Qmatrix <- matrix(0,2*I,2)
Qmatrix[ cbind( 1:(2*I), rep(1:2, each=I) ) ] <- 1
theta.k <- seq(-1, 1, len=4) # initial matrix
theta.k <- cbind( theta.k, theta.k )
colnames(theta.k) <- c("Dim1","Dim2")
# estimate model
mod2 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
Qmatrix=Qmatrix, centerintercepts=TRUE)
summary(mod2)
## Estimated Skill Distribution
## theta.k.Dim1 theta.k.Dim2 pi.k
## 1 -2.022 -3.035 0.25010
## 2 0.016 0.053 0.24794
## 3 0.956 1.525 0.36401
## 4 1.958 1.919 0.13795
#############################################################################
# EXAMPLE 6: Large-scale dataset data.mg
#############################################################################
data(data.mg, package="CDM")
dat <- data.mg[, paste0("I", 1:11 ) ]
theta.k <- seq(-6,6,len=21)
#***
# Model 1: Generalized partial credit model with multiple groups
mod1 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, group=CDM::data.mg$group,
skillspace="normal", standardized.latent=TRUE)
summary(mod1)
## End(Not run)