slca {CDM}    R Documentation
Structured Latent Class Analysis (SLCA)
Description
This function implements a structured latent class model for polytomous item responses (Formann, 1985, 1992). Lasso estimation for the item parameters is included (Chen, Liu, Xu & Ying, 2015; Chen, Li, Liu & Ying, 2017; Sun, Chen, Liu, Ying & Xin, 2016).
Usage
slca(data, group=NULL, weights=rep(1, nrow(data)), Xdes,
Xlambda.init=NULL, Xlambda.fixed=NULL, Xlambda.constr.V=NULL,
Xlambda.constr.c=NULL, delta.designmatrix=NULL,
delta.init=NULL, delta.fixed=NULL, delta.linkfct="log",
Xlambda_positive=NULL, regular_type="lasso", regular_lam=0, regular_w=NULL,
regular_n=nrow(data), maxiter=1000, conv=1e-5, globconv=1e-5, msteps=10,
convM=5e-04, decrease.increments=FALSE, oldfac=0, dampening_factor=1.01,
seed=NULL, progress=TRUE, PEM=TRUE, PEM_itermax=maxiter, ...)
## S3 method for class 'slca'
summary(object, file=NULL, ...)
## S3 method for class 'slca'
print(x, ...)
## S3 method for class 'slca'
plot(x, group=1, ... )
Arguments
data: Matrix of polytomous item responses
group: Optional vector of group identifiers for multiple group estimation. For the plot method, the index of the group to be plotted.
weights: Optional vector of sample weights
Xdes: Design matrix for the \lambda_x parameters; its entries correspond to the values q_{ihjv} (see Details)
Xlambda.init: Initial \lambda_x parameters
Xlambda.fixed: Fixed \lambda_x parameters, supplied as a matrix with parameter indices in the first column and fixed values in the second column
Xlambda.constr.V: A design matrix for linear restrictions of the form V_x \lambda_x=c_x (see Details)
Xlambda.constr.c: A vector for the linear restriction V_x \lambda_x=c_x (see Details)
delta.designmatrix: Design matrix for the \delta parameters
delta.init: Initial \delta parameters
delta.fixed: Fixed \delta parameters
delta.linkfct: Link function for the skill space reduction. This can be the log-linear link function ("log", the default; Xu & von Davier, 2008) or a logistic link function (Formann, 1992).
Xlambda_positive: Optional vector of logicals indicating which elements of \lambda_x are constrained to be positive
regular_type: Regularization method, e.g. "lasso" (the default) or "scad" (see Example 6)
regular_lam: Numeric. Regularization parameter p_\lambda (see Details)
regular_w: Vector for weighting the regularization penalty (w_v in Details)
regular_n: Vector of regularization factors n_v; this will typically be the sample size
maxiter: Maximum number of iterations
conv: Convergence criterion for item parameters and distribution parameters
globconv: Global deviance convergence criterion
msteps: Maximum number of M steps for estimating the model parameters within each EM iteration
convM: Convergence criterion in the M step
decrease.increments: Logical indicating whether parameter increments in the M step should be successively decreased across iterations. Setting this to TRUE can help when convergence problems occur.
oldfac: Convergence stabilizing factor between 0 and 1 that weights parameter estimates from the previous iteration (see Example 1, Model 3)
dampening_factor: Factor larger than one that defines how strongly parameter increments are decreased from one iteration to the next
seed: Optional simulation seed for random initial parameters
progress: An optional logical indicating whether the function should print the progress of the iterations during estimation
PEM: Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012)
PEM_itermax: Number of iterations in which the P-EM method should be applied
object: A required object of class slca
file: Optional name of a file to which the summary output should be written
x: A required object of class slca
...: Optional parameters to be passed to or from other methods; these will be ignored.
Details
The structured latent class model allows for general constraints of items i in categories h and classes j. The item response model is

P( X_{i}=h | j )=\frac{ \exp( x_{ihj} ) }{ \sum_l \exp( x_{ilj} ) }

with linear constraints on the class-specific logits

x_{ihj}=\sum_v q_{ihjv} \lambda_{xv}

Linear restrictions on the \lambda_x parameters can be specified by a matrix equation V_x \lambda_x=c_x (see Xlambda.constr.V and Xlambda.constr.c; Neuhaus, 1996).
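The following lines give a minimal, self-contained sketch (not part of the package; all dimensions and values are invented for illustration) of how class-specific response probabilities follow from a design array with the structure of Xdes and a vector of \lambda_x parameters:

# toy dimensions: I items, K categories, TP classes, NXlam parameters
I <- 2; K <- 2; TP <- 3; NXlam <- 4
Xdes <- array( stats::rnorm(I*K*TP*NXlam), dim=c(I, K, TP, NXlam) )
Xlambda <- stats::rnorm(NXlam)
# logits x_{ihj} = sum_v q_{ihjv} * lambda_{xv}
logits <- apply( Xdes, c(1,2,3), function(q) sum( q * Xlambda ) )
# softmax over categories within each item and class
probs <- apply( logits, c(1,3), function(x) exp(x) / sum( exp(x) ) )
# probs[h, i, j] contains P( X_i = h | class j ); categories sum to 1
round( apply( probs, c(2,3), sum ), 6 )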
The latent class distribution can be smoothed by a log-linear link function (Xu & von Davier, 2008) or a logistic link function (Formann, 1992). For class j in group g employing a link function h, it holds that

h [ P( j | g ) ] \propto \sum_w r_{jw} \delta_{gw}

where group-specific distributions are allowed. The values r_{jw} are specified in the design matrix delta.designmatrix.
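A minimal sketch (with invented \delta values) of this smoothing under the log-linear link: the class distribution is obtained by applying the inverse link to the linear predictor and normalizing.

# log-linear smoothing of the class distribution on a discretized trait
theta.k <- seq(-4, 4, len=9)
delta.designmatrix <- cbind( 1, theta.k, theta.k^2 )  # entries r_{jw}
delta <- c( 0, 0.3, -0.5 )                            # hypothetical delta parameters
pi.k <- exp( delta.designmatrix %*% delta )           # log P(j) up to an additive constant
pi.k <- pi.k / sum(pi.k)                              # normalize to a distribution
round( pi.k, 3 )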
This model contains classical uni- and multidimensional latent trait models, latent class analysis, located latent class analysis, cognitive diagnostic models, the general diagnostic model and mixture item response models as special cases (see Formann & Kohlmann, 1998; Formann, 2007).
The function also allows for regularization of the \lambda_{xv} parameters using the lasso approach (Sun et al., 2016). More formally, the penalty function can be written as

pen( \bold{\lambda}_x )=p_\lambda \sum_v n_v w_v | \lambda_{xv} |

where p_\lambda can be specified with regular_lam, w_v can be specified with regular_w, and n_v can be specified with regular_n.
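A minimal sketch (with invented parameter values) of how the penalty term combines regular_lam, regular_w and regular_n:

Xlambda <- c( 0.8, -0.1, 0.05, 1.2 )       # hypothetical lambda_x parameters
regular_lam <- 0.2                         # p_lambda
regular_w <- c( 0, 1, 1, 0 )               # w_v: penalize only selected parameters
regular_n <- rep( 1000, length(Xlambda) )  # n_v: typically the sample size
pen <- regular_lam * sum( regular_n * regular_w * abs( Xlambda ) )
pen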
Value
An object of class slca. The list contains the following entries:

item: Data frame with conditional item probabilities
deviance: Deviance
ic: Information criteria, number of estimated parameters
Xlambda: Estimated \lambda_x parameters
se.Xlambda: Standard errors of the estimated \lambda_x parameters
pi.k: Trait distribution
pjk: Item response probabilities evaluated for all classes
n.ik: Array of expected counts
G: Number of groups
I: Number of items
N: Number of persons
delta: Parameter estimates of the skill space representation
covdelta: Covariance matrix of the parameter estimates of the skill space representation
MLE.class: Classified skills for each student (MLE)
MAP.class: Classified skills for each student (MAP)
data: Original data frame
group.stat: Group statistics (sample sizes, group labels)
p.xi.aj: Individual likelihood
posterior: Individual posterior distribution
K.item: Maximal category per item
time: Information about computation time
skillspace: Used skill space parametrization
iter: Number of iterations
seed.used: Used simulation seed
Xlambda.init: Used initial lambda parameters
delta.init: Used initial delta parameters
converged: Logical indicating whether convergence was achieved
Note
If some items have differing number of categories, appropriate
class probabilities in non-existing categories per items can be
practically set to zero by loading an item for all skill classes
on a fixed \lambda_x
parameter of a small number, e.g. -999
.
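A minimal sketch (with a hypothetical parameter index) of how such a fixation could be specified via Xlambda.fixed, whose first column contains parameter indices and whose second column contains the fixed values:

# suppose parameter 23 of Xdes loads on a category that does not exist for some item
Xlambda.fixed <- cbind( 23, -999 )
# mod <- CDM::slca( dat, Xdes=Xdes, Xlambda.fixed=Xlambda.fixed, ... )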
The implementation of the model builds on pieces of work by Anton Formann. See http://www.antonformann.at/ for more information.
References
Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850-866.
Chen, Y., Li, X., Liu, J., & Ying, Z. (2017). Regularized latent class analysis with application in cognitive diagnosis. Psychometrika, 82, 660-692.
Formann, A. K. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87-111.
Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486.
Formann, A. K. (2007). (Almost) Equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 177-189). New York: Springer.
Formann, A. K., & Kohlmann, T. (1998). Structural latent class models. Sociological Methods & Research, 26, 530-565.
Neuhaus, W. (1996). Optimal estimation under linear constraints. Astin Bulletin, 26, 233-245.
Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via L_1 regularization. Psychometrika, 81(4), 921-939.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.
See Also
For latent trait models with continuous latent variables see the mirt or TAM packages. For a discrete trait distribution see the MultiLCIRT package.
For latent class models see the poLCA, covLCA or randomLCA packages.
For mixture Rasch or mixture IRT models see the psychomix or mRm packages.
Examples
#############################################################################
# EXAMPLE 1: data.Students | (Generalized) Partial Credit Model
#############################################################################
data(data.Students, package="CDM")
dat <- data.Students[, c("mj1","mj2","mj3","mj4","sc1", "sc2") ]
# define discretized ability
theta.k <- seq( -6, 6, len=21 )
#*** Model 1: Partial credit model
# define design matrix for lambda
I <- ncol(dat)
maxK <- 4
TP <- length(theta.k)
NXlam <- I*(maxK-1) + 1 # number of estimated parameters
# last parameter is joint slope parameter
Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) )
# Item1Cat1, ..., Item1Cat3, Item2Cat1, ...,
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[3]] <- paste0("Class", 1:TP )
v2 <- unlist( sapply( 1:I, FUN=function(ii){ # ii
paste0( paste0( colnames(dat)[ii], "_b" ), "Cat", 1:(maxK-1) )
}, simplify=FALSE) )
dimnames(Xdes)[[4]] <- c( v2, "a" )
# define theta design and item discriminations
for (ii in 1:I){
for (hh in 1:(maxK-1) ){
Xdes[ii, hh + 1,, NXlam ] <- hh * theta.k
}
}
# item intercepts
for (ii in 1:I){
for (hh in 1:(maxK-1) ){
# ii <- 1 # Item # hh <- 1 # category
Xdes[ii,hh+1,, ( ii - 1)*(maxK-1) + hh] <- 1
}
}
#****
# skill space designmatrix
TP <- length(theta.k)
w1 <- stats::dnorm(theta.k)
w1 <- w1 / sum(w1)
delta.designmatrix <- matrix( 1, nrow=TP, ncol=1 )
delta.designmatrix[,1] <- log(w1)
# initial lambda parameters
Xlambda.init <- c( stats::rnorm( dim(Xdes)[[4]] - 1 ), 1 )
# fixed delta parameter
delta.fixed <- cbind( 1, 1,1 )
# estimate model
mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.init=Xlambda.init, delta.fixed=delta.fixed )
summary(mod1)
plot(mod1, cex.names=.7 )
## Not run:
#*** Model 2: Partial credit model with some parameter constraints
# fixed lambda parameters
Xlambda.fixed <- cbind( c(1,19), c(3.2,1.52 ) )
# 1st parameter=3.2
# 19th parameter=1.52 (joint item slope)
mod2 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
    Xlambda.init=Xlambda.init, delta.fixed=delta.fixed,
    Xlambda.fixed=Xlambda.fixed, maxiter=70 )
#*** Model 3: Partial credit model with non-normal distribution
Xlambda.fixed <- cbind( c(1,19), c(3.2,1) ) # fix item slope to one
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
mod3 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.fixed=Xlambda.fixed, maxiter=200 )
summary(mod3)
# non-normal distribution with convergence regularizing factor oldfac
mod3a <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.fixed=Xlambda.fixed, maxiter=500, oldfac=.95 )
summary(mod3a)
#*** Model 4: Generalized Partial Credit Model
# estimate generalized partial credit model without restrictions on trait
# distribution and item parameters to ensure better convergence behavior
# Note that two parameters are not identifiable and information criteria
# have to be adapted.
#---
# define design matrix for lambda
I <- ncol(dat)
maxK <- 4
TP <- length(theta.k)
NXlam <- I*(maxK-1) + I # number of estimated parameters
Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) )
# Item1Cat1, ..., Item1Cat3, Item2Cat1, ...,
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[3]] <- paste0("Class", 1:TP )
v2 <- unlist( sapply( 1:I, FUN=function(ii){ # ii
paste0( paste0( colnames(dat)[ii], "_b" ), "Cat", 1:(maxK-1) )
}, simplify=FALSE) )
dimnames(Xdes)[[4]] <- c( v2, paste0( colnames(dat),"_a") )
dimnames(Xdes)
# define theta design and item discriminations
for (ii in 1:I){
for (hh in 1:(maxK-1) ){
Xdes[ii, hh + 1,, I*(maxK-1) + ii ] <- hh * theta.k
}
}
# item intercepts
for (ii in 1:I){
for (hh in 1:(maxK-1) ){
Xdes[ii,hh+1,, ( ii - 1)*(maxK-1) + hh] <- 1
}
}
#****
# skill space designmatrix
delta.designmatrix <- cbind( 1, theta.k,theta.k^2 )
# initial lambda parameters from partial credit model
Xlambda.init <- mod1$Xlambda
Xlambda.init <- c( mod1$Xlambda[ - length(Xlambda.init) ],
rep( Xlambda.init[ length(Xlambda.init) ],I) )
# estimate model
mod4 <- CDM::slca( dat, Xdes=Xdes, Xlambda.init=Xlambda.init,
delta.designmatrix=delta.designmatrix, decrease.increments=TRUE,
maxiter=300 )
#############################################################################
# EXAMPLE 2: Latent class model with two classes
#############################################################################
set.seed(9876)
I <- 7 # number of items
# simulate response probabilities
a1 <- stats::runif(I, 0, .4 )
a2 <- stats::runif(I, .6, 1 )
N <- 1000 # sample size
# simulate data in two classes of proportions .3 and .7
N1 <- round(.3*N)
dat1 <- 1 * ( matrix(a1,N1,I,byrow=TRUE) > matrix( stats::runif( N1 * I), N1, I ) )
N2 <- round(.7*N)
dat2 <- 1 * ( matrix(a2,N2,I,byrow=TRUE) > matrix( stats::runif( N2 * I), N2, I ) )
dat <- rbind( dat1, dat2 )
colnames(dat) <- paste0("I", 1:I)
# define design matrices
TP <- 2 # two classes
# The idea is that latent classes refer to two different "dimensions".
# Items load on latent class indicators 1 and 2, see below.
Xdes <- array(0, dim=c(I,2,2,2*I) )
items <- colnames(dat)
dimnames(Xdes)[[4]] <- c(paste0( colnames(dat), "Class", 1),
paste0( colnames(dat), "Class", 2) )
# items, categories, classes, parameters
# probabilities for correct solution
for (ii in 1:I){
Xdes[ ii, 2, 1, ii ] <- 1 # probabilities class 1
Xdes[ ii, 2, 2, ii+I ] <- 1 # probabilities class 2
}
# estimate model
mod1 <- CDM::slca( dat, Xdes=Xdes )
summary(mod1)
#############################################################################
# EXAMPLE 3: Mixed Rasch model with two classes
#############################################################################
set.seed(987)
library(sirt)
# simulate two latent classes of Rasch populations
I <- 15 # 15 items
b1 <- seq( -1.5, 1.5, len=I) # difficulties latent class 1
b2 <- b1 # difficulties latent class 2
b2[ c(4,7, 9, 11, 12, 13) ] <- c(1, -.5, -.5, .33, .33, -.66 )
N <- 3000 # number of persons
wgt <- .25 # class probability for class 1
# class 1
dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), b1 )
# class 2
dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.7), b2 )
dat <- rbind( dat1, dat2 )
# theta grid
theta.k <- seq( -5, 5, len=9 )
TP <- length(theta.k)
#*** Model 1: Rasch model with normal distribution
maxK <- 2
NXlam <- I +1
Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) )
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[4]] <- c( paste0( "b_", colnames(dat)[1:I] ), "a" )
# define item difficulties
for (ii in 1:I){
Xdes[ii, 2,, ii ] <- -1
}
# theta design
for (tt in 1:TP){
Xdes[1:I, 2, tt, I + 1] <- theta.k[tt]
}
# skill space definition
delta.designmatrix <- cbind( 1, theta.k^2 )
delta.fixed <- NULL
Xlambda.init <- c( stats::runif( I, -.8, .8 ), 1 )
Xlambda.fixed <- cbind( I+1, 1 )
# estimate model
mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
delta.fixed=delta.fixed, Xlambda.fixed=Xlambda.fixed,
Xlambda.init=Xlambda.init, decrease.increments=TRUE, maxiter=200 )
summary(mod1)
#*** Model 1b: Constrain the sum of item difficulties to zero
# change skill space definition
delta.designmatrix <- cbind( 1, theta.k, theta.k^2 )
delta.fixed <- NULL
# constrain sum of difficulties Xlambda parameters to zero
Xlambda.constr.V <- matrix( 1, nrow=I+1, ncol=1 )
Xlambda.constr.V[I+1,1] <- 0
Xlambda.constr.c <- c(0)
# estimate model
mod1b <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.fixed=Xlambda.fixed, Xlambda.constr.V=Xlambda.constr.V,
Xlambda.constr.c=Xlambda.constr.c )
summary(mod1b)
#*** Model 2: Mixed Rasch model with two latent classes
NXlam <- 2*I +2
Xdes <- array( 0, dim=c(I, maxK, 2*TP, NXlam ) )
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[4]] <- c( paste0( "bClass1_", colnames(dat)[1:I] ),
paste0( "bClass2_", colnames(dat)[1:I] ), "aClass1", "aClass2" )
# define item difficulties
for (ii in 1:I){
Xdes[ii, 2, 1:TP, ii ] <- -1 # first class
Xdes[ii, 2, TP + 1:TP, I+ii ] <- -1 # second class
}
# theta design
for (tt in 1:TP){
Xdes[1:I, 2, tt, 2*I+1 ] <- theta.k[tt]
Xdes[1:I, 2, TP+tt, 2*I+2 ] <- theta.k[tt]
}
# skill space definition
delta.designmatrix <- matrix( 0, nrow=2*TP, ncol=4 )
delta.designmatrix[1:TP,1] <- 1
delta.designmatrix[1:TP,2] <- theta.k^2
delta.designmatrix[TP + 1:TP,3] <- 1
delta.designmatrix[TP+ 1:TP,4] <- theta.k^2
b1 <- stats::qnorm( colMeans(dat) )
Xlambda.init <- c( stats::runif( 2*I, -1.8, 1.8 ), 1,1 )
Xlambda.fixed <- cbind( c(2*I+1, 2*I+2), 1 )
# estimate model
mod2 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.fixed=Xlambda.fixed, decrease.increments=TRUE,
Xlambda.init=Xlambda.init, maxiter=1000 )
summary(mod2)
summary(mod1)
# latent class proportions
stats::aggregate( mod2$pi.k, list( rep(1:2, each=TP)), sum )
#*** Model 2b: Different parametrization with sum constraint on item difficulties
# skill space definition
delta.designmatrix <- matrix( 0, nrow=2*TP, ncol=6 )
delta.designmatrix[1:TP,1] <- 1
delta.designmatrix[1:TP,2] <- theta.k
delta.designmatrix[1:TP,3] <- theta.k^2
delta.designmatrix[TP+ 1:TP,4] <- 1
delta.designmatrix[TP+ 1:TP,5] <- theta.k
delta.designmatrix[TP+ 1:TP,6] <- theta.k^2
Xlambda.fixed <- cbind( c(2*I+1,2*I+2), c(1,1) )
b1 <- stats::qnorm( colMeans( dat ) )
Xlambda.init <- c( b1, b1 + stats::runif(I, -1, 1 ), 1, 1 )
# constraints on item difficulties
Xlambda.constr.V <- matrix( 0, nrow=NXlam, ncol=2)
Xlambda.constr.V[1:I, 1 ] <- 1
Xlambda.constr.V[I + 1:I, 2 ] <- 1
Xlambda.constr.c <- c(0,0)
# estimate model
mod2b <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.fixed=Xlambda.fixed, Xlambda.init=Xlambda.init,
Xlambda.constr.V=Xlambda.constr.V, Xlambda.constr.c=Xlambda.constr.c,
decrease.increments=TRUE, maxiter=1000 )
summary(mod2b)
stats::aggregate( mod2b$pi.k, list( rep(1:2, each=TP)), sum )
#*** Model 2c: Estimation with mRm package
library(mRm)
mod2c <- mRm::mrm(data.matrix=dat, cl=2)
plot(mod2c)
print(mod2c)
#*** Model 2d: Estimation with psychomix package
library(psychomix)
mod2d <- psychomix::raschmix(data=dat, k=2, verbose=TRUE )
summary(mod2d)
plot(mod2d)
#############################################################################
# EXAMPLE 4: Located latent class model, Rasch model
#############################################################################
set.seed(487)
library(sirt)
I <- 15 # I items
b1 <- seq( -2, 2, len=I) # item difficulties
N <- 4000 # number of persons
# simulate 4 theta classes
theta0 <- c( -2.5, -1, 0.3, 1.3 ) # skill classes
probs0 <- c( .1, .4, .2, .3 )
TP <- length(theta0)
theta <- theta0[ rep(1:TP, round(probs0*N) ) ]
dat <- sirt::sim.raschtype( theta, b1 )
#*** Model 1: Located latent class model with 4 classes
maxK <- 2
NXlam <- I + TP
Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) )
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[3]] <- paste0("Class", 1:TP )
dimnames(Xdes)[[4]] <- c( paste0( "b_", colnames(dat)[1:I] ), paste0("theta", 1:TP) )
# define item difficulties
for (ii in 1:I){
Xdes[ii, 2,, ii ] <- -1
}
# theta design
for (tt in 1:TP){
Xdes[1:I, 2, tt, I + tt] <- 1
}
# skill space definition
delta.designmatrix <- diag(TP)
Xlambda.init <- c( - stats::qnorm( colMeans(dat) ), seq(-2,1,len=TP) )
# constraint on item difficulties
Xlambda.constr.V <- matrix( 0, nrow=NXlam, ncol=1)
Xlambda.constr.V[1:I,1] <- 1
Xlambda.constr.c <- c(0)
delta.init <- matrix( c(1,1,1,1), TP, 1 )
# estimate model
mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
delta.init=delta.init, Xlambda.init=Xlambda.init,
Xlambda.constr.V=Xlambda.constr.V, Xlambda.constr.c=Xlambda.constr.c,
decrease.increments=TRUE, maxiter=400 )
summary(mod1)
# compare estimated and simulated theta class locations
cbind( mod1$Xlambda[ - c(1:I) ], theta0 )
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, probs0 )
#############################################################################
# EXAMPLE 5: DINA model with two skills
#############################################################################
set.seed(487)
N <- 3000 # number of persons
# define Q-matrix
I <- 9 # 9 items
NS <- 2 # 2 skills
TP <- 4 # number of skill classes
Q <- scan( nlines=3)
1 0 1 0 1 0
0 1 0 1 0 1
1 1 1 1 1 1
Q <- matrix(Q, I, ncol=NS,byrow=TRUE)
# define skill distribution
alpha0 <- matrix( c(0,0,1,0,0,1,1,1), nrow=4,ncol=2,byrow=TRUE)
prob0 <- c( .2, .4, .1, .3 )
alpha <- alpha0[ rep( 1:TP, prob0*N),]
# define guessing and slipping parameters
guess <- round( stats::runif(I, 0, .4 ), 2 )
slip <- round( stats::runif(I, 0, .3 ), 2 )
# simulate data according to the DINA model
dat <- CDM::sim.din( q.matrix=Q, alpha=alpha, slip=slip, guess=guess )$dat
# define Xlambda design matrix
maxK <- 2
NXlam <- 2*I
Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) )
dimnames(Xdes)[[1]] <- colnames(dat)
dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(Xdes)[[3]] <- c("S00","S10","S01","S11")
dimnames(Xdes)[[4]] <- c( paste0("guess",1:I ), paste0( "antislip", 1:I ) )
dimnames(Xdes)
# define item difficulties
for (ii in 1:I){
# define latent responses
latresp <- 1*( alpha0 %*% Q[ii,]==sum(Q[ii,]) )[,1]
# model slipping parameters
Xdes[ii, 2, latresp==1, I+ii ] <- 1
# guessing parameters
Xdes[ii, 2, latresp==0, ii ] <- 1
}
Xdes[1,2,,]
Xdes[7,2,,]
# skill space definition
delta.designmatrix <- diag(TP)
Xlambda.init <- c( rep( stats::qlogis( .2 ), I ), rep( stats::qlogis( .8 ), I ) )
# estimate DINA model with slca function
mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix,
Xlambda.init=Xlambda.init, decrease.increments=TRUE, maxiter=400 )
summary(mod1)
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, prob0 )
# compare estimated and simulated guessing parameters
cbind( mod1$pjk[1,,2], guess )
# compare estimated and simulated slipping parameters
cbind( 1 - mod1$pjk[4,,2], slip )
#############################################################################
# EXAMPLE 6: Investigating differential item functioning in Rasch models
# with regularization
#############################################################################
#---- simulate data
set.seed(987)
N <- 1000 # number of persons in a group
I <- 20 # number of items
#* population parameters of two groups
mu1 <- 0
mu2 <- .6
sd1 <- 1.4
sd2 <- 1
# item difficulties
b <- seq( -1.1, 1.1, len=I )
# define some DIF effects
dif <- rep(0,I)
dif[ c(3,6,9,12)] <- c( .6, -1, .75, -.35 )
print(dif)
#* simulate datasets
dat1 <- sirt::sim.raschtype( rnorm(N, mean=mu1, sd=sd1), b=b - dif /2 )
colnames(dat1) <- paste0("I", 1:I, "_G1")
dat2 <- sirt::sim.raschtype( rnorm(N, mean=mu2, sd=sd2), b=b + dif /2 )
colnames(dat2) <- paste0("I", 1:I, "_G2")
dat <- CDM::CDM_rbind_fill( dat1, dat2 )
dat <- data.frame( "group"=rep(1:2, each=N), dat )
#-- nodes for distribution
theta.k <- seq(-4, 4, len=11)
# define design matrix for lambda
nitems <- ncol(dat) - 1
maxK <- 2
TP <- length(theta.k)
NXlam <- 2*I + 1
Xdes <- array( 0, dim=c( nitems, maxK, TP, NXlam ) )
dimnames(Xdes)[[1]] <- colnames(dat)[-1]
dimnames(Xdes)[[2]] <- paste0("Cat", 0:(maxK-1) )
dimnames(Xdes)[[3]] <- paste0("Theta", 1:TP )
dimnames(Xdes)[[4]] <- c( paste0("b", 1:I ), paste0("dif", 1:I ), "const" )
# define theta design
for (ii in 1:nitems){
Xdes[ii,2,,NXlam ] <- theta.k
}
# item intercepts and DIF effects
for (ii in 1:I){
Xdes[c(ii,ii+I),2,, ii ] <- -1
Xdes[ii,2,,ii+I] <- - 1/2
Xdes[ii+I,2,,ii+I] <- 1/2
}
#--- skill space designmatrix
TP <- length(theta.k)
w1 <- stats::dnorm(theta.k)
w1 <- w1 / sum(w1)
delta.designmatrix <- matrix( 1, nrow=TP, ncol=2 )
delta.designmatrix[,2] <- log(w1)
# fixed lambda parameters
Xlambda.fixed <- cbind(NXlam, 1 )
# initial Xlambda parameters
dif_sim <- 0*stats::rnorm(I, sd=.2)
Xlambda.init <- c( - stats::qnorm( colMeans(dat1) ), dif_sim, 1 )
# delta.fixed
delta.fixed <- cbind( 1, 1, 0 )
# regularization parameter
regular_lam <- .2
# weighting vector: regularize only DIF effects
regular_w <- c( rep(0,I), rep(1,I), 0 )
#--- estimate the model with the SCAD penalty
mod1 <- CDM::slca( dat[,-1], group=dat$group, Xdes=Xdes,
delta.designmatrix=delta.designmatrix, regular_type="scad",
Xlambda.init=Xlambda.init, delta.fixed=delta.fixed, Xlambda.fixed=Xlambda.fixed,
regular_lam=regular_lam, regular_w=regular_w )
# compare true and estimated DIF effects
cbind( "true"=dif, "estimated"=round(coef(mod1)[seq(I+1,2*I)],2) )
summary(mod1)
## End(Not run)