cd.cluster {ACTCD}

R Documentation

Cluster analysis for cognitive diagnosis based on the Asymptotic Classification Theory

Description

cd.cluster classifies examinees into unlabeled clusters by cluster analysis. The available algorithms are K-means and Hierarchical Agglomerative Cluster Analysis (HACA) with a choice of linkage methods.

Usage

cd.cluster(Y, Q, method = c("HACA", "Kmeans"), Kmeans.centers = NULL,
           Kmeans.itermax = 10, Kmeans.nstart = 1,
           HACA.link = c("complete", "ward", "single", "average",
                         "mcquitty", "median", "centroid"),
           HACA.cut = NULL)

Arguments

`Y`	A required `N \times J` response matrix with binary elements (1=correct, 0=incorrect), where `N` is the number of examinees and `J` is the number of items.
`Q`	A required `J \times K` binary item-by-attribute association matrix (Q-matrix), where `K` is the number of attributes. The `j^{th}` row of the matrix is an indicator vector for item `j`: 1 indicates that an attribute is required to master the item, and 0 indicates that it is not.
`method`	The clustering algorithm used to classify the examinees. Two options are available: `"Kmeans"` and `"HACA"`. The default is `"HACA"`.
`Kmeans.centers`	The number of clusters when `method = "Kmeans"`. It must be at least 2 and at most `2^K`, where `K` is the number of attributes. The default is `2^K`.
`Kmeans.itermax`	The maximum number of iterations allowed when `method = "Kmeans"`. The default is 10.
`Kmeans.nstart`	The number of random sets of initial centers to be chosen when `method = "Kmeans"`. The default is 1.
`HACA.link`	The linkage to be used with HACA. It must be one of `"ward"`, `"single"`, `"complete"`, `"average"`, `"mcquitty"`, `"median"` or `"centroid"`. The default is `"complete"`.
`HACA.cut`	The number of clusters when `method = "HACA"`. It must be at least 2 and at most `2^K`, where `K` is the number of attributes. The default is `2^K`.

Details

Based on the Asymptotic Classification Theory (Chiu, Douglas & Li, 2009), a sample statistic W (see ACTCD) is calculated from the response matrix and the Q-matrix provided by the user. W is then used as the input to the cluster analysis (K-means or HACA).
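As a rough illustration of this step, the sketch below builds W as the attribute sum-scores, W[i, k] = sum over j of Y[i, j] * Q[j, k], following the definition in Chiu, Douglas & Li (2009), and passes it to the base R functions kmeans() and hclust(). The toy Y and Q are made up for illustration. Whether cd.cluster calls these functions internally with exactly these settings is an assumption; this page does not say.

```r
# Toy data (hypothetical): 6 examinees, 4 items, 2 attributes
Y <- matrix(c(1, 1, 1, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 0, 0,
              1, 1, 0, 1,
              1, 0, 1, 1), nrow = 6, byrow = TRUE)
Q <- matrix(c(1, 0,
              0, 1,
              1, 0,
              1, 1), nrow = 4, byrow = TRUE)

# Attribute sum-scores: one row per examinee, one column per attribute
W <- Y %*% Q

# K-means on W (iter.max and nstart mirror the defaults shown in Usage)
set.seed(1)
km <- kmeans(W, centers = 2, iter.max = 10, nstart = 1)

# HACA on W with complete linkage, cut into 2 clusters
hc <- hclust(dist(W), method = "complete")
hc.class <- cutree(hc, k = 2)
```

Here km$cluster and hc.class play the role of the unlabeled memberships; the cluster numbers themselves carry no attribute-profile meaning until a labeling step is applied.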

The number of latent clusters can be specified by the user in Kmeans.centers or HACA.cut. It must be at least 2 and at most 2^K, where K is the number of attributes. Note that if the number of latent clusters is less than the default value (2^K), the clusters cannot be labeled by labeling with method="1" or method="3". See labeling for more information.
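The constraint on the number of clusters can be checked before calling cd.cluster. The helper below is a hypothetical sketch, not a function of ACTCD, and K = 3 is an arbitrary choice for illustration.

```r
K <- 3                 # hypothetical number of attributes
max.clusters <- 2^K    # default and upper bound, 8 when K = 3

# Hypothetical helper (not part of ACTCD):
# TRUE when M is an admissible number of clusters for K attributes
valid.cluster.count <- function(M, K) {
  M >= 2 && M <= 2^K
}

valid.cluster.count(5, K)   # TRUE: 5 lies in [2, 8]
valid.cluster.count(9, K)   # FALSE: 9 exceeds 2^K

# Fewer clusters than 2^K is allowed, but restricts the labeling methods
M <- 5
needs.other.labeling <- M < max.clusters
```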

Value

`W`	The `N \times K` sample statistic `\bm{W}` used as input to the clustering algorithm. See Details for more information.
`size`	A set of integers, indicating the sizes of latent clusters.
`mean.w`	A matrix of cluster centers, representing the average `\bm{W}` of the latent clusters.
`wss.w`	The vector of within-cluster sum of squares of `\bm{W}`.
`sqmwss.w`	The vector of square roots of the mean within-cluster sum of squares of `\bm{W}`.
`mean.y`	The vector of the mean of sum scores of the clusters.
`class`	The vector of estimated memberships for examinees.
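To make these components concrete, the sketch below recomputes quantities analogous to size, mean.w, wss.w and mean.y in base R from a W matrix and a membership vector. The toy data and the membership vector are made up, and the formulas are a plausible reading of the descriptions above rather than the package's confirmed internals. sqmwss.w is left out because its exact definition is not given on this page.

```r
# Toy inputs (hypothetical): responses, Q-matrix, and cluster memberships
Y <- matrix(c(1, 1, 1, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 0, 0,
              1, 1, 0, 1,
              1, 0, 1, 1), nrow = 6, byrow = TRUE)
Q <- matrix(c(1, 0,
              0, 1,
              1, 0,
              1, 1), nrow = 4, byrow = TRUE)
W <- Y %*% Q
class <- c(1, 2, 2, 2, 1, 1)

# Cluster sizes
size <- as.vector(table(class))

# Cluster centers: per-cluster column means of W (rows are clusters)
mean.w <- rowsum(W, class) / size

# Within-cluster sum of squared deviations from the cluster center
wss.w <- sapply(sort(unique(class)), function(g) {
  Wg <- W[class == g, , drop = FALSE]
  sum(sweep(Wg, 2, colMeans(Wg))^2)
})

# Mean total score (row sum of Y) in each cluster
mean.y <- tapply(rowSums(Y), class, mean)
```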

References

Chiu, C. Y., Douglas, J. A., & Li, X. (2009). Cluster analysis for cognitive diagnosis: theory and applications. Psychometrika, 74(4), 633-665.

Examples

# Classification based on the simulated data and Q matrix
data(sim.dat)
data(sim.Q)
# Information about the dataset
N <- nrow(sim.dat) #number of examinees
J <- nrow(sim.Q) #number of items
K <- ncol(sim.Q) #number of attributes

#the default number of latent clusters is 2^K
cluster.obj <- cd.cluster(sim.dat, sim.Q)
#cluster size
sizeofc <- cluster.obj$size
#W statistics
W <- cluster.obj$W

#User-specified number of latent clusters
M <- 5  # the number of clusters is fixed to 5

#HACA with M clusters
cluster.obj <- cd.cluster(sim.dat, sim.Q, method="HACA", HACA.cut=M)
#cluster size
sizeofc <- cluster.obj$size
#W statistics
W <- cluster.obj$W

#K-means with M clusters
cluster.obj <- cd.cluster(sim.dat, sim.Q, method="Kmeans", Kmeans.centers=M)
#cluster size
sizeofc <- cluster.obj$size
#W statistics
W <- cluster.obj$W

[Package ACTCD version 1.3-0 Index]