R: Apply MCCCA to a dataset.

MCCCA {mccca}

R Documentation

Apply MCCCA to a dataset.

Description

Applies MCCCA to the list created by `create.MCCCAdata`.

Usage

MCCCA(
  mccca.data,
  K.vec = K.vec,
  known.vec = NULL,
  knowncluster.list = NULL,
  nstart = 3,
  maxit = 50,
  p = 2,
  tol = 1e-08,
  verbose = TRUE,
  remove.miss = TRUE,
  kmeans.initial = TRUE
)

Arguments

`mccca.data`	A list created in `create.MCCCAdata`.
`K.vec`	An integer vector of length C (the number of classes). Each element corresponds to the number of clusters in each class specified for estimation.
`known.vec`	A vector of length C giving logical values indicating whether a cluster allocation in each class is known or not. The default is all `FALSE`.
`knowncluster.list`	A list of C vectors giving the known cluster index of each observation in each class. The vector for a class whose allocation is to be estimated can be `NA`. See details. The default is `NULL`.
`nstart`	An integer indicating the number of random initial values.
`maxit`	An integer indicating the maximum number of iterations.
`p`	An integer indicating the dimension of quantification. The default is 2.
`tol`	A numeric value indicating the absolute convergence tolerance.
`verbose`	A logical value. If `TRUE`, tracing information on the progress of the optimization is produced.
`remove.miss`	A logical value indicating whether categories that nobody chose are removed or not. The default is `TRUE`.
`kmeans.initial`	A logical value indicating whether the first initial value of the indicator matrix is generated by k-means or not. The default is `TRUE`.
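
As a rough illustration of what `remove.miss` refers to, the base R sketch below builds a dummy (indicator) matrix for one categorical variable that has a level nobody chose, then drops the empty column. The names `x`, `Z`, `empty`, and `Z.kept` are hypothetical, and this is not the package's internal code.

```r
# Hypothetical data: one categorical variable with 4 declared levels,
# of which level "d" is never chosen.
x <- factor(c("a", "b", "a", "c", "b"), levels = c("a", "b", "c", "d"))

# Dummy (indicator) matrix: one row per observation, one column per level.
Z <- outer(as.integer(x), seq_along(levels(x)), "==") * 1
colnames(Z) <- levels(x)

# Columns with a zero total correspond to categories nobody chose.
empty <- which(colSums(Z) == 0)

# Keep only the chosen categories, mirroring the idea behind remove.miss = TRUE.
Z.kept <- Z[, colSums(Z) > 0, drop = FALSE]
```

Here `empty` plays a role similar to the `cate.removed` element described under Value, in that it records the removed column by its index in the dummy matrix.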

Details

Bg, Gg, and Qg are scaled versions of B, G, and Q, respectively, such that the average squared deviation from the origin of the row and column points is the same (see Section 2.3 of the paper).
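
The idea of equalizing the average squared deviation from the origin can be sketched for two point sets in base R. This is a generic illustration under stated assumptions, not necessarily the exact scaling used by the package (see the paper for that). The helper `equalize_spread` and the simulated matrices are hypothetical.

```r
# Hypothetical helper: rescale two sets of points (one point per row) so that
# their average squared distance from the origin becomes equal.
equalize_spread <- function(row.pts, col.pts) {
  avg.row <- sum(row.pts^2) / nrow(row.pts)
  avg.col <- sum(col.pts^2) / nrow(col.pts)
  # gamma^4 = avg.row / avg.col balances the two averages.
  gamma <- (avg.row / avg.col)^(1 / 4)
  list(row = row.pts / gamma, col = col.pts * gamma, gamma = gamma)
}

# Simulated coordinates standing in for G (clusters) and B (categories).
set.seed(1)
G <- matrix(rnorm(4 * 2), nrow = 4, ncol = 2)
B <- matrix(rnorm(25 * 2, sd = 0.1), nrow = 25, ncol = 2)

sc <- equalize_spread(G, B)
avg.G <- mean(rowSums(sc$row^2))
avg.B <- mean(rowSums(sc$col^2))
```

Because one set is divided by `gamma` and the other multiplied by it, the inner products between the two sets of points are unchanged. Only the relative spread in the plot differs.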

If you want to specify the cluster allocation for some or all classes, prepare the following two arguments.

-knowncluster.list: A list of C vectors. The length of each vector in the list should equal the number of rows of the corresponding matrix in data.list (e.g., length(knowncluster.list[[c]])=nrow(data.list[[c]]), (c=1,..,C)). For example, suppose that data.list is a list of 4 matrices (meaning C=4), the cluster assignment is known only for the second class, and the assignments in the other classes are to be estimated. In this case, the second vector of knowncluster.list should contain the cluster index to which the observation in each row of data.list[[2]] belongs, so it has length nrow(data.list[[2]]), and the other vectors (1, 3, and 4) in the list can be specified as NA. In each vector of knowncluster.list, the cluster indexes should start from 1 and no index should be skipped.

-known.vec: A vector of logical values of length C. For example, if C=4 and the cluster assignment is known only for the second class, it should be known.vec=c(FALSE,TRUE,FALSE,FALSE).
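
The two objects described above can be prepared and sanity-checked in base R as sketched below. The simulated `data.list` and the helper `check_known` are hypothetical and are not part of the package. The check only encodes the constraints stated above (matching length, indexes starting from 1, no skipped index).

```r
# Hypothetical stand-in for data.list: C = 4 classes of different sizes.
set.seed(1)
data.list <- lapply(c(10, 8, 12, 6), function(n) {
  matrix(sample(1:3, n * 2, replace = TRUE), nrow = n, ncol = 2)
})
C <- length(data.list)

# Classes whose allocation is to be estimated are left as NA.
knowncluster.list <- rep(list(NA), C)
# Known allocation for the 2nd class: one cluster index per row.
N2 <- nrow(data.list[[2]])
knowncluster.list[[2]] <- rep(c(1, 2), times = c(3, N2 - 3))

# Flag the 2nd class as known.
known.vec <- rep(FALSE, C)
known.vec[2] <- TRUE

# Hypothetical check: correct length, indexes 1..max with none skipped.
check_known <- function(cl, n) {
  length(cl) == n && setequal(cl, seq_len(max(cl)))
}
ok <- vapply(which(known.vec), function(cls) {
  check_known(knowncluster.list[[cls]], nrow(data.list[[cls]]))
}, logical(1))
```

A vector such as `c(1, 3, 3)` would fail this check because index 2 is skipped.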

Value

Returns a list with the following elements.

`G`	A (Kxp) quantification matrix for all clusters (K=`sum(K.vec)`).
`Gg`	Scaled `G`. See details.
`B`	A (Qxp) quantification matrix for all categories (Q=`sum(q.vec)`, and `q.vec` is given in `create.MCCCAdata`).
`Bg`	Scaled `B`.
`Q`	A (Nxp) quantification matrix for all observations.
`Qg`	Scaled `Q`.
`clses.list`	A list of C vectors, giving the estimated cluster index for each observation in each class.
`clses.vec`	A vector of length N, where each element represents the cluster index to which the observations in the rows of `data.mat` (given in `mccca.data`) belong.
`optval`	A numeric value giving the optimized value of the objective function that is the smallest among all initial values.
`optval.vec`	A numeric vector of length `nstart` giving the optimized values of the objective function for each initial value.
`stepconv`	An integer giving the number of iterations until convergence at the initial value where the objective function was the smallest.
`stepconv.vec`	An integer vector of length `nstart` giving the number of iterations until convergence for each initial value.
`catename.vec`	A character vector of length `Q` that combines the category names of each categorical variable into a single vector.
`catename.vari.vec`	A character vector of length `Q` with `catename.vec` plus the name of the categorical variable (by default, this is used as the column name of `B` and `Bg`).
`cate.removed`	If there is a category that no one chose and `remove.miss = TRUE`, `cate.removed` gives which category was removed (given by the index of the column in the dummy matrix). Otherwise, it is `NULL`.
`cluster.vec`	An integer vector of length K indicating the class to which each cluster index in `clses.list` and `clses.vec` corresponds.
`q.vec`	A vector of length J, same as the one given in `mccca.data`.
`K.vec`	A vector of length C, which is used as an input in this `MCCCA` function.
`classlabel`	A character vector of length C, same as the one given in `mccca.data`.
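
The relationship between `optval`, `optval.vec`, `stepconv`, and `stepconv.vec` described above can be sketched in base R. The numbers below are made up for illustration and are not output from `MCCCA`.

```r
# Hypothetical per-start results for nstart = 3.
optval.vec <- c(12.4, 11.9, 12.1)
stepconv.vec <- c(14L, 22L, 9L)

# The reported solution is the start with the smallest objective value.
best <- which.min(optval.vec)
optval <- optval.vec[best]
stepconv <- stepconv.vec[best]
```

Comparing the entries of `optval.vec` gives a rough sense of how sensitive the result is to the initial values. Widely differing entries may suggest increasing `nstart`.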

References

Takagishi & Michel van de Velden (2022): Visualizing Class Specific Heterogeneous Tendencies in Categorical Data, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2022.2035737

Examples

#setting
N <- 100 ; J <- 5 ; Ktrue <- 2 ; q.vec <- rep(5,J) ; noise.prop <- 0.2
extcate.vec=c(2,3)#the number of categories for each external variable

#generate categorical variable data
catedata.list <- generate.onedata(N=N,J=J,Ktrue=Ktrue,q.vec=q.vec,noise.prop = noise.prop)
data.cate=catedata.list$data.mat
clstr0.vec=catedata.list$clstr0.vec

#generate external variable data
data.ext=generate.ext(N,extcate.vec=extcate.vec)

#create mccca.list to be applied to MCCCA function
mccca.data=create.MCCCAdata(data.cate,ext.mat=data.ext,clstr0.vec =clstr0.vec)

#specify the number of clusters for each of the C classes
C=length(mccca.data$data.list)
K.vec=rep(2,C)

#apply MCCCA
mccca.res=MCCCA(mccca.data,K.vec=K.vec)

#plot MCCCA result
plot(mccca.res)

#if you want to specify cluster allocation in the 2nd class:
knowncluster.list=rep(list(NA),C)
#specify cluster index for the 2nd class
N2=nrow(mccca.data$data.list[[2]])
knowncluster.list[[2]]=rep(c(1,2),times=c(2,N2-2))
#flag only the 2nd class as known, whatever the number of classes C is
known.vec=rep(FALSE,C)
known.vec[2]=TRUE
mccca.res=MCCCA(mccca.data,K.vec=K.vec,known.vec=known.vec,knowncluster.list = knowncluster.list)

[Package mccca version 1.1.0.1 Index]