MCCCA {mccca} | R Documentation |
Apply MCCCA to a dataset.
Description
Applies MCCCA to mcccadata.list.
Usage
MCCCA(
mccca.data,
K.vec = K.vec,
known.vec = NULL,
knowncluster.list = NULL,
nstart = 3,
maxit = 50,
p = 2,
tol = 1e-08,
verbose = TRUE,
remove.miss = TRUE,
kmeans.initial = TRUE
)
Arguments
mccca.data |
A list created in |
K.vec |
An integer vector of length C (the number of classes). Each element corresponds to the number of clusters in each class specified for estimation. |
known.vec |
A vector of length C giving logical values indicating whether a cluster allocation in each class is known or not. The default is all |
knowncluster.list |
A list of C vectors giving the known cluster index of each observation in each class (see Details). The default is NULL. |
nstart |
An integer indicating the number of random initial values. |
maxit |
An integer indicating the maximum number of iterations. |
p |
An integer indicating the dimension of quantification. The default is 2. |
tol |
A numeric value indicating the absolute convergence tolerance. |
verbose |
A logical value indicating. If |
remove.miss |
A logical value indicating whether categories that nobody chooses are removed or not. The default is |
kmeans.initial |
A logical value indicating whether the first initial value for the indicator matrix is generated by k-means or not. The default is |
Details
Bg, Gg and Qg are scaled versions of B, G and Q, respectively, such that the average squared deviation from the origin of the row and column points is the same (see Section 2.3 of the paper).
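The idea of equalizing the average squared deviation from the origin can be illustrated with a small base R sketch. It assumes a single scalar factor (called gamma here, a hypothetical name) applied to two sets of points, and uses random stand-in matrices for G and B. It is an illustration of the principle only, not the package's implementation, and it does not cover the scaling of Q.

```r
# Random stand-ins for a (K x p) cluster matrix and a (Q x p) category matrix.
set.seed(1)
G <- matrix(rnorm(4 * 2), nrow = 4, ncol = 2)
B <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)

# Assumed scalar factor: chosen so that mean(rowSums((gamma * G)^2))
# equals mean(rowSums((B / gamma)^2)).
gamma <- ((nrow(G) / nrow(B)) * sum(B^2) / sum(G^2))^(1 / 4)

Gg <- gamma * G   # scaled cluster points
Bg <- B / gamma   # scaled category points

# Average squared distance from the origin for each set of points.
avg.sq.G <- mean(rowSums(Gg^2))
avg.sq.B <- mean(rowSums(Bg^2))
```

Because one matrix is multiplied by gamma and the other divided by it, the product of Gg and the transpose of Bg is unchanged while the two point clouds end up with comparable spread in a joint plot.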
If you want to specify the cluster allocation for some or all classes, prepare the following two arguments.
- knowncluster.list: A list of C vectors. The length of each vector in the list should be the same as the number of rows of the corresponding matrix in data.list (i.e., length(knowncluster.list[[c]]) = nrow(data.list[[c]]) for c = 1, ..., C). For example, suppose that data.list is a list of 4 matrices (meaning C=4), the cluster assignment is known only for the second class, and the assignments in the other classes are estimated. In this case, the second vector of knowncluster.list should be the vector of cluster indexes to which the observations in each row of data.list[[2]] belong, with length nrow(data.list[[2]]), and the other vectors (1, 3, and 4) in the list can be specified as NA. For each vector in knowncluster.list, the cluster indexes should start from 1 and should not skip any numbers.
- known.vec: A vector of logical values of length C. For example, if C=4 and the cluster assignment is known only for the second class, it should be known.vec=c(FALSE,TRUE,FALSE,FALSE).
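The requirements on knowncluster.list and known.vec can be checked before calling MCCCA. The base R sketch below builds both objects for a toy data.list with C=4 and validates them. The toy data and the helper check.known are hypothetical and not part of the mccca package.

```r
# Toy stand-in for data.list: C = 4 classes with different numbers of rows.
set.seed(1)
n.rows <- c(5, 6, 4, 7)
data.list <- lapply(n.rows, function(n) {
  matrix(sample(1:3, n * 2, replace = TRUE), nrow = n)
})
C <- length(data.list)

# Cluster allocation is known only for the 2nd class; the rest are NA.
knowncluster.list <- rep(list(NA), C)
knowncluster.list[[2]] <- rep(c(1, 2), times = c(2, nrow(data.list[[2]]) - 2))
known.vec <- rep(FALSE, C)
known.vec[2] <- TRUE

# Hypothetical helper: returns TRUE if every class flagged as known has
# one cluster index per row, starting from 1 with no skipped numbers.
check.known <- function(data.list, known.vec, knowncluster.list) {
  stopifnot(length(known.vec) == length(data.list),
            length(knowncluster.list) == length(data.list))
  for (cls in which(known.vec)) {
    cl <- knowncluster.list[[cls]]
    if (anyNA(cl)) return(FALSE)
    if (length(cl) != nrow(data.list[[cls]])) return(FALSE)
    if (!identical(sort(unique(as.integer(cl))), seq_len(max(cl)))) return(FALSE)
  }
  TRUE
}

ok <- check.known(data.list, known.vec, knowncluster.list)
```

Building known.vec from C with rep(FALSE, C) rather than typing it out keeps its length in step with the number of classes.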
Value
Returns a list with the following elements.
G |
A (Kxp) quantification matrix for all clusters (K= |
Gg |
Scaled |
B |
A (Qxp) quantification matrix for all categories (Q= |
Bg |
Scaled |
Q |
A (Nxp) quantification matrix for all observations. |
Qg |
Scaled |
clses.list |
A list of C vectors, giving the estimated cluster index for each observation in each class. |
clses.vec |
A vector of length N, where each element represents the cluster index to which the observations in the rows of |
optval |
A numeric value giving the optimized value of the objective function that is the smallest among all initial values. |
optval.vec |
A numeric vector of length |
stepconv |
An integer giving the number of iterations until convergence at the initial value where the objective function was the smallest. |
stepconv.vec |
An integer vector of length |
catename.vec |
A character vector of length |
catename.vari.vec |
A character vector of length |
cate.removed |
If there is a category that no one chooses and |
cluster.vec |
An integer vector of length K, where each index in the |
q.vec |
A vector of length J, same as the one given in |
K.vec |
A vector of length C, which is used as an input in this |
classlabel |
A character vector of length C, same as the one given in |
References
Takagishi, M. & van de Velden, M. (2022). Visualizing Class Specific Heterogeneous Tendencies in Categorical Data. Journal of Computational and Graphical Statistics. DOI: 10.1080/10618600.2022.2035737
See Also
Examples
library(mccca)
#settings
N <- 100 ; J <- 5 ; Ktrue <- 2 ; q.vec <- rep(5,J) ; noise.prop <- 0.2
extcate.vec=c(2,3)#the number of categories for each external variable
#generate categorical variable data
catedata.list <- generate.onedata(N=N,J=J,Ktrue=Ktrue,q.vec=q.vec,noise.prop = noise.prop)
data.cate=catedata.list$data.mat
clstr0.vec=catedata.list$clstr0.vec
#generate external variable data
data.ext=generate.ext(N,extcate.vec=extcate.vec)
#create the mccca.data list to be passed to the MCCCA function
mccca.data=create.MCCCAdata(data.cate,ext.mat=data.ext,clstr0.vec =clstr0.vec)
#specify the number of clusters for each of the C classes
C=length(mccca.data$data.list)
K.vec=rep(2,C)
#apply MCCCA
mccca.res=MCCCA(mccca.data,K.vec=K.vec)
#plot MCCCA result
plot(mccca.res)
#if you want to specify cluster allocation in the 2nd class:
knowncluster.list=rep(list(NA),C)
#specify cluster index for the 2nd class
N2=nrow(mccca.data$data.list[[2]])
knowncluster.list[[2]]=rep(c(1,2),times=c(2,N2-2))
#flag only the 2nd class as known; the length follows the number of classes C
known.vec=rep(FALSE,C)
known.vec[2]=TRUE
mccca.res=MCCCA(mccca.data,K.vec=K.vec,known.vec=known.vec,knowncluster.list = knowncluster.list)