adproclus {adproclus} | R Documentation |
Additive profile clustering
Description
Perform additive profile clustering (ADPROCLUS) on object-by-variable data. Creates a model that assigns the objects to overlapping clusters which are characterized in terms of the variables by the so-called profiles.
Usage
adproclus(
data,
nclusters,
start_allocation = NULL,
nrandomstart = 3,
nsemirandomstart = 3,
algorithm = "ALS1",
save_all_starts = FALSE,
seed = NULL
)
Arguments
data |
Object-by-variable data matrix of class |
nclusters |
Number of clusters to be used. Must be a positive integer. |
start_allocation |
Optional matrix of binary values as starting allocation for first run. Default is NULL. |
nrandomstart |
Number of random starts (see get_random). Default is 3. |
nsemirandomstart |
Number of semi-random starts (see get_semirandom). Default is 3. |
algorithm |
Character string "ALS1" (default) or "ALS2", denoting the alternating least squares algorithm to use (see Details). |
save_all_starts |
Logical. If TRUE, the results of all starts are returned (see runs under Value). Default is FALSE. |
seed |
Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility. |
Details
In this function, Mirkin's (1987, 1990) Additive Profile Clustering
(ADPROCLUS) method is used to obtain an unrestricted overlapping clustering
model of the object-by-variable data provided by data.
The ADPROCLUS model approximates an I \times J object-by-variable data
matrix X by an I \times J model matrix M that can be decomposed into an
I \times K binary cluster membership matrix A and a K \times J real-valued
cluster profile matrix P, with K indicating the number of overlapping clusters.
The aim of an ADPROCLUS analysis is therefore, given a number of clusters K,
to estimate a model matrix M = AP that reconstructs the data matrix X as
closely as possible in a least squares sense (i.e. with a minimal sum of
squared residuals). For a detailed illustration of the
ADPROCLUS model and associated loss function, see Wilderjans et al. (2011).
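The decomposition and loss described above can be illustrated with a small base R sketch. The matrices below are made-up toy values, not output of adproclus(); they only show how M = AP is formed and how the sum of squared residuals is computed.

```r
# Toy membership matrix A: I = 4 objects, K = 2 overlapping clusters.
# Object 3 belongs to both clusters, object 4 to none.
A <- matrix(c(1, 0,
              0, 1,
              1, 1,
              0, 0), nrow = 4, byrow = TRUE)

# Toy profile matrix P: K = 2 profiles over J = 3 variables.
P <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# Model matrix M (I x J): each row is the sum of the profiles of the
# clusters the object belongs to.
M <- A %*% P

# Hypothetical data: the model plus a little noise.
set.seed(1)
X <- M + matrix(rnorm(length(M), sd = 0.1), nrow = nrow(M))

# Least squares loss: sum of squared residuals between data and model.
sse <- sum((X - M)^2)
```

Row 3 of M is the sum of both profiles, which is what "additive" refers to in this sketch.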
The alternating least squares algorithms ("ALS1" and "ALS2")
that can be used for minimization of the loss function were proposed by
Depril et al. (2008). In "ALS2", starting from an initial random or
rational estimate of A (see get_random and get_semirandom), A and P
are alternately re-estimated conditionally upon each other until convergence.
The "ALS1" algorithm differs from "ALS2" in that each row in A is updated
independently and the conditionally optimal P is recalculated after each row
update, instead of only after all rows of A have been updated. For a
discussion and comparison of the different algorithms, see Depril et al. (2008).
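As a rough illustration of the alternating scheme described for "ALS2", the following base R sketch alternates a least squares update of P with a row-wise update of A. It is not the package's implementation: the helper names pinv, update_A and als_sketch are hypothetical, the stopping rule is an arbitrary choice, and the row update simply enumerates all 2^K binary membership patterns, which is an assumption about how the conditionally optimal A could be found.

```r
# Moore-Penrose pseudo-inverse via SVD (hypothetical helper, base R only).
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d, 1)
  s$v[, keep, drop = FALSE] %*%
    (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

# For fixed P, pick for every object the binary membership pattern that
# minimizes its squared residual (enumerates all 2^K patterns).
update_A <- function(X, P) {
  K <- nrow(P)
  patterns <- unname(as.matrix(expand.grid(rep(list(0:1), K))))
  fitted <- patterns %*% P                       # 2^K candidate model rows
  best <- vapply(seq_len(nrow(X)), function(i) {
    which.min(rowSums(sweep(fitted, 2, X[i, ])^2))
  }, integer(1))
  patterns[best, , drop = FALSE]
}

# Alternate the two conditional updates until the loss stops decreasing.
als_sketch <- function(X, A, max_iter = 100, eps = 1e-8) {
  sse_old <- Inf
  for (iter in seq_len(max_iter)) {
    P <- pinv(A) %*% X                           # optimal P given A
    A <- update_A(X, P)                          # optimal A given P
    sse <- sum((X - A %*% P)^2)
    if (sse_old - sse < eps) break
    sse_old <- sse
  }
  P <- pinv(A) %*% X                             # final P for the final A
  list(A = A, P = P, sse = sum((X - A %*% P)^2), iterations = iter)
}

# Usage on the stackloss data with a random binary start and K = 2.
set.seed(42)
X <- as.matrix(stackloss)
A0 <- matrix(rbinom(nrow(X) * 2, 1, 0.5), ncol = 2)
fit <- als_sketch(X, A0)
```

An "ALS1"-style variant would, under the description above, recompute P inside the loop over rows rather than once per sweep.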
Warning: Computation time increases exponentially with increasing
number of clusters, K. We recommend determining the computation time
of a single start for each specific dataset and K before increasing the
number of starts.
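One way to follow this recommendation is to fit a model with the default number of starts and read the timing components documented under Value. The sketch below assumes the adproclus package is installed; the seed value and the planned number of starts are arbitrary.

```r
library(adproclus)

# Fit once with the default number of starts and a fixed seed.
fit <- adproclus(data = stackloss, nclusters = 2, seed = 1)

# Seconds spent on the single start that produced the best model.
fit$timer_one_run

# Seconds spent on the complete call, i.e. all starts together.
fit$timer

# Rough estimate for a hypothetical run with 50 starts in total,
# assuming every start takes about as long as the best one did.
planned_starts <- 50
estimated_seconds <- fit$timer_one_run * planned_starts
```

The estimate is only indicative, since individual starts may need different numbers of iterations.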
Value
adproclus() returns a list with the following
components, which describe the best model (from the multiple starts):
model
matrix. The obtained overlapping clustering model M of the same size as data.
A
matrix. The membership matrix A of the clustering model. Clusters are sorted by size.
P
matrix. The profile matrix P of the clustering model.
sse
numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.
totvar
numeric. The total sum of squares of data.
explvar
numeric. The proportion of variance in data that is accounted for by the clustering model.
iterations
numeric. The number of iterations of the algorithm.
timer
numeric. The amount of time (in seconds) the complete algorithm ran for.
timer_one_run
numeric. The amount of time (in seconds) the relevant single start ran for.
initial_start
list. Contains the initial membership matrix, as well as the type of start that was used to obtain the clustering solution (as returned by get_random or get_semirandom).
runs
list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information for the respective start.
parameters
list. Contains the parameters used for the model.
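The components listed above can be inspected directly on the returned list. The following sketch assumes the adproclus package is installed. The comparison of explvar with 1 - sse / totvar is an assumption about how these quantities relate, not something this page states, so the two values are only printed side by side.

```r
library(adproclus)

fit <- adproclus(data = stackloss, nclusters = 2, seed = 1)

# Membership matrix (objects x clusters) and profile matrix (clusters x variables).
dim(fit$A)
dim(fit$P)

# Cluster sizes: number of objects assigned to each cluster.
colSums(fit$A)

# The model matrix should equal the product of A and P (M = AP).
all.equal(unname(as.matrix(fit$model)), unname(fit$A %*% fit$P))

# Fit measures; the second value is computed under the assumption above.
fit$explvar
1 - fit$sse / fit$totvar
```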
References
Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.
Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.
Mirkin, B. G. (1987). The method of principal clusters. Automation and Remote Control, 10, 131-143.
Mirkin, B. G. (1990). A sequential fitting procedure for linear data analysis models. Journal of Classification, 7(2), 167-195.
See Also
adproclus_low_dim
for low dimensional ADPROCLUS
get_random
for generating random starts
get_semirandom
for generating semi-random starts
get_rational
for generating rational starts
Examples
# Loading a test dataset into the global environment
x <- stackloss
# Quick clustering with K = 2 clusters
clust <- adproclus(data = x, nclusters = 2)
# Clustering with K = 3 clusters,
# using the ALS2 algorithm,
# with 2 random and 2 semi-random starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, algorithm = "ALS2"
)
# Saving the results of all starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, save_all_starts = TRUE
)
# Clustering using a user-defined rational start profile matrix
# (here the first 4 rows of the data)
start <- get_rational(x, x[1:4, ])$A
clust <- adproclus(x, 4, start_allocation = start)
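Because the random and semi-random starts depend on the random number generator, results can differ between calls unless seed is set. The following sketch, which assumes the adproclus package is installed and uses an arbitrary seed value, fits the same model twice and compares the results.

```r
library(adproclus)

x <- stackloss

# Two fits with the same seed and otherwise identical settings.
fit1 <- adproclus(x, 2, seed = 123)
fit2 <- adproclus(x, 2, seed = 123)

# Compare the loss and the membership matrices of both fits.
same_sse <- isTRUE(all.equal(fit1$sse, fit2$sse))
same_A <- isTRUE(all.equal(fit1$A, fit2$A))
```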