adproclus {adproclus}R Documentation

Additive profile clustering


Perform additive profile clustering (ADPROCLUS) on object-by-variable data. Creates a model that assigns the objects to overlapping clusters, which are characterized in terms of the variables by the so-called profiles.


Usage

adproclus(
  data,
  nclusters,
  start_allocation = NULL,
  nrandomstart = 3,
  nsemirandomstart = 3,
  algorithm = "ALS1",
  save_all_starts = FALSE,
  seed = NULL
)



Arguments

data
Object-by-variable data matrix of class matrix or data.frame.

nclusters
Number of clusters to be used. Must be a positive integer.

start_allocation
Optional matrix of binary values used as the starting allocation for the first run. Default is NULL.

nrandomstart
Number of random starts (see get_random). Can be zero. Increasing it tends to yield better solutions at the cost of longer computation time. Some research finds 500 starts to be a useful reference.

nsemirandomstart
Number of semi-random starts (see get_semirandom). Can be zero. Increasing it tends to yield better solutions at the cost of longer computation time. Some research finds 500 starts to be a useful reference.

algorithm
Character string "ALS1" (default) or "ALS2", denoting the type of alternating least squares algorithm. Can be abbreviated with "1" or "2".

save_all_starts
Logical. If TRUE, the results of all algorithm starts are returned. By default, only the best solution is retained.

seed
Integer. Seed for the random number generator. Default: NULL, meaning the results are not reproducible.
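A user-supplied start_allocation is simply an I x K matrix of zeros and ones. The sketch below builds one at random; random_allocation() is a hypothetical helper written for this illustration, and the assumption that each entry is an independent coin flip (redrawn until no cluster is empty) is ours, not necessarily what get_random does internally.

```r
# Hypothetical helper (not part of adproclus): draws a random binary
# I x K membership matrix, redrawing until every cluster is non-empty.
random_allocation <- function(I, K, prob = 0.5) {
  repeat {
    A <- matrix(rbinom(I * K, size = 1, prob = prob), nrow = I, ncol = K)
    if (all(colSums(A) > 0)) return(A)  # accept only if no empty cluster
  }
}

set.seed(123)
start <- random_allocation(nrow(stackloss), 3)
print(colSums(start))  # number of objects initially assigned to each cluster
```

Under these assumptions, start could then seed the first run, for example adproclus(stackloss, 3, start_allocation = start).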


Details

This function uses Mirkin's (1987, 1990) Additive Profile Clustering (ADPROCLUS) method to obtain an unrestricted overlapping clustering model of the object-by-variable data provided by data.

The ADPROCLUS model approximates an $I \times J$ object-by-variable data matrix $X$ by an $I \times J$ model matrix $M$ that can be decomposed into an $I \times K$ binary cluster membership matrix $A$ and a $K \times J$ real-valued cluster profile matrix $P$, with $K$ indicating the number of overlapping clusters. Given a number of clusters $K$, the aim of an ADPROCLUS analysis is therefore to estimate a model matrix $M = AP$ that reconstructs the data matrix $X$ as closely as possible in the least squares sense (i.e. by minimizing the sum of squared residuals). For a detailed illustration of the ADPROCLUS model and the associated loss function, see Wilderjans et al. (2011).
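Written out in the notation of this page, the least squares criterion reads as follows (a restatement for convenience; see Wilderjans et al. (2011) for the original presentation). Because $M = AP$ with binary $A$, each row of the model is the sum of the profiles of the clusters the object belongs to: an object in clusters 1 and 3 is modelled as $p_1 + p_3$, where $p_k$ denotes the $k$-th row of $P$.

```latex
L(A, P) = \lVert X - AP \rVert_F^2
        = \sum_{i=1}^{I} \sum_{j=1}^{J} \Big( x_{ij} - \sum_{k=1}^{K} a_{ik}\, p_{kj} \Big)^2,
\qquad a_{ik} \in \{0, 1\}, \quad p_{kj} \in \mathbb{R}.
```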

The alternating least squares algorithms ("ALS1" and "ALS2") that can be used to minimize the loss function were proposed by Depril et al. (2008). In "ALS2", starting from an initial random, semi-random, or rational estimate of $A$ (see get_random, get_semirandom, and get_rational), $A$ and $P$ are alternately re-estimated conditionally upon each other until convergence. The "ALS1" algorithm differs in that each row of $A$ is updated separately and the conditionally optimal $P$ is recalculated after each row update, rather than only after the entire matrix $A$ has been updated. For a discussion and comparison of the different algorithms, see Depril et al. (2008).
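The two update schemes can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the adproclus package's implementation: $P$ given $A$ is obtained by ordinary least squares through a pseudo-inverse, and each row of $A$ given $P$ is found by exhaustively trying all $2^K$ binary patterns. The function names (pinv, update_P, best_row, als_sketch) are hypothetical.

```r
# Minimal base-R sketch of ALS1 and ALS2 for the ADPROCLUS loss
# ||X - A P||^2. Illustrative only: not the adproclus package's code.

# Moore-Penrose pseudo-inverse via SVD, so P stays defined even when a
# cluster is empty or two clusters have identical memberships.
pinv <- function(M, tol = 1e-10) {
  s <- svd(M)
  d <- ifelse(s$d > tol * max(s$d), 1 / s$d, 0)
  s$v %*% (d * t(s$u))
}

# Conditionally optimal profiles given memberships (least squares).
update_P <- function(X, A) pinv(A) %*% X

# Conditionally optimal binary row for one object: try all 2^K patterns.
best_row <- function(x, P, patterns) {
  fitted <- patterns %*% P              # 2^K x J candidate reconstructions
  loss <- colSums((t(fitted) - x)^2)    # squared error of each pattern
  patterns[which.min(loss), ]
}

rss <- function(X, A, P) sum((X - A %*% P)^2)

als_sketch <- function(X, A, variant = c("ALS2", "ALS1"),
                       max_iter = 100, tol = 1e-8) {
  variant <- match.arg(variant)
  X <- as.matrix(X)
  storage.mode(A) <- "double"
  patterns <- as.matrix(expand.grid(rep(list(0:1), ncol(A))))
  P <- update_P(X, A)
  history <- rss(X, A, P)
  for (iter in seq_len(max_iter)) {
    for (i in seq_len(nrow(X))) {
      A[i, ] <- best_row(X[i, ], P, patterns)
      # ALS1: re-estimate P after every single row update
      if (variant == "ALS1") P <- update_P(X, A)
    }
    # ALS2: re-estimate P once, after the whole of A has been updated
    if (variant == "ALS2") P <- update_P(X, A)
    history <- c(history, rss(X, A, P))
    if (history[iter] - history[iter + 1] < tol) break  # loss stopped decreasing
  }
  list(A = A, P = P, rss = rss(X, A, P), history = history)
}

# Example: both variants on the built-in stackloss data, K = 2,
# from the same random binary start.
set.seed(42)
X <- as.matrix(stackloss)
A0 <- matrix(rbinom(nrow(X) * 2, 1, 0.5), nrow(X), 2)
fit_als2 <- als_sketch(X, A0, "ALS2")
fit_als1 <- als_sketch(X, A0, "ALS1")
print(c(ALS2 = fit_als2$rss, ALS1 = fit_als1$rss))
```

Since every step is conditionally optimal, the loss recorded in history never increases in either variant; ALS1 pays for its more frequent $P$ updates with one least squares solve per row instead of one per sweep.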

Warning: Computation time increases exponentially with the number of clusters $K$. We recommend determining the computation time of a single start for each specific dataset and value of $K$ before increasing the number of starts.
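One way to see where the exponential growth comes from: assuming each row update of $A$ compares every possible binary membership pattern (as in an exhaustive conditional update), the search space per object doubles with every additional cluster. The snippet only tabulates that count; it does not call adproclus.

```r
# Assumption: each row update of A checks every binary membership
# pattern, so the per-object search space is 2^K.
K <- 1:12
patterns_per_row <- 2^K
print(data.frame(K = K, patterns_per_row = patterns_per_row))
```

To follow the recommendation in practice, a single start can be timed before scaling up, for example with system.time(adproclus(x, 5, nrandomstart = 1, nsemirandomstart = 0)).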


Value

adproclus() returns a list with the following components, which describe the best model obtained across all starts:


matrix. The obtained overlapping clustering model M of the same size as data.


matrix. The membership matrix A of the clustering model. Clusters are sorted by size.


matrix. The profile matrix P of the clustering model.


numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.


numeric. The total sum of squares of data.


numeric. The proportion of variance in data that is accounted for by the clustering model.


numeric. The number of iterations of the algorithm.


numeric. The amount of time (in seconds) the complete algorithm ran for.


numeric. The amount of time (in seconds) the relevant single start ran for.


list. Contains the initial membership matrix and the type of start used to obtain the clustering solution (as returned by get_random or get_semirandom).


list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information for the respective start.


list. Contains the parameters used for the model.
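The fit statistics above are related in a simple way. The toy example below uses made-up matrices to show the relationship, assuming the total sum of squares is the uncentered sum(X^2) and the proportion accounted for is 1 - RSS / total SS; the package's exact definitions (for instance, whether data are centered first) may differ.

```r
# Toy illustration (hypothetical numbers) of the fit statistics, assuming
# total SS = sum(X^2) and proportion explained = 1 - RSS / total SS.
A <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)    # I x K memberships (object 3 in both clusters)
P <- matrix(c(2, 0, 1,
              0, 3, 1), nrow = 2, byrow = TRUE) # K x J profiles
M <- A %*% P                                    # model matrix, same size as X
noise <- matrix(c(0.1, -0.1, 0, 0, 0.2, 0, -0.1, 0, 0.1), nrow = 3)
X <- M + noise                                  # "observed" data
rss <- sum((X - M)^2)                           # residual sum of squares
tss <- sum(X^2)                                 # total sum of squares (uncentered)
explained <- 1 - rss / tss                      # proportion accounted for
print(round(c(rss = rss, tss = tss, explained = explained), 4))
```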


References

Wilderjans, T. F., Ceulemans, E., Van Mechelen, I., & Depril, D. (2011). ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices. Behavior Research Methods, 43(1), 56-65.

Depril, D., Van Mechelen, I., & Mirkin, B. (2008). Algorithms for additive clustering of rectangular data tables. Computational Statistics and Data Analysis, 52, 4923-4938.

Mirkin, B. G. (1987). The method of principal clusters. Automation and Remote Control, 10, 131-143.

Mirkin, B. G. (1990). A sequential fitting procedure for linear data analysis models. Journal of Classification, 7(2), 167-195.

See Also


for low dimensional ADPROCLUS


get_random for generating random starts


get_semirandom for generating semi-random starts


get_rational for generating rational starts


Examples

library(adproclus)

# Load the built-in stackloss dataset into the global environment
x <- stackloss

# Quick clustering with K = 2 clusters
clust <- adproclus(data = x, nclusters = 2)

# Clustering with K = 3 clusters,
# using the ALS2 algorithm,
# with 2 random and 2 semi-random starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, algorithm = "ALS2"
)

# Saving the results of all starts
clust <- adproclus(x, 3,
  nrandomstart = 2, nsemirandomstart = 2, save_all_starts = TRUE
)

# Clustering using a user-defined rational start profile matrix
# (here the first 4 rows of the data)
start <- get_rational(x, x[1:4, ])$A
clust <- adproclus(x, 4, start_allocation = start)

[Package adproclus version 1.0.2 Index]