adproclus_low_dim {adproclus} | R Documentation |
Low dimensional ADPROCLUS
Description
Perform low dimensional additive profile clustering (ADPROCLUS) on object by variable data. Use case: the data to cluster consist of a large set of variables, and it is useful to interpret the cluster profiles in terms of a smaller set of components that represent the original variables well.
Usage
adproclus_low_dim(
data,
nclusters,
ncomponents,
start_allocation = NULL,
nrandomstart = 3,
nsemirandomstart = 3,
save_all_starts = FALSE,
seed = NULL
)
Arguments
data |
Object-by-variable data matrix of class |
nclusters |
Number of clusters to be used. Must be a positive integer. |
ncomponents |
Number of components (dimensions) to which the profiles should be restricted. Must be a positive integer. |
start_allocation |
Optional matrix of binary values as starting
allocation for first run. Default is |
nrandomstart |
Number of random starts (see |
nsemirandomstart |
Number of semi-random starts
(see |
save_all_starts |
logical. If |
seed |
Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility |
Details
In this function, an extension by Depril et al. (2012) of Mirkin's (1987, 1990) additive profile clustering method is used to obtain a low dimensional overlapping clustering model of the object by variable data provided by data.
More precisely, the low dimensional ADPROCLUS model approximates an I \times J object by variable data matrix X by an I \times J model matrix M. For K overlapping clusters, M can be decomposed into an I \times K binary cluster membership matrix A and a K \times J real-valued cluster profile matrix P such that M = AP.
With the simultaneous dimension reduction, P is restricted to be of reduced rank S < min(K,J), such that it can be decomposed into P = CB', with C a K \times S matrix and B a J \times S matrix. A row of C then represents the profile values associated with the respective cluster in terms of the S components, while the entries of B can be used to interpret the components in terms of the complete set of variables.
The aim of a low dimensional ADPROCLUS analysis is therefore, given a number of clusters K and a number of dimensions S, to estimate a model matrix M that reconstructs the data matrix X as closely as possible in a least squares sense while simultaneously reducing the dimensions of the data.
For a detailed illustration of the low dimensional ADPROCLUS model and
associated loss function, see Depril et al. (2012).
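The decomposition described above can be sketched in base R. This is only an illustration of the matrix shapes and the least squares criterion, not the package's fitting algorithm: the toy sizes, the randomly generated matrices and all variable names (n_objects, A_toy, C_toy and so on) are assumptions made for this example.

```r
# Hypothetical toy sizes: I objects, J variables, K clusters, S components
n_objects <- 6   # I
n_vars    <- 4   # J
n_clust   <- 3   # K
n_comp    <- 1   # S, chosen so that S < min(K, J)

set.seed(1)
# Binary membership matrix A (I x K); a row may contain several 1s (overlap)
A_toy <- matrix(rbinom(n_objects * n_clust, size = 1, prob = 0.5),
                nrow = n_objects, ncol = n_clust)
# Low dimensional cluster profiles C (K x S) and base vectors B (J x S)
C_toy <- matrix(rnorm(n_clust * n_comp), nrow = n_clust, ncol = n_comp)
B_toy <- matrix(rnorm(n_vars * n_comp), nrow = n_vars, ncol = n_comp)

P_toy <- C_toy %*% t(B_toy)    # K x J profile matrix, rank at most S
M_toy <- A_toy %*% P_toy       # I x J model matrix, M = AP
M_lowdim <- A_toy %*% C_toy    # I x S low dimensional model, AC

# Least squares criterion against a (here simulated) data matrix X
X_toy <- M_toy + matrix(rnorm(n_objects * n_vars, sd = 0.1),
                        nrow = n_objects, ncol = n_vars)
sse_toy <- sum((X_toy - M_toy)^2)
```

Note that the full model can be recovered from the low dimensional one as M = (AC)B', which is why B is needed to interpret the components in terms of the original variables.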
Warning: Computation time increases exponentially with the number of clusters, K. We recommend determining the computation time of a single start for each specific dataset and K before increasing the number of starts.
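One possible way to follow the recommendation in the warning above is to run a short trial and read the per-start time from the result. The sketch below assumes that the adproclus package is installed and that the timer_one_run component behaves as described under Value; the budget of 100 starts and the names trial, per_start and projected_seconds are hypothetical, and the projection is only a rough linear estimate.

```r
if (requireNamespace("adproclus", quietly = TRUE)) {
  x <- stackloss
  # One random and one semi-random start only, to keep the trial run short
  trial <- adproclus::adproclus_low_dim(
    x,
    nclusters = 3,
    ncomponents = 1,
    nrandomstart = 1,
    nsemirandomstart = 1,
    seed = 1L
  )
  # Seconds spent on the single start that produced the reported model
  per_start <- trial$timer_one_run
  # Rough projection for a hypothetical budget of 100 starts
  projected_seconds <- per_start * 100
  print(projected_seconds)
}
```

If the projected time is unacceptable, reducing K or the number of starts is likely to matter more than anything else.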
Value
adproclus_low_dim() returns a list with the following components, which describe the best model (from the multiple starts):
model
matrix. The obtained overlapping clustering model M of the same size as data.
model_lowdim
matrix. The obtained low dimensional clustering model AC of size I \times S.
A
matrix. The membership matrix A of the clustering model. Clusters are sorted by size.
P
matrix. The profile matrix P of the clustering model.
c
matrix. The profile values in terms of the low dimensional components.
B
matrix. Variables-by-components matrix. Base vectors connecting low dimensional components with original variables. Warning: for computing P use B'.
sse
numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.
totvar
numeric. The total sum of squares of data.
explvar
numeric. The proportion of variance in data that is accounted for by the clustering model.
iterations
numeric. The number of iterations of the algorithm.
timer
numeric. The amount of time (in seconds) the complete algorithm ran for.
timer_one_run
numeric. The amount of time (in seconds) the relevant single start ran for.
initial_start
list. A list containing the initial membership matrix, as well as the type of start that was used to obtain the clustering solution (as returned by get_random or get_semirandom).
runs
list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information.
parameters
list. Containing the parameters used for the model.
References
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
See Also
adproclus
for full dimensional ADPROCLUS
get_random
for generating random starts
get_semirandom
for generating semi-random starts
get_rational
for generating rational starts
Examples
# Loading a test dataset into the global environment
x <- stackloss
# Low dimensional clustering with K = 3 clusters
# where the resulting profiles can be characterized in S = 1 dimension
clust <- adproclus_low_dim(x, 3, ncomponents = 1)
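As a hedged follow-up to the example above, the returned list could be inspected as sketched below. The component names are taken from the Value section; the seed value and the choice of which components to print are assumptions made for illustration, and the block only runs if the adproclus package is installed.

```r
if (requireNamespace("adproclus", quietly = TRUE)) {
  x <- stackloss
  # Fixing the seed so that the random starts are reproducible
  clust <- adproclus::adproclus_low_dim(x, 3, ncomponents = 1, seed = 1L)

  print(dim(clust$model))         # same size as the data
  print(dim(clust$model_lowdim))  # objects by components
  print(colSums(clust$A))         # number of objects in each cluster
  print(clust$explvar)            # proportion of variance accounted for
}
```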