adproclus_low_dim {adproclus} | R Documentation |
Low dimensional ADPROCLUS
Description
Perform low dimensional additive profile clustering (ADPROCLUS) on object-by-variable data. Use case: the data to cluster consist of a large set of variables, and it is useful to interpret the cluster profiles in terms of a smaller set of components that represent the original variables well.
Usage
adproclus_low_dim(
data,
nclusters,
ncomponents,
start_allocation = NULL,
nrandomstart = 3,
nsemirandomstart = 3,
save_all_starts = FALSE,
seed = NULL
)
Arguments
data |
Object-by-variable data matrix of class |
nclusters |
Number of clusters to be used. Must be a positive integer. |
ncomponents |
Number of components (dimensions) to which the profiles should be restricted. Must be a positive integer. |
start_allocation |
Optional matrix of binary values as starting
allocation for first run. Default is NULL. |
nrandomstart |
Number of random starts (see |
nsemirandomstart |
Number of semi-random starts
(see |
save_all_starts |
logical. If |
seed |
Integer. Seed for the random number generator. Default: NULL, meaning no reproducibility |
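As an illustration of the start_allocation argument, the following base R sketch builds a binary starting allocation. It assumes (this page does not state it explicitly) that the matrix has one row per object and one column per cluster, mirroring the membership matrix in the returned list. The object name `start` and the empty-cluster safeguard are hypothetical choices for this example, not requirements of the package.

```r
set.seed(3)
n_objects  <- nrow(stackloss)  # stackloss ships with base R (21 rows)
n_clusters <- 3

# Random binary matrix: entry 1 means "object belongs to cluster"
# (assumed layout: objects in rows, clusters in columns)
start <- matrix(rbinom(n_objects * n_clusters, size = 1, prob = 0.5),
                nrow = n_objects, ncol = n_clusters)

# Hypothetical safeguard: give every empty cluster one random member
for (k in which(colSums(start) == 0)) {
  start[sample(n_objects, 1), k] <- 1
}

# The matrix could then be supplied as, for example:
# adproclus_low_dim(stackloss, 3, ncomponents = 1, start_allocation = start)
dim(start)
```

Overlap is allowed in this sketch: a row may contain several 1s, or none.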
Details
In this function, an extension by Depril et al. (2012) of
Mirkin's (1987, 1990) additive profile clustering method is used to obtain a
low dimensional overlapping clustering model of the object-by-variable data
provided by data.
More precisely, the low dimensional ADPROCLUS model approximates an
I × J object-by-variable data matrix X by an I × J model matrix M.
For K overlapping clusters, M can be decomposed into an I × K binary
cluster membership matrix A and a K × J real-valued cluster profile
matrix P such that M = AP.
With the simultaneous dimension reduction, P is restricted to be of
reduced rank S < min(K, J), such that it can be decomposed into
P = CB', with C a K × S matrix and B a J × S matrix. A row in C then
represents the profile values associated with the respective cluster
in terms of the S components, while the entries of B can be used to
interpret the components in terms of the complete set of variables.
The aim of a low dimensional ADPROCLUS analysis is therefore, given a
number of clusters K and a number of dimensions S, to estimate a model
matrix M = ACB' that reconstructs the data matrix X as closely as
possible in a least squares sense while simultaneously reducing the
dimensions of the data.
For a detailed illustration of the low dimensional ADPROCLUS model and
associated loss function, see Depril et al. (2012).
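The structure of the model can be sketched in base R without the package: a binary membership matrix multiplied by a profile matrix, where the profile matrix is itself the product of low dimensional profiles and transposed base vectors. The sizes and random values below are arbitrary assumptions chosen for illustration; nothing here is estimated from data.

```r
set.seed(1)
# Illustrative sizes (assumptions): objects, variables, clusters, components
n_objects <- 6; n_vars <- 4; n_clusters <- 3; n_comp <- 2

# Binary membership matrix (objects x clusters); rows may hold several 1s
A <- matrix(rbinom(n_objects * n_clusters, size = 1, prob = 0.5),
            nrow = n_objects, ncol = n_clusters)

# Low dimensional profiles (clusters x components) and
# base vectors (variables x components)
C <- matrix(rnorm(n_clusters * n_comp), nrow = n_clusters)
B <- matrix(rnorm(n_vars * n_comp), nrow = n_vars)

# Profile matrix of rank at most n_comp, and the resulting model matrix
P <- C %*% t(B)  # clusters x variables
M <- A %*% P     # objects x variables

dim(M)
qr(P)$rank       # numerical rank, cannot exceed n_comp
```

Because the profile matrix is built from only n_comp components, its rank cannot exceed n_comp regardless of the number of variables.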
Warning: Computation time increases exponentially with the number of
clusters, K. We recommend determining the computation time of a single
start for each specific dataset and K before increasing the number of
starts.
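To get a feel for this growth, the sketch below counts the binary membership patterns available to a single object. It assumes, for illustration only, that the cost of a run scales with this count; the page itself does not say what drives the exponential behaviour. The helper `n_patterns` is hypothetical.

```r
# Assumption for illustration: an object may belong to any subset of the
# clusters, giving 2^K candidate membership patterns per object.
n_patterns <- function(K) 2^K

# Enumerate all binary membership patterns for 3 clusters
patterns_K3 <- as.matrix(expand.grid(rep(list(0:1), 3)))
nrow(patterns_K3)

# Pattern count for 1 to 10 clusters
growth <- data.frame(K = 1:10, patterns = n_patterns(1:10))
print(growth)
```

In practice, the timer_one_run entry of the returned list reports how long the relevant single start took, which gives a measured rather than theoretical basis for choosing the number of starts.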
Value
adproclus_low_dim() returns a list with the following
components, which describe the best model (from the multiple starts):
model
matrix. The obtained overlapping clustering model of the same size as data.
model_lowdim
matrix. The obtained low dimensional clustering model, of size objects by components.
A
matrix. The membership matrix of the clustering model. Clusters are sorted by size.
P
matrix. The profile matrix of the clustering model.
c
matrix. The profile values in terms of the low dimensional components.
B
Variables-by-components matrix. Base vectors connecting low dimensional components with original variables. Warning: for computing P use the transpose of B.
sse
numeric. The residual sum of squares of the clustering model, which is minimized by the ALS algorithm.
totvar
numeric. The total sum of squares of data.
explvar
numeric. The proportion of variance in data that is accounted for by the clustering model.
iterations
numeric. The number of iterations of the algorithm.
timer
numeric. The amount of time (in seconds) the complete algorithm ran for.
timer_one_run
numeric. The amount of time (in seconds) the relevant single start ran for.
initial_start
list. A list containing the initial membership matrix, as well as the type of start that was used to obtain the clustering solution (as returned by get_random or get_semirandom).
runs
list. Each element represents one model obtained from one of the multiple starts. Each element contains all of the above information.
parameters
list. Containing the parameters used for the model.
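The relationship between the three fit measures can be sketched in base R. The example uses toy data and a fixed toy membership matrix, and rests on two assumptions not confirmed by this page: that the total sum of squares is taken about the grand mean of the data, and that the explained proportion equals one minus the ratio of residual to total sum of squares.

```r
set.seed(2)
X <- matrix(rnorm(20), nrow = 5)         # toy data: 5 objects x 4 variables
A <- cbind(c(1, 1, 0, 0, 1),             # toy memberships for 2 clusters
           c(0, 1, 1, 1, 0))

# Least squares profiles for this fixed A (illustrative, not the ALS
# algorithm); requires the columns of A to be linearly independent
P <- solve(crossprod(A), crossprod(A, X))
M <- A %*% P                             # model matrix

sse     <- sum((X - M)^2)                # residual sum of squares
totvar  <- sum((X - mean(X))^2)          # assumption: about the grand mean
explvar <- 1 - sse / totvar              # assumption: 1 - sse / totvar

c(sse = sse, totvar = totvar, explvar = explvar)
```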
References
Depril, D., Van Mechelen, I., & Wilderjans, T. F. (2012). Lowdimensional additive overlapping clustering. Journal of Classification, 29, 297-320.
See Also
adproclus
for full dimensional ADPROCLUS
get_random
for generating random starts
get_semirandom
for generating semi-random starts
get_rational
for generating rational starts
Examples
# Loading a test dataset into the global environment
x <- stackloss
# Low dimensional clustering with K = 3 clusters,
# where the resulting profiles are characterized in S = 1 dimension
clust <- adproclus_low_dim(x, 3, ncomponents = 1)
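A possible follow-up is to inspect the returned list. The sketch below uses only components named under Value and runs the package call only if adproclus is installed. The helper `max_model_gap` is hypothetical (not part of the package), and the expectation that the model equals the product of memberships and profiles is an assumption drawn from the Details section, so the result is printed rather than asserted.

```r
# Hypothetical helper: largest absolute difference between a model matrix
# and the product of a membership matrix and a profile matrix
max_model_gap <- function(model, A, P) {
  max(abs(as.matrix(model) - as.matrix(A) %*% as.matrix(P)))
}

if (requireNamespace("adproclus", quietly = TRUE)) {
  # seed is set so that the random starts are reproducible
  fit <- adproclus::adproclus_low_dim(stackloss, 3, ncomponents = 1, seed = 1)

  print(fit$explvar)        # proportion of variance accounted for
  print(colSums(fit$A))     # number of objects in each cluster
  print(fit$timer_one_run)  # seconds used by the relevant single start
  print(max_model_gap(fit$model, fit$A, fit$P))
}
```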