lsbclust {lsbclust}

R Documentation

Least-squares Bilinear Clustering of Three-way Data

Description

This function clusters along one way of a three-way array (as specified by `margin`) while decomposing along the other two. Four types of clustering are possible, each based on a different part of the two-way slices of the array: the overall means, the row margins, the column margins and the row-by-column interactions. Which of these clusterings can be fitted is determined by `delta`, a vector with four binary elements. All orthogonal models can be fitted; the nonorthogonal case `delta = c(1, 1, 0, 0)` returns an error. See the reference for further details.
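
The four clusterings act on four additive parts of each two-way slice. The base-R sketch below splits each slice into its overall mean, row effects, column effects and interaction residual. It assumes that every sum-to-zero constraint is enforced, which we take to be the fully constrained default `delta = c(1, 1, 1, 1)`. The helper `decompose_slice` is hypothetical and not part of lsbclust. Because the parts are orthogonal, the slice's sum of squares splits additively across them, which is what allows the loss to be broken down by subproblem.

```r
# Hypothetical helper (not part of lsbclust): split one two-way slice X into
# overall mean, row effects, column effects and interactions, assuming all
# sum-to-zero constraints are enforced (full double-centring).
decompose_slice <- function(X) {
  m <- mean(X)                    # overall mean
  a <- rowMeans(X) - m            # row effects (sum to zero)
  b <- colMeans(X) - m            # column effects (sum to zero)
  # interactions: double-centred residual, rows and columns sum to zero
  C <- X - outer(rowMeans(X), rep(1, ncol(X))) -
    outer(rep(1, nrow(X)), colMeans(X)) + m
  list(overall = m, rows = a, columns = b, interactions = C)
}

set.seed(1)
dat <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))  # 4 rows x 3 columns x 5 slices
# with margin = 3, observation k is the slice dat[, , k]
parts <- lapply(seq_len(dim(dat)[3]), function(k) decompose_slice(dat[, , k]))

# the four parts reconstruct the first slice exactly
X <- dat[, , 1]
p <- parts[[1]]
recon <- p$overall + outer(p$rows, rep(1, 3)) +
  outer(rep(1, 4), p$columns) + p$interactions
all.equal(recon, X)       # TRUE

# orthogonality: the sums of squares of the parts add up to that of the slice
ss <- c(overall      = 12 * p$overall^2,      # 4 * 3 cells
        rows         = 3 * sum(p$rows^2),     # each row effect repeated over 3 columns
        columns      = 4 * sum(p$columns^2),  # each column effect repeated over 4 rows
        interactions = sum(p$interactions^2))
all.equal(sum(ss), sum(X^2))  # TRUE
```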

Usage

lsbclust(data, margin = 3L, delta = c(1L, 1L, 1L, 1L), nclust,
  ndim = 2L, fixed = c("none", "rows", "columns"), nstart = 20L,
  starts = NULL, nstart.kmeans = 500L, alpha = 0.5,
  parallel = FALSE, maxit = 100L, verbose = 1, method = "diag",
  type = NULL, sep.nclust = TRUE, ...)
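
The following is a hedged usage sketch on simulated data. It clusters a 10 x 8 x 50 array over its third way. The cluster counts and the reduced `nstart` are arbitrary illustrative choices, and the call is guarded so that the script still runs when the lsbclust package is not installed.

```r
set.seed(123)
# 50 two-way slices of size 10 x 8, stacked along the third way
dat <- array(rnorm(10 * 8 * 50), dim = c(10, 8, 50))

if (requireNamespace("lsbclust", quietly = TRUE)) {
  fit <- lsbclust::lsbclust(data = dat, margin = 3L,
                            delta = c(1L, 1L, 1L, 1L),
                            # overall means, rows, columns, interactions
                            nclust = c(2L, 3L, 3L, 4L),
                            ndim = 2L, nstart = 5L, verbose = 0)
  print(fit$loss)           # loss broken down by subproblem
  print(head(fit$cluster))  # memberships for all cluster types (layout assumed: one row per slice)
}
```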

Arguments

`data`	A three-way array representing the data.
`margin`	An integer giving the single subscript of `data` over which the clustering will be applied.
`delta`	A four-element binary vector (logical or numeric) indicating which sum-to-zero constraints must be enforced.
`nclust`	A vector of length four giving the number of clusters for the overall means, the row margins, the column margins and the interactions, in that order. Alternatively, a single number, in which case all four components use the same number of clusters.
`ndim`	The required rank for the approximation of the interactions (a scalar).
`fixed`	One of `"none"`, `"rows"` or `"columns"`, indicating whether to fix neither set of coordinates, or to fix the row or the column coordinates, respectively, across clusters. If a vector is supplied, only the first element is used (passed to `int.lsbclust`).
`nstart`	The number of random starts to use for the interaction clustering.
`starts`	A list containing starting configurations for the cluster membership vector. If not supplied, random initializations will be generated (passed to `int.lsbclust`).
`nstart.kmeans`	The number of random starts to use in `kmeans`.
`alpha`	Numeric value in [0, 1] which determines how the singular values are distributed between rows and columns (passed to `int.lsbclust`).
`parallel`	Logical indicating whether to run the different random starts in parallel (passed to `int.lsbclust`).
`maxit`	The maximum number of iterations allowed in the interaction clustering.
`verbose`	Integer controlling the amount of information printed: 0 = no information, 1 = information on random starts and progress, and 2 = information printed after each iteration of the interaction clustering.
`method`	The method for calculating cluster agreement across random starts, passed on to `cl_agreement` (passed to `int.lsbclust`).
`type`	One of `"rows"`, `"columns"` or `"overall"` (or a unique abbreviation of one of these), indicating whether clustering should be done on the row margins, the column margins or the overall means of the two-way slices, respectively. If more than one option is supplied, the algorithm is run for each unique option (passed to `orc.lsbclust`). This is an optional argument.
`sep.nclust`	Logical indicating how `nclust` should be used across the different values of `type`. If `sep.nclust` is `TRUE`, `nclust` is recycled so that each `type` can have a different number of clusters. If `sep.nclust` is `FALSE`, the same vector `nclust` is used for all values of `type`.
`...`	Additional arguments passed to `kmeans`.
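
The `margin` argument selects which way of `data` indexes the observations; each observation is then the two-way slice formed by the other two ways. The base-R sketch below shows the slices for each choice of `margin`, using `aperm()` to move the clustered way last. It keeps the remaining ways in their original order. How lsbclust orients rows and columns internally is an assumption here and is not stated on this page.

```r
# Illustration only: the two-way slices implied by each choice of margin
dat <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))

# margin = 3: observation k is dat[, , k], a 2 x 3 slice (4 observations)
slice3 <- dat[, , 1]

# margin = 1: observation i is dat[i, , ], a 3 x 4 slice; aperm() moves way 1 last
perm1 <- aperm(dat, c(2, 3, 1))
identical(perm1[, , 2], dat[2, , ])   # TRUE

# margin = 2: observation j is dat[, j, ], a 2 x 4 slice
perm2 <- aperm(dat, c(1, 3, 2))
identical(perm2[, , 3], dat[, 3, ])   # TRUE
```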
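
The `ndim` and `alpha` arguments concern a low-rank approximation of the interactions. A common way to "distribute the singular values between rows and columns" is to write the rank-`ndim` truncated SVD C ≈ U D V' as the product of row coordinates U D^alpha and column coordinates V D^(1 - alpha). We assume this is what `alpha` controls. The function `biplot_coords` below is hypothetical. It shows that the approximation itself does not depend on `alpha`; only the scaling of the two coordinate sets changes.

```r
# Hypothetical helper: rank-ndim coordinates with singular values split
# between rows (power alpha) and columns (power 1 - alpha)
biplot_coords <- function(C, ndim = 2L, alpha = 0.5) {
  s <- svd(C, nu = ndim, nv = ndim)
  d <- s$d[seq_len(ndim)]
  rows <- s$u %*% diag(d^alpha, nrow = ndim)        # row coordinates
  cols <- s$v %*% diag(d^(1 - alpha), nrow = ndim)  # column coordinates
  list(rows = rows, columns = cols)
}

set.seed(42)
C <- matrix(rnorm(6 * 5), nrow = 6, ncol = 5)
s <- svd(C)
C2 <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(s$v[, 1:2])  # best rank-2 approximation

# every alpha yields the same rank-2 approximation rows %*% t(columns)
for (a in c(0, 0.5, 1)) {
  bc <- biplot_coords(C, ndim = 2L, alpha = a)
  stopifnot(isTRUE(all.equal(bc$rows %*% t(bc$columns), C2)))
}
```

With `alpha = 0.5`, as in the default, rows and columns share the singular values equally. With `alpha = 1`, the row coordinates carry them entirely.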

Value

Returns an object of S3 class `lsbclust`, which is a list with the following components:

`overall`	Object of class `ovl.kmeans` for the overall means clustering
`rows`	Object of class `row.kmeans` for the row means clustering
`columns`	Object of class `col.kmeans` for the column means clustering
`interactions`	Object of class `int.lsbclust` for the interaction clustering
`call`	The function call used to create the object
`delta`	The value of `delta` in the fit
`df`	Breakdown of the degrees-of-freedom across the different subproblems
`loss`	Breakdown of the loss across subproblems
`time`	Time taken in seconds to calculate the solution
`cluster`	Matrix of cluster membership per observation for all cluster types

References