plot.spwkm {vimpclust}    R Documentation

Plots from a "spwkm" object

Description

Produces several graphics to help interpret a spwkm object.

Usage

## S3 method for class 'spwkm'
plot(
  x,
  what = "weights.features",
  Which = NULL,
  xtitle = NULL,
  ytitle = NULL,
  title = NULL,
  showlegend = NULL,
  legendtitle = NULL,
  ...
)

Arguments

x

An object of class spwkm.

what

A character string indicating which element of x is to be plotted. See section "Details" below for further information.

Which

A numerical vector indexing the groups or the variables to be displayed. See section "Details" below for further information.

xtitle

The title of the x-axis.

ytitle

The title of the y-axis.

title

The title of the graphic.

showlegend

A boolean. If showlegend=NULL (default value), the legend is displayed.

legendtitle

The title of the legend.

...

Further arguments to the plot function.

Details

The plot function represents the regularization paths over a grid of values of lambda, as well as several quality criteria associated with the clustering.

For both the groupsparsewkm and sparsewkm functions, the following options are available:

If what=weights.features, the regularization paths for the weights associated to the variables are displayed.

If what=sel.features, the graph represents the number of selected variables for each value of the regularization parameter lambda. In the case of sparse weighted k-means for mixed data, categorical variables are represented with dotted lines so that they are easily identified.
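As a rough illustration of what the sel.features curve counts, the sketch below assumes that a variable is "selected" for a given lambda when its weight is strictly positive. The weight matrix W (rows = variables, columns = values of lambda) is made up for the example and is not read from a spwkm object.

```r
# Hypothetical weights: rows = variables, columns = values of lambda.
# These numbers are invented for illustration only.
W <- matrix(c(0.6, 0.5, 0.4, 0.3,   # lambda1: all variables active
              0.7, 0.7, 0.0, 0.1,   # lambda2: v3 discarded
              1.0, 0.0, 0.0, 0.0),  # lambda3: only v1 remains
            nrow = 4,
            dimnames = list(c("v1", "v2", "v3", "v4"),
                            c("lambda1", "lambda2", "lambda3")))

# Assumption: a variable counts as selected when its weight is > 0
n_selected <- colSums(W > 0)
n_selected
```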

If what=expl.var, the explained variance (computed as the contribution of the between-class variance to the global variance) is displayed. This criterion is computed for all variables in the data set, without taking into account the weights of the groups or of the variables.
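The ratio described above can be sketched in base R as the between-class sum of squares divided by the total sum of squares, summed over all variables with no weighting. The helper explained_variance() is hypothetical and not part of vimpclust; whether the package scales the data beforehand is not covered here, so the sketch works on X exactly as passed.

```r
# Hypothetical helper: share of the total sum of squares explained by a partition,
# computed over all variables, ignoring any feature weights.
explained_variance <- function(X, cluster) {
  X <- as.matrix(X)
  # Total sum of squares around the global mean
  total <- sum(scale(X, center = TRUE, scale = FALSE)^2)
  # Between-class sum of squares: sum over classes of n_k * ||mean_k - global mean||^2
  global_mean <- colMeans(X)
  between <- 0
  for (k in unique(cluster)) {
    Xk <- X[cluster == k, , drop = FALSE]
    between <- between + nrow(Xk) * sum((colMeans(Xk) - global_mean)^2)
  }
  between / total
}

# Usage: for an ordinary k-means partition this should equal betweenss / totss
X <- scale(iris[, -5])
set.seed(1)
km <- kmeans(X, centers = 3)
explained_variance(X, km$cluster)
```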

If what=w.expl.var, the explained weighted variance is displayed. The difference with the criterion above is that the weights of the variables are taken into account in the computation. Consequently, for large values of the regularization parameter lambda, this criterion may be computed on one variable only, if its weight becomes equal to 1 and all the others are discarded.
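To show how the weights change the criterion, here is a sketch that assumes the weighted version is the ratio sum_j w_j B_j / sum_j w_j T_j, where B_j and T_j are the between-class and total sums of squares of variable j. This exact formula is an assumption, not taken from the package. The helper name weighted_explained_variance() and the toy data are hypothetical.

```r
# Hypothetical helper; ASSUMED formula: sum_j w_j * B_j / sum_j w_j * T_j
weighted_explained_variance <- function(X, cluster, w) {
  X <- as.matrix(X)
  # Per-variable total sum of squares T_j
  tot_j <- colSums(scale(X, center = TRUE, scale = FALSE)^2)
  # Per-variable within-class sum of squares W_j
  within_j <- numeric(ncol(X))
  for (k in unique(cluster)) {
    Xk <- X[cluster == k, , drop = FALSE]
    within_j <- within_j + colSums(scale(Xk, center = TRUE, scale = FALSE)^2)
  }
  # Per-variable between-class sum of squares B_j = T_j - W_j
  between_j <- tot_j - within_j
  sum(w * between_j) / sum(w * tot_j)
}

# Toy data: variable a separates the two classes perfectly, variable b not at all
X <- cbind(a = c(0, 0, 10, 10), b = c(1, 5, 1, 5))
cluster <- c(1, 1, 2, 2)
weighted_explained_variance(X, cluster, w = c(1, 1))  # both variables count
weighted_explained_variance(X, cluster, w = c(1, 0))  # only a counts, as for large lambda
```

With w = c(1, 0) the criterion reduces to the ratio for variable a alone, which mirrors the situation described above where a single variable keeps a weight of 1.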

If what=pen.crit, the graph displays the evolution of the penalized criterion maximized by the algorithm. This criterion can be written as the between-class weighted sum-of-squares, penalized by a group L1-norm. For more details on the mathematical expressions, one may refer to Chavent et al. (2020).
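A minimal sketch of such a penalized criterion follows. It assumes a group-lasso penalty of the form lambda * sum_g sqrt(p_g) * ||w_g||_2, where p_g is the size of group g. The sqrt(p_g) scaling is a common convention and an assumption here; refer to Chavent et al. (2020) for the exact expression. The function penalized_criterion() and its inputs are hypothetical.

```r
# Hypothetical helper. ASSUMED form:
#   sum_j w_j * B_j  -  lambda * sum_g sqrt(p_g) * ||w_g||_2
# between_j: per-variable between-class sums of squares B_j
# w: variable weights; index: group of each variable; lambda: regularization parameter
penalized_criterion <- function(between_j, w, index, lambda) {
  # Between-class weighted sum of squares
  fit <- sum(w * between_j)
  # Group penalty: L1 norm (sum) over groups of the scaled L2 norms of the group weights
  groups <- split(w, index)
  penalty <- sum(vapply(groups,
                        function(wg) sqrt(length(wg)) * sqrt(sum(wg^2)),
                        numeric(1)))
  fit - lambda * penalty
}

# Usage on made-up values: two groups of two variables each
penalized_criterion(between_j = c(100, 0, 4, 4), w = c(1, 0, 0, 0),
                    index = c(1, 1, 2, 2), lambda = 0.5)
```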

For the outcome of the groupsparsewkm function trained on numerical data only, two more options are available:

If what=weights.groups, the regularization paths for the weights associated to the groups of variables are displayed.

If what=sel.groups, the graph represents the number of selected groups for each value of the regularization parameter lambda.
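The count behind sel.groups can be sketched by assuming that a group is selected for a given lambda when the L2 norm of its variable weights is strictly positive. The weight matrix below is invented. The group vector index mirrors the iris example further down.

```r
# Hypothetical variable weights: rows = variables, columns = three values of lambda
W <- matrix(c(0.5, 0.5, 0.5, 0.5,   # both groups active
              0.7, 0.1, 0.7, 0.0,   # both groups still active
              0.7, 0.0, 0.7, 0.0),  # group 2 discarded
            nrow = 4)
index <- c(1, 2, 1, 2)  # group membership of each variable

# Assumption: a group is selected when the L2 norm of its weights is > 0
n_groups_selected <- apply(W, 2, function(w) {
  sum(tapply(w, index, function(wg) sqrt(sum(wg^2)) > 0))
})
n_groups_selected
```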

For the outcome of the sparsewkm function trained on mixed data, two more options are also available:

If what=weights.levels, the regularization paths for the weights associated to the levels of the categorical variables are displayed.

If what=sel.levels, the graph represents the number of selected levels associated to the categorical variables plus the number of selected numerical variables, for each value of the regularization parameter lambda.
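The quantity plotted for sel.levels is a simple sum, sketched below for two values of lambda. This assumes that a level or a numerical variable counts as selected when its weight is strictly positive. Both weight matrices are hypothetical.

```r
# Hypothetical weights along two values of lambda (columns)
# Rows of W_num: numerical variables
W_num <- matrix(c(0.6, 0.4, 0.0,
                  0.9, 0.0, 0.0), nrow = 3)
# Rows of W_lev: levels of the categorical variables
W_lev <- matrix(c(0.3, 0.3, 0.5, 0.0,
                  0.4, 0.0, 0.0, 0.0), nrow = 4)

# Selected levels plus selected numerical variables, for each lambda
n_sel_levels <- colSums(W_lev > 0) + colSums(W_num > 0)
n_sel_levels
```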

If the number of groups in groupsparsewkm, or the number of features in sparsewkm, is too large for the graphics to be easily interpretable, one may select a subset of groups or variables using the argument Which. Note that when sparsewkm is trained on mixed data, the initial order of the variables is changed: after the preprocessing step, numerical variables come first and categorical variables second. The indexing provided in Which should take this into account (see the Examples section).
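The sketch below translates variable names into positions for Which under the reordering described above. It assumes that each block (numerical, then categorical) keeps the original relative order of its columns. That within-block ordering is an assumption, not documented here. The data frame df is a toy example.

```r
# Toy mixed data frame (hypothetical): numerical and categorical columns interleaved
df <- data.frame(age  = c(50, 60, 45),
                 sex  = factor(c("F", "M", "F")),
                 chol = c(200, 240, 180),
                 cp   = factor(c("a", "b", "a")))

# ASSUMED post-processing order: numerical variables first, then categorical ones,
# each block keeping its original relative order
is_num <- vapply(df, is.numeric, logical(1))
new_order <- c(names(df)[is_num], names(df)[!is_num])
new_order

# Positions of "sex" and "chol" in the reordered variables, usable as Which
Which <- match(c("sex", "chol"), new_order)
Which
```

The resulting vector would then be passed as the Which argument of plot, for a spwkm object fitted on such data.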

Value

p

an object of class ggplot.

References

M. Chavent, J. Lacaille, A. Mourer, and M. Olteanu (2020). Sparse k-means for mixed data via group-sparse clustering. To appear in ESANN proceedings.

See Also

sparsewkm, groupsparsewkm

Examples

# sparse weighted k-means on mixed data

data(HDdata)
out <- sparsewkm(X = HDdata[,-14], centers = 2)
plot(out, what = "weights.features")
plot(out, what = "weights.levels")
plot(out, what = "sel.features")
plot(out, what = "sel.levels")
plot(out, what = "expl.var")
plot(out, what = "w.expl.var")
plot(out, what = "pen.crit")
# plot the regularization paths for the first three variables only
plot(out, what = "weights.features", Which=1:3)

# group sparse weighted k-means on numerical data
data(iris)
index <- c(1, 2, 1, 2)
out <- groupsparsewkm(X = iris[,-5], centers = 3, index = index)
plot(out, what = "weights.groups")
plot(out, what = "weights.features")
plot(out, what = "sel.groups")
plot(out, what = "sel.features")
plot(out, what = "expl.var")
plot(out, what = "w.expl.var")
plot(out, what = "pen.crit")
# plot the regularization paths for the variables in the first group only
plot(out, what = "weights.features", Which=1)


[Package vimpclust version 0.1.0 Index]