clus.torus {ClusTorus}

R Documentation

Clustering on the torus by conformal prediction

Description

clus.torus returns clustering results for data on the torus, based on inductive conformal prediction sets.

Usage

clus.torus(
  data,
  split.id = NULL,
  model = c("kmeans", "mixture"),
  mixturefitmethod = c("axis-aligned", "circular", "general"),
  kmeansfitmethod = c("general", "homogeneous-circular", "heterogeneous-circular",
    "ellipsoids"),
  J = NULL,
  level = NULL,
  option = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'clus.torus'
plot(
  x,
  panel = 1,
  assignment = "outlier",
  data = NULL,
  ellipse = TRUE,
  type = NULL,
  overlay = FALSE,
  out = FALSE,
  ...
)

Arguments

`data`	n x d matrix of toroidal data on `[0, 2\pi)^d` or `[-\pi, \pi)^d`. Required by `clus.torus`; in the plot method the default is `NULL`.
`split.id`	a vector of length n consisting of the values 1 (estimation) and 2 (evaluation). Default is `NULL`.
`model`	A string. One of "mixture" or "kmeans", which determines the model and estimation method. If "mixture", the model is a von Mises mixture fitted with an EM algorithm; it supports conformity scores based on the von Mises mixture and its variants. If "kmeans", the model is also a von Mises mixture, but the parameters are estimated with the elliptical k-means algorithm; it supports only the log-max-mixture based conformity score. If the dimension of the data space is greater than 2, only "kmeans" is supported. Default is `model = "kmeans"`.
`mixturefitmethod`	A string. One of "circular", "axis-aligned", and "general" which determines the constraint of the EM fitting. Default is "axis-aligned". This argument only works for `model = "mixture"`.
`kmeansfitmethod`	A string. One of "general", "ellipsoids", "heterogeneous-circular", or "homogeneous-circular". If "general", the elliptical k-means algorithm with no constraint is used. If "ellipsoids", only one iteration of the algorithm is used. If "heterogeneous-circular", the same as above, but with the constraint that the ellipsoids must be spheres. If "homogeneous-circular", the same as above, but the radii of the spheres are identical. Default is "general". This argument only works for `model = "kmeans"`.
`J`	the number of components for mixture model fitting. If `J` is a vector, then `hyperparam.torus` is used to choose the optimal `J`. If `J == NULL`, then `J = 4:30` is used.
`level`	a scalar in `[0,1]`. The level of the conformal prediction set used for clustering. If `level == NULL`, then `hyperparam.alpha` is used to choose the optimal `level`.
`option`	A string. One of "elbow", "risk", "AIC", or "BIC", which determines the criterion for model selection. "risk" is based on the negative log-likelihood, "AIC" on the Akaike Information Criterion, and "BIC" on the Bayesian Information Criterion. "elbow" is based on minimizing the criterion used in Jung et al. (2021). This argument is only used if `J` is a vector or `NULL`.
`verbose`	A boolean which indicates whether to display additional details about what the algorithm is doing or how many loops have been done. Default is `TRUE`.
`...`	Further arguments that will be passed to `icp.torus` and `hyperparam.torus`.
`x`	A `clus.torus` object.
`panel`	One of 1, 2, or 3, which determines the type of plot. If `panel = 1`, `x$cluster.obj` is plotted; if `panel = 2`, `x$icp.torus` is plotted; if `panel = 3`, `x$hyperparam.select` is plotted. Default is `panel = 1`.
`assignment`	A string. One of "outlier", "log.density", "posterior", "mahalanobis". Default is "outlier".
`ellipse`	A boolean which determines whether to plot ellipse-intersections. Default is `TRUE`. Only available for `panel = 2`.
`type`	A string. One of "mix", "max" or "e". This argument is only available if the `icp.torus` object was fitted with `model = "mixture"`. Default is `NULL`. If `type != NULL`, the argument `ellipse` automatically becomes `FALSE`. If "mix", the plot is based on the von Mises mixture. If "max", the plot is based on the von Mises max-mixture. If "e", the plot is based on the ellipse approximation.
`overlay`	A boolean which determines whether to plot ellipse-intersections on clustering plots. Default is `FALSE`. Only available for `panel = 1`.
`out`	An option for returning the ggplot object. Default is `FALSE`.
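
The `data` and `split.id` arguments described above can be prepared with base R alone. The following is a minimal sketch, not part of the package: the matrix `raw` is made-up illustrative input, and the even random split is an arbitrary choice for illustration rather than a documented default.

```r
# Hypothetical raw angles in radians, some of them outside [0, 2*pi)
set.seed(1)
raw <- matrix(runif(200, min = -2 * pi, max = 4 * pi), ncol = 2)

# Wrap every angle into [0, 2*pi); in R, %% takes the sign of the divisor,
# so negative angles are mapped to non-negative ones
angles <- raw %% (2 * pi)

# Build split.id: 1 marks estimation rows, 2 marks evaluation rows.
# Here the rows are divided evenly at random (an assumption for illustration).
n <- nrow(angles)
split.id <- sample(rep(c(1, 2), length.out = n))
```

The resulting objects could then be supplied as `clus.torus(data = angles, split.id = split.id, ...)`.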

Details

clus.torus is a user-friendly all-in-one function which automatically performs the following procedures: 1. compute conformity scores for the given model and fitting method, 2. choose the optimal model and level based on a prespecified criterion, and 3. form clusters based on the chosen model and level. Procedures 1-3 can also be carried out independently with icp.torus, hyperparam.torus, hyperparam.J, hyperparam.alpha, and cluster.assign.torus. For more detail on each procedure, see the documentation of those functions.
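
The three procedures can be sketched by hand roughly as follows. This is a hedged outline, not the internal implementation of clus.torus: the argument names and the result elements `icp.torus` and `alphahat` are assumptions based on the package's other help pages and should be checked against ?icp.torus, ?hyperparam.torus and ?cluster.assign.torus for the installed version.

```r
library(ClusTorus)

data <- as.matrix(toydata2[, 1:2])
set.seed(2021)  # fixed in case data splitting or initialization is random

# Procedure 1: fit the model and compute conformity scores for each candidate J
# (assumed: a vector J yields a list of icp.torus objects)
icp.list <- icp.torus(data, model = "kmeans", kmeansfitmethod = "general",
                      J = 5:30, verbose = FALSE)

# Procedure 2: choose the number of components and the level by the "risk" criterion
hyper <- hyperparam.torus(icp.list, option = "risk")

# Procedure 3: assign clusters using the chosen model and level
# (assumed element names: hyper$icp.torus and hyper$alphahat)
clusters <- cluster.assign.torus(hyper$icp.torus, level = hyper$alphahat)
```

Running the steps separately is mainly useful when one stage needs to be inspected or repeated with different settings, for example trying another `option` without refitting the models.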

Value

clus.torus returns a clus.torus object, which consists of the following three S3 objects:

cluster.obj: a cluster.obj object; clustering assignment results for several methods. For details, see cluster.assign.torus.
icp.torus: an icp.torus object; contains the model parameters and conformity scores. For details, see icp.torus.
hyperparam.select: a hyperparam.torus object (if J = NULL or a sequence of numbers, and level = NULL or a sequence of numbers), a hyperparam.J object (if level is a scalar), or a hyperparam.alpha object (if J is a scalar); contains information on the optimally chosen model (number of components J) and level (alpha) based on the prespecified criterion. For details, see hyperparam.torus, hyperparam.J, and hyperparam.alpha.

References

Jung, S., Park, K., & Kim, B. (2021). Clustering on the torus by conformal prediction. The Annals of Applied Statistics, 15(4), 1583-1603.

Mardia, K. V., Kent, J. T., Zhang, Z., Taylor, C. C., & Hamelryck, T. (2012). Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. Journal of Applied Statistics, 39(11), 2475-2492.

Shin, J., Rinaldo, A., & Wasserman, L. (2019). Predictive clustering. arXiv preprint arXiv:1903.08125.

Examples


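# Use the two angular columns of the toydata2 dataset from ClusTorus
data <- toydata2[, 1:2]
n <- nrow(data)
# Fit with the elliptical k-means method, choosing J from 5:30 by the "risk" criterion
clus.torus(data = data, model = "kmeans", kmeansfitmethod = "general", J = 5:30, option = "risk")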
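
As a further sketch, the fitted object can be stored and passed to the plot method documented above. The variable name `fit` is introduced here for illustration only; the component names and `panel` values follow the Value and Arguments sections of this page.

```r
library(ClusTorus)

data <- toydata2[, 1:2]

# Store the result instead of printing it
fit <- clus.torus(data = data, model = "kmeans", kmeansfitmethod = "general",
                  J = 5:30, option = "risk", verbose = FALSE)

# Components described in the Value section
names(fit)

# panel = 1: clustering assignment (x$cluster.obj)
plot(fit, panel = 1, assignment = "outlier")

# panel = 2: conformal prediction set (x$icp.torus), with ellipse-intersections
plot(fit, panel = 2, ellipse = TRUE)

# panel = 3: hyperparameter selection results (x$hyperparam.select)
plot(fit, panel = 3)
```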

[Package ClusTorus version 0.2.2 Index]