hyperparam.torus {ClusTorus}    R Documentation

Selecting optimal hyperparameters for the conformal prediction set

Description

hyperparam.torus selects optimal hyperparameters for constructing the conformal prediction set, based on the type of postulated model and the chosen criterion.

Usage

hyperparam.torus(
  data,
  icp.torus.objects = NULL,
  option = c("elbow", "risk", "AIC", "BIC"),
  split.id = NULL,
  Jvec = 3:35,
  kvec = 20:100,
  alphavec = NULL,
  alpha.lim = 0.15,
  method = c("kde", "mixture", "kmeans"),
  mixturefitmethod = c("circular", "axis-aligned", "general", "Bayesian"),
  kmeansfitmethod = c("homogeneous-circular", "heterogeneous-circular", "ellipsoids",
    "general"),
  init = c("kmeans", "hierarchical"),
  eval.point = NULL,
  additional.condition = TRUE,
  kmax = 500,
  THRESHOLD = 1e-10,
  maxiter = 200,
  verbose = FALSE
)

Arguments

data

n x d matrix of toroidal data on [0, 2π)^d or [-π, π)^d
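
Angles recorded outside these ranges can be wrapped before the function is called. The following is a minimal base-R sketch, assuming the angles are in radians; wrap_to_2pi and wrap_to_pi are hypothetical helper names and are not part of ClusTorus.

```r
# Hypothetical helpers (not part of ClusTorus): wrap radian angles onto the torus.
wrap_to_2pi <- function(x) x %% (2 * pi)                # maps to [0, 2*pi)
wrap_to_pi  <- function(x) ((x + pi) %% (2 * pi)) - pi  # maps to [-pi, pi)

raw <- matrix(c(-0.5, 7, 3.5, -4), ncol = 2)  # 2 x 2 matrix of unwrapped angles
wrapped <- wrap_to_2pi(raw)                   # same shape, all entries in [0, 2*pi)
```

Because %% works elementwise, the n x d shape of the matrix is preserved.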

icp.torus.objects

list whose elements are icp.torus objects, generated by icp.torus.score

option

A string. One of "elbow", "risk", "AIC", or "BIC", which determines the criterion for model selection. "risk" is based on the negative log-likelihood, "AIC" on the Akaike Information Criterion, and "BIC" on the Bayesian Information Criterion. "elbow" is based on minimizing the criterion used in Jung et al. (2021).
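
For orientation, the textbook forms of AIC and BIC can be computed from a negative log-likelihood as sketched below. This is only an illustration: info_criteria is a hypothetical helper, the numbers are made up, and how ClusTorus counts the free parameters p for each fitting method is not stated here, so p is left as an input.

```r
# Hypothetical helper: textbook AIC and BIC from a negative log-likelihood.
# nll = negative log-likelihood, p = number of free parameters, n = sample size.
info_criteria <- function(nll, p, n) {
  c(AIC = 2 * nll + 2 * p,
    BIC = 2 * nll + p * log(n))
}

# Made-up numbers for two candidate models fitted to n = 100 points.
small <- info_criteria(nll = 250, p = 5,  n = 100)
large <- info_criteria(nll = 245, p = 20, n = 100)

# Smaller values are preferred; here the larger model fits slightly better
# but is penalized more heavily for its extra parameters.
prefer_small <- small < large
```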

split.id

an n-dimensional vector consisting of the values 1 (estimation) and 2 (evaluation)

Jvec

either a scalar or a vector for the number of mixture components. Default value is 3:35.

kvec

either a scalar or a vector for the concentration parameter. Default value is 20:100.

alphavec

either a scalar or a vector of levels, or NULL. Default value is NULL. If NULL, alphavec is automatically generated as a sequence from 0 to alpha.lim.

alpha.lim

a positive number less than 1, which is the upper bound of alphavec. Default is 0.15.
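
A vector of levels bounded by alpha.lim could be built by hand as in the sketch below. The grid spacing is an assumption made for illustration; the spacing that the function uses internally when alphavec is NULL is not documented here.

```r
# Illustrative only: the number of grid points (16) is an assumption,
# not the spacing used internally by ClusTorus.
alpha.lim <- 0.15
alphavec <- seq(0, alpha.lim, length.out = 16)  # steps of 0.01 from 0 to 0.15
```

Such a vector can then be supplied through the alphavec argument instead of relying on the automatic sequence.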

method

A string. One of "all", "kde", "mixture", and "kmeans", which determines the model or estimation method. If "kde", the model is based on kernel density estimates; it supports the kde-based conformity score only. If "mixture", the model is based on the von Mises mixture, fitted with an EM algorithm; it supports conformity scores based on the von Mises mixture and its variants. If "kmeans", the model is also based on the von Mises mixture, but the parameters are estimated with the elliptical k-means algorithm illustrated in Appendix; it supports the log-max-mixture based conformity score only. Default is "all". If the dimension of the data space is greater than 2, only "kmeans" is supported.

mixturefitmethod

A string. One of "circular", "axis-aligned", and "general" which determines the constraint of the EM fitting. Default is "axis-aligned". This argument only works for method = "mixture".

kmeansfitmethod

A string. One of "general", "ellipsoids", "heterogeneous-circular" or "homogeneous-circular". If "general", the elliptical k-means algorithm with no constraint is used. If "ellipsoids", only one iteration of the algorithm is used. If "heterogeneous-circular", the same as above, but with the constraint that the ellipsoids must be spheres. If "homogeneous-circular", the same as above, but the radii of the spheres are identical. This argument only works for method = "kmeans".

init

determines the initial parameters of the "kmeans" method when kmeansfitmethod = "general". Must be "kmeans" or "hierarchical". If "kmeans", the initial parameters are obtained with the extrinsic k-means method. If "hierarchical", the initial parameters are obtained with the hierarchical clustering method. Default is "kmeans".

eval.point

N x N numeric matrix on [0, 2π)^2. Default input is grid.torus.

additional.condition

boolean index. If TRUE, a singular matrix will be altered to the scaled identity.
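
The idea of replacing a singular matrix with a scaled identity can be sketched in base R as follows. regularize_if_singular is a hypothetical helper; both the singularity test (rcond against a tolerance) and the scale (mean of the diagonal) are assumptions made for illustration, not the rule used by ClusTorus.

```r
# Hypothetical helper: replace a (numerically) singular d x d matrix
# by a scaled identity. The test and the scale are assumptions.
regularize_if_singular <- function(S, tol = 1e-10) {
  if (rcond(S) < tol) {
    scale <- mean(diag(S))
    if (scale <= 0) scale <- 1      # fall back to the plain identity
    return(scale * diag(nrow(S)))
  }
  S                                  # well-conditioned input is returned unchanged
}

S_bad <- matrix(c(1, 1, 1, 1), 2, 2)  # rank 1, hence singular
S_ok  <- diag(c(2, 3))                # invertible
fixed <- regularize_if_singular(S_bad)
kept  <- regularize_if_singular(S_ok)
```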

kmax

the maximum value allowed for kappa. If an estimated kappa is larger than kmax, it is set to kmax.
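
The capping rule amounts to an elementwise minimum, as in this base-R sketch with made-up concentration estimates (kappa_hat and kappa_used are illustrative names only).

```r
kmax <- 500
kappa_hat <- c(12.3, 480, 1.7e4)     # made-up concentration estimates
kappa_used <- pmin(kappa_hat, kmax)  # values above kmax are replaced by kmax
```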

THRESHOLD

convergence threshold for the difference between the updating and updated parameters. Default is 1e-10.

maxiter

the maximum number of iterations. Default is 200.
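
A stopping rule that combines a tolerance such as THRESHOLD with an iteration cap such as maxiter typically looks like the sketch below. It uses a toy fixed-point update as a stand-in and is not the ClusTorus fitting routine; measuring the change by the largest absolute difference is an assumption.

```r
# Generic stopping rule, illustrated on a toy fixed-point update
# (a stand-in for one parameter update, not the ClusTorus fit).
THRESHOLD <- 1e-10
maxiter <- 200
update <- function(theta) cos(theta)

theta <- 1
converged <- FALSE
for (iter in seq_len(maxiter)) {
  theta_new <- update(theta)
  # stop when the largest absolute change falls below the threshold
  converged <- max(abs(theta_new - theta)) < THRESHOLD
  theta <- theta_new
  if (converged) break
}
```

If the loop ends with converged still FALSE, the iteration cap was reached before the tolerance was met.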

verbose

boolean index, which indicates whether to display additional details about what the algorithm is doing and how many loops have been done. Moreover, if additional.condition is TRUE, the warning message will be reported.

Value

returns a list object which contains data.frame objects for the evaluated criterion corresponding to each hyperparameter, the hyperparameters selected by the designated criterion, and an icp.torus object based on the selected hyperparameters.

References

S. Jung, K. Park, and B. Kim (2021), "Clustering on the torus by conformal prediction"

Akaike (1974), "A new look at the statistical model identification"

Schwarz, Gideon E. (1978), "Estimating the dimension of a model"

Examples


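library(ClusTorus)

set.seed(1)  # added here so that the random split is reproducible
data <- toydata2[, 1:2]
n <- nrow(data)

# split.id: 1 = estimation, 2 = evaluation; about half of the rows go to each
split.id <- rep(2, n)
split.id[sample(n, floor(n/2))] <- 1

# fit one icp.torus object per candidate number of mixture components J
Jvec <- 3:35
icp.torus.objects <- list()
for (j in Jvec){
  icp.torus.objects[[j]] <- icp.torus.score(data, split.id = split.id, method = "kmeans",
                                            kmeansfitmethod = "general", init = "hierarchical",
                                            param = list(J = j), verbose = TRUE)
}

# select the hyperparameters with the "risk" criterion
hyperparam.torus(data, icp.torus.objects, option = "risk")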


[Package ClusTorus version 0.1.3 Index]