R: Clustering using Epigraph and Hypograph indices

EHyClus {ehymet}

R Documentation

Clustering using Epigraph and Hypograph indices

Description

Creates a multivariate dataset containing the epigraph index, the hypograph index and/or their modified versions, computed on the curves and their derivatives, and then performs hierarchical clustering, kmeans, kernel kmeans, and spectral clustering on it.
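The idea can be sketched in base R on toy data. This is a minimal illustration, not the package's implementation: `mei()` and `mhi()` are hypothetical helpers that follow the usual definitions of the modified epigraph and hypograph indices as proportions of time points, they are applied to the raw curves only (no smoothing and no derivatives), and only hierarchical clustering and kmeans are shown.

```r
# Toy data: 6 curves (rows) observed at 20 time points (columns),
# in two groups separated by a vertical shift.
set.seed(1)
t_grid <- seq(0, 1, length.out = 20)
curves <- rbind(
  t(replicate(3, sin(2 * pi * t_grid) + rnorm(20, sd = 0.1))),
  t(replicate(3, sin(2 * pi * t_grid) + 2 + rnorm(20, sd = 0.1)))
)

# Hypothetical helper: modified epigraph index of every curve.
# For curve i, average over all curves the proportion of time points
# at which they lie on or above curve i, then take one minus that average.
mei <- function(x) {
  n <- nrow(x)
  sapply(seq_len(n), function(i) {
    ref <- matrix(x[i, ], n, ncol(x), byrow = TRUE)
    1 - mean(rowMeans(x >= ref))
  })
}

# Hypothetical helper: modified hypograph index of every curve.
# Average proportion of time points at which the other curves lie
# on or below curve i.
mhi <- function(x) {
  n <- nrow(x)
  sapply(seq_len(n), function(i) {
    ref <- matrix(x[i, ], n, ncol(x), byrow = TRUE)
    mean(rowMeans(x <= ref))
  })
}

# Multivariate dataset of indices: one row per curve, one column per index.
indices <- cbind(MEI = mei(curves), MHI = mhi(curves))

# Cluster the indices dataset instead of the curves themselves.
hc <- cutree(hclust(dist(indices, method = "euclidean"), method = "average"), k = 2)
km <- kmeans(indices, centers = 2, nstart = 10)$cluster
```

Clustering the low-dimensional indices dataset rather than the curves is what lets standard multivariate methods be reused without change.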

Usage

EHyClus(
  curves,
  vars_combinations,
  k = 30,
  n_clusters = 2,
  bs = "cr",
  clustering_methods = c("hierarch", "kmeans", "kkmeans", "spc"),
  l_method_hierarch = c("single", "complete", "average", "centroid", "ward.D2"),
  l_dist_hierarch = c("euclidean", "manhattan"),
  l_dist_kmeans = c("euclidean", "mahalanobis"),
  l_kernel = c("rbfdot", "polydot"),
  grid,
  true_labels = NULL,
  only_best = FALSE,
  verbose = FALSE,
  n_cores = 1
)

Arguments

`curves`	Dataset containing the curves to be clustered. The functional dataset can be one-dimensional (`n \times p`), where `n` is the number of curves and `p` the number of time points, or multidimensional (`n \times p \times q`), where `q` is the number of dimensions in the data.
`vars_combinations`	If `list`, each element of the list should be an atomic `vector` of strings with the names of the variables. Combinations with non-valid variable names will be discarded. If the list is non-named, the names of the variables are set to vars1, ..., varsk, where k is the number of elements in `vars_combinations`. If not provided, generic combinations of variables will be used. They will not be the same for uni-dimensional and multi-dimensional problems.
`k`	Number of basis functions for the B-splines. If equal to `0`, the number of basis functions will be selected automatically.
`n_clusters`	Number of clusters to generate.
`bs`	A two-letter character string indicating the (penalized) smoothing basis to use. See `smooth.terms`.
`clustering_methods`	Character vector specifying at least one of the following clustering methods to be computed: "hierarch", "kmeans", "kkmeans" or "spc".
`l_method_hierarch`	`list` of clustering methods for hierarchical clustering.
`l_dist_hierarch`	`list` of distances for hierarchical clustering.
`l_dist_kmeans`	`list` of distances for kmeans clustering.
`l_kernel`	`list` of kernels for kkmeans or spc.
`grid`	Atomic vector of type numeric with two elements: the lower limit and the upper limit of the evaluation grid. If not provided, it will be selected automatically.
`true_labels`	Numeric vector of true labels for validation. If provided, evaluation metrics are computed in the final result.
`only_best`	`logical` value. If `TRUE` and `true_labels` is provided, the function will return only the result for the best clustering method based on the Rand Index. Defaults to `FALSE`.
`verbose`	If `TRUE`, the function will print logs about the execution of some clustering methods. Defaults to `FALSE`.
`n_cores`	Number of cores used for parallel computation. Defaults to 1, which means no parallel execution. Must be an integer greater than or equal to 1.
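Since `only_best` ranks methods by the Rand Index against `true_labels`, a small base R sketch of that index may help. The helper `rand_index()` below is hypothetical and only illustrates the standard (unadjusted) definition: the share of pairs of curves on which two partitions agree. How the package computes it internally is not shown here.

```r
# Hypothetical helper: unadjusted Rand Index between two partitions.
# A pair of observations counts as an agreement when both partitions
# put it in the same cluster, or both put it in different clusters.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  pairs <- utils::combn(length(a), 2)
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]
  mean(same_a == same_b)
}

true_labels <- c(1, 1, 1, 2, 2, 2)

# Same grouping with swapped label names: the index ignores label names.
swapped <- c(2, 2, 2, 1, 1, 1)
ri_swapped <- rand_index(true_labels, swapped)  # 1

# One curve assigned to the wrong group: 10 of the 15 pairs agree.
one_off <- c(1, 1, 2, 2, 2, 2)
ri_one_off <- rand_index(true_labels, one_off)  # 10 / 15
```

Because the index is invariant to how clusters are numbered, `true_labels` only needs to encode the grouping, not match the label values a method happens to return.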

Value

A list containing the clustering partition for each method and indices combination and, if `true_labels` is provided, a data frame containing the time elapsed for obtaining a clustering partition of the indices dataset for each methodology. The number of generated clusters and the combinations of variables used are also available as attributes of this object.

Examples

# univariate data without labels
curves <- sim_model_ex1(n = 10)
vars_combinations <- list(c("dtaEI", "dtaMEI"), c("dtaHI", "dtaMHI"))
EHyClus(curves, vars_combinations = vars_combinations)

# multivariate data with labels
curves <- sim_model_ex2(n = 5)
true_labels <- c(rep(1, 5), rep(2, 5))
vars_combinations <- list(c("dtaMEI", "ddtaMEI"), c("dtaMEI", "d2dtaMEI"))
res <- EHyClus(curves, vars_combinations = vars_combinations, true_labels = true_labels)
res$cluster # clustering results

# multivariate data and generic (default) vars_combinations
curves <- sim_model_ex2(n = 5)
EHyClus(curves)
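
Inputs with the shapes described under Arguments can also be built by hand. The following base R sketch uses random noise as placeholder data and hypothetical object names; it only shows the expected dimensions of `curves` and the default names that an unnamed `vars_combinations` list would receive according to the argument description. It does not call `EHyClus`.

```r
set.seed(42)
n <- 10  # number of curves
p <- 30  # number of time points
q <- 2   # number of dimensions (multidimensional case only)

# One-dimensional functional dataset: an n x p matrix, one curve per row.
curves_uni <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Multidimensional functional dataset: an n x p x q array.
curves_multi <- array(rnorm(n * p * q), dim = c(n, p, q))

# Unnamed list of variable combinations, as in the examples above.
vars_combinations <- list(c("dtaEI", "dtaMEI"), c("dtaHI", "dtaMHI"))

# Names described for an unnamed list: vars1, ..., varsk.
default_names <- paste0("vars", seq_along(vars_combinations))

dim(curves_uni)
dim(curves_multi)
default_names
```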

[Package ehymet version 0.1.0 Index]