EHyClus {ehymet}    R Documentation

Clustering using Epigraph and Hypograph indices

Description

Creates a multivariate dataset containing the epigraph and hypograph indices and/or their modified versions, computed on the curves and their derivatives, and then performs hierarchical clustering, kmeans, kernel kmeans, and spectral clustering.

Usage

EHyClus(
  curves,
  vars_combinations,
  k = 30,
  n_clusters = 2,
  bs = "cr",
  clustering_methods = c("hierarch", "kmeans", "kkmeans", "spc"),
  l_method_hierarch = c("single", "complete", "average", "centroid", "ward.D2"),
  l_dist_hierarch = c("euclidean", "manhattan"),
  l_dist_kmeans = c("euclidean", "mahalanobis"),
  l_kernel = c("rbfdot", "polydot"),
  grid,
  true_labels = NULL,
  only_best = FALSE,
  verbose = FALSE,
  n_cores = 1
)

Arguments

curves

Dataset containing the curves to be clustered. The functional dataset can be one-dimensional (n \times p), where n is the number of curves and p the number of time points, or multidimensional (n \times p \times q), where q is the number of dimensions in the data.

vars_combinations

If a list, each element should be an atomic vector of strings with the names of the variables. Combinations with invalid variable names will be discarded. If the list is unnamed, the names of the combinations are set to vars1, ..., varsk, where k here is the number of elements in vars_combinations. If not provided, generic combinations of variables will be used; these differ between one-dimensional and multidimensional problems.

k

Number of basis functions for the B-splines. If equal to 0, the number of basis functions will be selected automatically.

n_clusters

Number of clusters to generate.

bs

A two-letter character string indicating the (penalized) smoothing basis to use. See smooth.terms.

clustering_methods

Character vector specifying at least one of the following clustering methods to be computed: "hierarch", "kmeans", "kkmeans" or "spc".

l_method_hierarch

Character vector of linkage methods for hierarchical clustering.

l_dist_hierarch

Character vector of distances for hierarchical clustering.

l_dist_kmeans

Character vector of distances for kmeans clustering.

l_kernel

Character vector of kernels for kkmeans or spc.

grid

Atomic vector of type numeric with two elements: the lower limit and the upper limit of the evaluation grid. If not provided, it will be selected automatically.

true_labels

Numeric vector of true labels for validation. If provided, evaluation metrics are computed in the final result.

only_best

Logical value. If TRUE and true_labels is provided, the function returns only the result for the best clustering method according to the Rand Index. Defaults to FALSE.

verbose

If TRUE, the function will print logs about the execution of some clustering methods. Defaults to FALSE.

n_cores

Number of cores to use for parallel computation. 1 by default, which means no parallel execution. Must be an integer greater than or equal to 1.
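
As a rough illustration of the input shapes described for curves and vars_combinations, the following base R sketch builds a one-dimensional dataset as an n x p matrix, a multidimensional one as an n x p x q array, and a named list of variable combinations. The object names (curves_1d, curves_md, combos), the list names and the random values are placeholders chosen for this sketch; the index names are the ones used in the Examples section.

```r
set.seed(1)
n <- 10  # number of curves
p <- 30  # number of time points
q <- 2   # number of dimensions

# One-dimensional functional data: one curve per row, one column per time point
curves_1d <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Multidimensional functional data: the third index runs over the dimensions
curves_md <- array(rnorm(n * p * q), dim = c(n, p, q))

# Named list of combinations (assumption: names supplied here take the place
# of the automatic vars1, ..., varsk labels given to unnamed lists)
combos <- list(
  epigraph  = c("dtaEI", "dtaMEI"),
  hypograph = c("dtaHI", "dtaMHI")
)

dim(curves_1d)
dim(curves_md)
names(combos)
```

Objects of this shape are what the curves and vars_combinations arguments expect; random noise is used only to show the layout and carries no cluster structure.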

Value

A list containing the clustering partition for each method and combination of indices and, if true_labels is provided, a data frame containing the time elapsed to obtain a clustering partition of the indices dataset for each methodology. The number of generated clusters and the combinations of variables used are stored as attributes of this object.
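
The only_best option ranks results by the Rand Index. How the package computes it internally is not shown on this page; the following is a minimal base R sketch of the standard definition, the proportion of pairs of observations on which two partitions agree. The helper rand_index and the predicted vector are hypothetical and not part of ehymet.

```r
# Rand Index between two label vectors of equal length.
# Hypothetical helper for illustration, not an ehymet function.
rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b), length(labels_a) >= 2)
  # TRUE where a pair of observations shares a label in each partition
  same_a <- outer(labels_a, labels_a, "==")
  same_b <- outer(labels_b, labels_b, "==")
  # Count each unordered pair once
  pairs <- upper.tri(same_a)
  # Share of pairs treated the same way (together or apart) by both partitions
  mean(same_a[pairs] == same_b[pairs])
}

true_labels <- c(rep(1, 5), rep(2, 5))
predicted   <- c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1)  # made-up partition
rand_index(true_labels, predicted)  # 0.8
```

The index depends only on which observations are grouped together, so relabelling the clusters (swapping 1 and 2) leaves it unchanged.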

Examples

# univariate data without labels
curves <- sim_model_ex1(n = 10)
vars_combinations <- list(c("dtaEI", "dtaMEI"), c("dtaHI", "dtaMHI"))
EHyClus(curves, vars_combinations = vars_combinations)

# multivariate data with labels
curves <- sim_model_ex2(n = 5)
true_labels <- c(rep(1, 5), rep(2, 5))
vars_combinations <- list(c("dtaMEI", "ddtaMEI"), c("dtaMEI", "d2dtaMEI"))
res <- EHyClus(curves, vars_combinations = vars_combinations, true_labels = true_labels)
res$cluster # clustering results

# multivariate data and generic (default) vars_combinations
curves <- sim_model_ex2(n = 5)
EHyClus(curves)


[Package ehymet version 0.1.0 Index]