R: Performs hierarchical clustering for functional data with...

fdahclust {fdacluster}

R Documentation

Performs hierarchical clustering for functional data with amplitude and phase separation

Description

This function extends hierarchical agglomerative clustering to functional data. It includes the possibility to separate amplitude and phase information.

Usage

fdahclust(
  x,
  y = NULL,
  n_clusters = 1L,
  warping_class = c("affine", "dilation", "none", "shift", "srsf"),
  centroid_type = "mean",
  metric = c("l2", "pearson"),
  linkage_criterion = c("complete", "average", "single", "ward.D2"),
  cluster_on_phase = FALSE,
  use_verbose = TRUE,
  warping_options = c(0.15, 0.15),
  maximum_number_of_iterations = 100L,
  number_of_threads = 1L,
  parallel_method = 0L,
  distance_relative_tolerance = 0.001,
  use_fence = FALSE,
  check_total_dissimilarity = TRUE,
  compute_overall_center = FALSE
)

Arguments

`x`	A numeric vector of length `M` or a numeric matrix of shape `N \times M` or an object of class `funData::funData`. If a numeric vector or matrix, it specifies the grid(s) of size `M` on which each of the `N` curves has been observed. If an object of class `funData::funData`, it contains the whole functional data set and the `y` argument is not used.
`y`	Either a numeric matrix of shape `N \times M` or a numeric array of shape `N \times L \times M` or an object of class `fda::fd`. If a numeric matrix or array, it specifies the `N`-sample of `L`-dimensional curves observed on grids of size `M`. If an object of class `fda::fd`, it contains all the necessary information about the functional data set to be able to evaluate it on user-defined grids.
`n_clusters`	An integer value specifying the number of clusters. Defaults to `1L`.
`warping_class`	A string specifying the warping class. Choices are `"affine"`, `"dilation"`, `"none"`, `"shift"` or `"srsf"`. Defaults to `"affine"`. The SRSF class is the only class which is boundary-preserving.
`centroid_type`	A string specifying the type of centroid to compute. Choices are `"mean"`, `"median"`, `"medoid"`, `"lowess"` or `"poly"`. Defaults to `"mean"`. If LOWESS approximation is chosen, the user can append an integer between 0 and 100 as in `"lowess20"`. This number is used as the smoother span, that is, the proportion of points in the plot which influence the smooth at each value. Larger values give more smoothness. The default value is 10%. If polynomial approximation is chosen, the user can append a positive integer as in `"poly3"`. This number is used as the degree of the polynomial model. The default value is `4L`.
`metric`	A string specifying the metric used to compare curves. Choices are `"l2"` or `"pearson"`. Defaults to `"l2"`. Used only when `warping_class != "srsf"`. For the boundary-preserving warping class, the L2 distance between the SRSFs of the original curves is used.
`linkage_criterion`	A string specifying which linkage criterion should be used to compute distances between sets of curves. Choices are `"complete"` for complete linkage, `"average"` for average linkage, `"single"` for single linkage and `"ward.D2"` for Ward's minimum variance criterion. See `stats::hclust()` for more details. Defaults to `"complete"`.
`cluster_on_phase`	A boolean specifying whether clustering should be based on phase variation or amplitude variation. Defaults to `FALSE` which implies amplitude variation.
`use_verbose`	A boolean specifying whether the algorithm should output details of the steps to the console. Defaults to `TRUE`.
`warping_options`	A numeric vector supplied as a helper to the chosen `warping_class` to decide on warping parameter bounds. Defaults to `c(0.15, 0.15)`. This is used only when `warping_class != "srsf"`.
`maximum_number_of_iterations`	An integer specifying the maximum number of iterations before the algorithm stops if no other convergence criterion was met. Defaults to `100L`.
`number_of_threads`	An integer value specifying the number of threads used for parallelization. Defaults to `1L`. This is used only when `warping_class != "srsf"`.
`parallel_method`	An integer value specifying the type of desired parallelization for template computation. If `0L`, templates are computed in parallel. If `1L`, parallelization occurs within a single template computation (only for the medoid method as of now). Defaults to `0L`. This is used only when `warping_class != "srsf"`.
`distance_relative_tolerance`	A numeric value specifying a relative tolerance on the distance update between two iterations. If all observations have not sufficiently improved in that sense, the algorithm stops. Defaults to `1e-3`. This is used only when `warping_class != "srsf"`.
`use_fence`	A boolean specifying whether the fence algorithm should be used to robustify the algorithm against outliers. Defaults to `FALSE`. This is used only when `warping_class != "srsf"`.
`check_total_dissimilarity`	A boolean specifying whether an additional stopping criterion based on improvement of the total dissimilarity should be used. Defaults to `TRUE`. This is used only when `warping_class != "srsf"`.
`compute_overall_center`	A boolean specifying whether the overall center should also be computed. Defaults to `FALSE`. This is used only when `warping_class != "srsf"`.
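
To make the `centroid_type` suffixes more concrete, the following base R sketch smooths one noisy curve with `stats::lowess()` at two smoother spans and with a degree-4 polynomial fit. It only illustrates what a smoother span and a polynomial degree mean; it is not the package's internal centroid computation, and the toy curve, the `roughness()` helper and all variable names are assumptions made for illustration.

```r
# Toy noisy curve on a grid of size M (illustrative data, not from the package)
set.seed(1234)
M <- 100
grid <- seq(0, 1, length.out = M)
curve_values <- sin(2 * pi * grid) + rnorm(M, sd = 0.2)

# Smoother span of 10% (the default described for "lowess")
# and of 50% (what a suffix such as "lowess50" would request)
smooth_10 <- stats::lowess(grid, curve_values, f = 0.10)
smooth_50 <- stats::lowess(grid, curve_values, f = 0.50)

# Degree-4 polynomial fit (the default degree described for "poly")
poly_fit <- stats::lm(curve_values ~ stats::poly(grid, degree = 4))
smooth_poly <- stats::fitted(poly_fit)

# Hypothetical helper: roughness measured by squared second differences
roughness <- function(v) sum(diff(v, differences = 2)^2)

# A larger span is expected to yield a smoother (less rough) curve
c(span_10 = roughness(smooth_10$y), span_50 = roughness(smooth_50$y))
```

Comparing the two roughness values shows the effect of the span without plotting; `graphics::lines()` on top of a scatter plot of `curve_values` gives the visual counterpart.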

Details

The number of clusters is required as input because, with functional data, once hierarchical clustering is performed, curves within clusters need to be aligned to their corresponding centroid.
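
The agglomerative step alone can be pictured with base R. The sketch below runs `stats::hclust()` on a plain Euclidean distance matrix between toy curves sampled on a common grid, then cuts the tree into a requested number of clusters with `stats::cutree()`. It deliberately omits the alignment of curves to their centroid described above, so it illustrates the linkage criterion and the role of the number of clusters rather than what `fdahclust()` does internally. The toy data and variable names are assumptions.

```r
set.seed(42)
M <- 50
grid <- seq(0, 1, length.out = M)

# Two groups of 5 toy curves each: sine-like and cosine-like, with small noise
sine_curves <- t(replicate(5, sin(2 * pi * grid) + rnorm(M, sd = 0.05)))
cosine_curves <- t(replicate(5, cos(2 * pi * grid) + rnorm(M, sd = 0.05)))
curves <- rbind(sine_curves, cosine_curves)  # shape N x M with N = 10

# Pairwise Euclidean distances between rows
# (a discretized L2 distance, up to a scaling factor)
d <- stats::dist(curves, method = "euclidean")

# The method names match the choices of the `linkage_criterion` argument
tree <- stats::hclust(d, method = "complete")

# Cutting the dendrogram requires the number of clusters
memberships <- stats::cutree(tree, k = 2)
table(memberships)
```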

Value

An object of class `caps`.

Examples

#----------------------------------
# Extracts 15 out of the 30 simulated curves in `simulated30_sub` data set
idx <- c(1:5, 11:15, 21:25)
x <- simulated30_sub$x[idx, ]
y <- simulated30_sub$y[idx, , ]

#----------------------------------
# Runs an HAC with affine alignment, searching for 2 clusters
out <- fdahclust(
  x = x,
  y = y,
  n_clusters = 2,
  warping_class = "affine"
)

#----------------------------------
# Then visualize the results
# Either with ggplot2 via ggplot2::autoplot(out)
# or using graphics::plot()
# You can visualize the original and aligned curves with:
plot(out, type = "amplitude")
# Or the estimated warping functions with:
plot(out, type = "phase")
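
For readers without the bundled data set at hand, the following sketch builds inputs in the shapes that the `x` and `y` arguments describe: an `N \times M` grid matrix and an `N \times L \times M` array of curve values. The toy curves, sizes and variable names (`x_toy`, `y_toy`, `out_toy`) are assumptions made for illustration. The call to `fdahclust()` uses only arguments listed under Usage and is attempted only if the package is installed.

```r
N <- 6   # number of curves
L <- 1   # dimension of each curve
M <- 50  # grid size

# Common evaluation grid, repeated for each curve: shape N x M
grid <- seq(0, 1, length.out = M)
x_toy <- matrix(grid, nrow = N, ncol = M, byrow = TRUE)

# Toy curves: three time-shifted slow sines and three time-shifted fast sines
shifts <- c(-0.05, 0, 0.05)
y_toy <- array(NA_real_, dim = c(N, L, M))
for (i in 1:3) {
  y_toy[i, 1, ] <- sin(2 * pi * (grid - shifts[i]))
  y_toy[i + 3, 1, ] <- sin(4 * pi * (grid - shifts[i]))
}

# Run the clustering only when fdacluster is available
if (requireNamespace("fdacluster", quietly = TRUE)) {
  out_toy <- fdacluster::fdahclust(
    x = x_toy,
    y = y_toy,
    n_clusters = 2L,
    warping_class = "shift",
    use_verbose = FALSE
  )
}
```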

[Package fdacluster version 0.3.0 Index]