check_arg_mapper {GSSTDA}    R Documentation

check_arg_mapper

Description

Checks the arguments passed to the mapper object.

Usage

check_arg_mapper(
  full_data,
  filter_values,
  distance_type,
  clustering_type,
  linkage_type,
  optimal_clustering_mode = NA,
  silhouette_threshold = 0.25,
  na.rm = TRUE
)

Arguments

full_data

Matrix containing the columns of the input matrix that correspond to the individuals belonging to the level.

filter_values

Vector obtained after applying the filtering function to the input matrix, i.e., a vector with the filtering function value for each included sample.

distance_type

Type of distance to be used for clustering. Choose between correlation ("correlation") and Euclidean ("euclidean"). "correlation" is the default option.

clustering_type

Type of clustering method. Choose between "hierarchical" and "PAM" ("partitioning around medoids"). "hierarchical" is the default option.

linkage_type

Linkage criterion used in hierarchical clustering. Choose between "single" for single-linkage clustering, "complete" for complete-linkage clustering, or "average" for average-linkage clustering (UPGMA). Only necessary for hierarchical clustering. "single" is the default option.

optimal_clustering_mode

Method for selecting the optimal number of clusters. It is only necessary if the chosen clustering type is hierarchical. In this case, choose between "standard" (the method used in the original mapper article) and "silhouette". For the PAM algorithm, the method is always "silhouette".

silhouette_threshold

Minimum value of the average silhouette $\overline{s}$ that a set of clusters must reach to be chosen as optimal. Within each interval of the filter function, $\overline{s}$ is computed for every partition into $2$ to $n-1$ clusters, where $n$ is the number of samples in that interval. The number of clusters that yields the highest $\overline{s}$ is selected as optimal, provided that this value exceeds the threshold. If no partition produces an $\overline{s}$ above the threshold, all samples are assigned to a single cluster. The default value is $0.25$, chosen based on standard practice as a moderate value that reflects adequate separation and cohesion within clusters.

na.rm

Logical. If TRUE, rows containing NA are omitted. If FALSE, an error is raised when rows contain NA.
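
The selection rule described under silhouette_threshold can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the GSSTDA implementation: the helper name pick_n_clusters is hypothetical, it covers only the hierarchical case, and it relies on hclust() and cutree() from stats and silhouette() from the cluster package.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper (not part of GSSTDA): choose the number of clusters
# for the samples that fall in one interval of the filter function.
# `d` is a "dist" object over those n samples.
pick_n_clusters <- function(d, linkage_type = "single",
                            silhouette_threshold = 0.25) {
  n <- attr(d, "Size")
  if (n < 3) return(1L)                 # no partition size in 2..(n - 1)
  tree <- hclust(d, method = linkage_type)
  ks <- 2:(n - 1)
  # Average silhouette width for each candidate number of clusters
  avg_sil <- vapply(ks, function(k) {
    labels <- cutree(tree, k = k)
    mean(silhouette(labels, d)[, "sil_width"])
  }, numeric(1))
  best <- which.max(avg_sil)
  # Keep the best partition only if it exceeds the threshold;
  # otherwise all samples go into a single cluster
  if (avg_sil[best] > silhouette_threshold) ks[best] else 1L
}

# Toy data (assumed for illustration): two tight, well-separated groups
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
n_clusters <- pick_n_clusters(dist(x))
```

Raising silhouette_threshold makes the rule more conservative: once no partition clears it, the helper falls back to a single cluster.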

Value

Returns optimal_clustering_mode.
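
As a rough sketch of how checks of this kind could be written, the hypothetical function below validates the choices listed under Arguments with base R's match.arg() and returns the clustering mode. The name check_mode_sketch, the error message, and the decision to stop when the mode is NA for hierarchical clustering are all assumptions made for illustration; the actual GSSTDA code may behave differently.

```r
# Hypothetical checker, not the GSSTDA source.
check_mode_sketch <- function(distance_type = "correlation",
                              clustering_type = "hierarchical",
                              linkage_type = "single",
                              optimal_clustering_mode = NA) {
  # match.arg() stops with an error if the value is not among the choices
  distance_type   <- match.arg(distance_type, c("correlation", "euclidean"))
  clustering_type <- match.arg(clustering_type, c("hierarchical", "PAM"))

  # For PAM the documented mode is always "silhouette"
  if (clustering_type == "PAM") {
    return("silhouette")
  }

  linkage_type <- match.arg(linkage_type, c("single", "complete", "average"))

  # Assumption: a missing mode is treated as an error for hierarchical clustering
  if (is.na(optimal_clustering_mode)) {
    stop("optimal_clustering_mode is required for hierarchical clustering")
  }
  match.arg(optimal_clustering_mode, c("standard", "silhouette"))
}

mode_pam  <- check_mode_sketch(clustering_type = "PAM")
mode_hier <- check_mode_sketch(optimal_clustering_mode = "standard")
```

Returning the mode rather than only validating it lets the caller use a single value downstream, whichever clustering type was requested.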


[Package GSSTDA version 1.0.0 Index]