clust_all_levels {GSSTDA} | R Documentation |
Get clusters for all data levels
Description
Clusters the samples within each level. That is, for each interval of values of the filtering function, the samples whose value falls within that interval are clustered using the chosen clustering algorithm and the chosen method for determining the optimal number of clusters.
Usage
clust_all_levels(
data,
samp_in_lev,
distance_type,
clustering_type,
linkage_type,
optimal_clustering_mode,
silhouette_threshold,
num_bins_when_clustering
)
Arguments
data |
Input data matrix whose columns are the individuals and rows are the features. |
samp_in_lev |
A list of character vectors with the individuals
included in each of the levels (i.e. each of the intervals of the values
of the filter functions). It is the output of the |
distance_type |
Type of distance to be used for clustering. Choose between correlation ("correlation") and euclidean ("euclidean"). |
clustering_type |
Type of clustering method. Choose between "hierarchical" and "PAM" ("partitioning around medoids"). |
linkage_type |
Linkage criteria used in hierarchical clustering. Choose between "single" for single-linkage clustering, "complete" for complete-linkage clustering or "average" for average linkage clustering (or UPGMA). Only necessary for hierarchical clustering. |
optimal_clustering_mode |
Method for selecting the optimal number of clusters. It is only necessary if the chosen clustering algorithm is hierarchical. In that case, choose between "standard" (the method used in the original Mapper article) or "silhouette". For the PAM algorithm, the method is always "silhouette". |
silhouette_threshold |
Minimum value of |
num_bins_when_clustering |
Number of bins used to build the histogram for the "standard" method of selecting the optimal number of clusters. Not needed if optimal_clustering_mode is "silhouette" or clustering_type is "PAM". |
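To make the role of the bin count more concrete, the following base R sketch shows one common reading of the histogram heuristic from the original Mapper article: bin the merge heights of the dendrogram and cut it at the first empty bin. The helper `cut_by_histogram` and the toy data are hypothetical and written only for illustration; this is not GSSTDA's implementation, and the package may choose the bins or the cut height differently.

```r
# Hypothetical helper (not part of GSSTDA): cut a dendrogram at the first
# empty bin of the histogram of its merge heights.
cut_by_histogram <- function(dist_mat, linkage_type = "single", num_bins = 10) {
  hc <- hclust(dist_mat, method = linkage_type)

  # Equal-width bins from 0 up to the largest pairwise distance
  # (assumption: the package may define the bin range differently).
  breaks <- seq(0, max(dist_mat), length.out = num_bins + 1)
  counts <- hist(hc$height, breaks = breaks, plot = FALSE)$counts

  empty_bins <- which(counts == 0)
  if (length(empty_bins) == 0) {
    # No gap in the merge heights: treat everything as a single cluster.
    return(cutree(hc, k = 1))
  }

  # Cut at the lower edge of the first empty bin.
  cutree(hc, h = breaks[empty_bins[1]])
}

# Toy data: features in rows, individuals in columns, two distant groups.
toy_data <- matrix(
  c(0, 0,  0.1, 0,  0, 0.1,  10, 10,  10.1, 10,  10, 10.1),
  nrow = 2,
  dimnames = list(c("f1", "f2"), paste0("s", 1:6))
)

# dist() works on rows, so the matrix is transposed to compare individuals.
nodes <- cut_by_histogram(dist(t(toy_data)), linkage_type = "single", num_bins = 10)
print(nodes)
```

On this toy data the merge heights leave a wide gap between the within-group and between-group distances, so the cut separates the individuals into two clusters.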
Value
List of integer vectors. Each vector describes the nodes of one level and the individuals contained in them. The names of the vector elements are the sample names, and the values are the number of the node, within that level, to which each individual belongs.
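The shape of the returned object can be illustrated with a small base R sketch. The function `cluster_levels_sketch` below is a hypothetical stand-in, not the package's code: it fixes the number of clusters through an assumed argument `k` instead of selecting it automatically, assumes that the correlation distance is one minus the Pearson correlation, and assumes that a level with fewer than two individuals forms a single node.

```r
# Hypothetical stand-in for the per-level loop (not GSSTDA's implementation).
cluster_levels_sketch <- function(data, samp_in_lev,
                                  distance_type = "euclidean",
                                  linkage_type = "single",
                                  k = 2) {
  lapply(samp_in_lev, function(samples) {
    if (length(samples) < 2) {
      # Assumption: a level with zero or one individual is a single node.
      return(setNames(rep(1L, length(samples)), samples))
    }

    # Keep only the individuals (columns) that belong to this level.
    level_data <- data[, samples, drop = FALSE]

    d <- if (distance_type == "correlation") {
      as.dist(1 - cor(level_data))  # assumed definition of correlation distance
    } else {
      dist(t(level_data))           # dist() compares rows, hence the transpose
    }

    hc <- hclust(d, method = linkage_type)
    cutree(hc, k = min(k, length(samples)))
  })
}

# Toy data: features in rows, individuals in columns.
toy_data <- matrix(
  c(0, 0,  0.1, 0,  0, 0.1,  10, 10,  10.1, 10,  10, 10.1),
  nrow = 2,
  dimnames = list(c("f1", "f2"), paste0("s", 1:6))
)

# Two overlapping levels, as produced by overlapping filter intervals.
toy_levels <- list(
  level_1 = c("s1", "s2", "s3", "s4"),
  level_2 = c("s4", "s5", "s6")
)

nodes_by_level <- cluster_levels_sketch(toy_data, toy_levels)
str(nodes_by_level)
```

Each list element is a named integer vector: the names are the individuals in that level and the values are node numbers that are only meaningful within that level.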