mapper {GSSTDA}

R Documentation

Mapper object

Description

Two widely used techniques in topological data analysis (TDA) are persistent homology and mapper. Persistent homology borrows ideas from abstract algebra to identify aspects of the shape of the data, such as the number of connected components and the presence of higher-dimensional holes. Mapper condenses the information of a high-dimensional dataset into a combinatorial graph or simplicial complex, referred to as the skeleton of the dataset. This function implements one-dimensional mapper, i.e. mapper with a single filter function.
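
As a rough illustration of the one-dimensional case, the range of a single filter can be covered with overlapping intervals, and each sample assigned to every interval that contains its filter value. The sketch below is not the GSSTDA implementation. `cover_intervals` and `samples_in_intervals` are hypothetical helpers, and the overlap convention (equal-length intervals in which consecutive intervals share `percent_overlap` percent of their length) is an assumption.

```r
# Hypothetical helper, not part of GSSTDA: cover the range of a filter with
# equal-length intervals that overlap by a fixed percentage of their length.
cover_intervals <- function(filter_values, num_intervals = 5, percent_overlap = 40) {
  p <- percent_overlap / 100
  lo <- min(filter_values)
  hi <- max(filter_values)
  # Length chosen so that num_intervals overlapping intervals span [lo, hi]
  len <- (hi - lo) / (num_intervals - (num_intervals - 1) * p)
  starts <- lo + (seq_len(num_intervals) - 1) * len * (1 - p)
  ends <- starts + len
  ends[num_intervals] <- hi  # guard against floating-point drift at the top end
  cbind(start = starts, end = ends)
}

# Hypothetical helper: indices of the samples falling in each interval.
# A sample in an overlap region appears in two consecutive intervals.
samples_in_intervals <- function(filter_values, intervals) {
  lapply(seq_len(nrow(intervals)), function(i) {
    which(filter_values >= intervals[i, "start"] &
            filter_values <= intervals[i, "end"])
  })
}

set.seed(1)
f <- runif(50)  # toy filter values, one per sample
iv <- cover_intervals(f, num_intervals = 5, percent_overlap = 40)
members <- samples_in_intervals(f, iv)
```

In mapper, the samples of each interval are then clustered separately, and clusters from consecutive intervals that share samples become linked nodes.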

Usage

mapper(
  data,
  filter_values,
  num_intervals = 5,
  percent_overlap = 40,
  distance_type = "correlation",
  clustering_type = "hierarchical",
  num_bins_when_clustering = 10,
  linkage_type = "single",
  optimal_clustering_mode = NA,
  silhouette_threshold = 0.25,
  na.rm = TRUE
)

Arguments

`data`	Input matrix whose columns correspond to the individuals and rows to the features.
`filter_values`	Vector obtained after applying the filtering function to the input matrix, i.e., a vector with the filtering function value of each included sample.
`num_intervals`	Number of intervals used to create the first sample partition based on the filter values. The default is 5.
`percent_overlap`	Overlap between intervals, expressed as a percentage. The default is 40.
`distance_type`	Type of distance to be used for clustering. Choose between correlation ("correlation") and Euclidean ("euclidean"). The default is "correlation".
`clustering_type`	Type of clustering method. Choose between "hierarchical" and "PAM" (partitioning around medoids). The default is "hierarchical".
`num_bins_when_clustering`	Number of bins used to build the histogram employed by the standard method for finding the optimal number of clusters. Not needed if `optimal_clustering_mode` is "silhouette" or `clustering_type` is "PAM". The default is 10.
`linkage_type`	Linkage criterion used in hierarchical clustering. Choose between "single" (single-linkage clustering), "complete" (complete-linkage clustering) or "average" (average-linkage clustering, or UPGMA). Only needed for hierarchical clustering. The default is "single".
`optimal_clustering_mode`	Method for selecting the optimal number of clusters. Only needed if the chosen clustering algorithm is hierarchical, in which case choose between "standard" (the method used in the original mapper article) and "silhouette". For the PAM algorithm, the method is always "silhouette".
`silhouette_threshold`	Minimum average silhouette value $\overline{s}$ that a set of clusters must reach to be chosen as optimal. Within each interval of the filter function, $\overline{s}$ is computed for every partition into $2$ to $n-1$ clusters, where $n$ is the number of samples in that interval. The number of clusters that produces the highest $\overline{s}$ is selected as optimal, provided that this value exceeds the threshold. If no partition produces an $\overline{s}$ exceeding the threshold, all samples in the interval are assigned to a single cluster. The default is $0.25$, chosen based on standard practice as a moderate value that reflects adequate separation and cohesion within clusters.
`na.rm`	`logical`. If `TRUE`, `NA` rows are omitted. If `FALSE`, an error is raised when `NA` rows are present. The default is `TRUE`.
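
The selection rule described for `silhouette_threshold` can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: `avg_silhouette` and `choose_num_clusters` are hypothetical helpers, the toy data use Euclidean distance, and degenerate cases are handled in the simplest way.

```r
# Hypothetical helper: average silhouette value of a partition,
# computed from a "dist" object and a vector of cluster labels.
avg_silhouette <- function(d, labels) {
  d <- as.matrix(d)
  n <- nrow(d)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    if (sum(own) == 1) next                 # singleton cluster: s(i) = 0
    a <- sum(d[i, own]) / (sum(own) - 1)    # mean distance within own cluster
    b <- min(tapply(d[i, !own], labels[!own], mean))  # nearest other cluster
    s[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  mean(s)
}

# Hypothetical helper: number of clusters for the samples of one interval.
choose_num_clusters <- function(d, linkage_type = "single",
                                silhouette_threshold = 0.25) {
  n <- attr(d, "Size")
  if (n < 3) return(1L)                     # no partition with 2..(n - 1) clusters
  tree <- hclust(d, method = linkage_type)
  ks <- 2:(n - 1)
  widths <- vapply(ks, function(k) avg_silhouette(d, cutree(tree, k = k)),
                   numeric(1))
  best <- which.max(widths)
  # Keep the best partition only if it exceeds the threshold
  if (widths[best] > silhouette_threshold) ks[best] else 1L
}

# Toy data: two well-separated groups of 10 points each
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))
k <- choose_num_clusters(dist(x))
k_strict <- choose_num_clusters(dist(x), silhouette_threshold = 1)
```

With a threshold that no partition can exceed, as in `k_strict`, the sketch falls back to a single cluster, which mirrors the behaviour described for the argument.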

Value

A mapper_obj object. It contains the values of the intervals (interval_data), the samples included in each interval (sample_in_level), the cluster to which each individual in each interval belongs (clustering_all_levels), a list of the individuals contained in each detected node (node_samples), the size of each node (node_sizes), the average filter function value of the individuals in each node (node_average_filt) and the adjacency matrix linking the nodes (adj_matrix).
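
To make the relationship between these elements concrete, the sketch below derives node sizes, average filter values and an adjacency matrix from a small, made-up node membership list. It assumes, as in standard mapper, that two nodes are linked when they share at least one sample. The node names, the sample names and the symmetric 0/1 layout of the matrix are illustrative assumptions, not the documented layout of a mapper_obj.

```r
# Made-up node membership, shaped like a list of samples per node
node_samples <- list(
  node_1 = c("s1", "s2", "s3"),
  node_2 = c("s3", "s4"),
  node_3 = c("s5", "s6")
)
# Made-up filter value for each sample
filter_values <- c(s1 = 0.1, s2 = 0.2, s3 = 0.3, s4 = 0.5, s5 = 0.8, s6 = 0.9)

# Number of samples in each node
node_sizes <- lengths(node_samples)

# Average filter value of the samples in each node
node_average_filt <- vapply(node_samples,
                            function(s) mean(filter_values[s]),
                            numeric(1))

# Link two nodes when they share at least one sample (assumed convention)
n <- length(node_samples)
adj_matrix <- matrix(0, n, n,
                     dimnames = list(names(node_samples), names(node_samples)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    if (length(intersect(node_samples[[i]], node_samples[[j]])) > 0) {
      adj_matrix[i, j] <- adj_matrix[j, i] <- 1
    }
  }
}
```

Here `node_1` and `node_2` are linked through the shared sample `s3`, while `node_3` stays isolated.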

Examples


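library(GSSTDA)

# full_data, case_tag, survival_time and survival_event are assumed to be
# available in the session (e.g. as example data).
# Indices of the control samples, tagged "NT"
control_tag_cases <- which(case_tag == "NT")

# Select genes and compute the filter values
gene_selection_object <- gene_selection_(full_data, survival_time, survival_event,
                                         control_tag_cases,
                                         gen_select_type = "top_bot",
                                         num_gen_select = 10)

# Build the mapper object from the selected genes and the filter values
mapper_object <- mapper(data = gene_selection_object[["genes_disease_component"]],
                        filter_values = gene_selection_object[["filter_values"]],
                        num_intervals = 5,
                        percent_overlap = 40,
                        distance_type = "correlation",
                        clustering_type = "hierarchical",
                        linkage_type = "single")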

[Package GSSTDA version 1.0.0 Index]