analyze_representatives {TDApplied} | R Documentation |
Analyze the data point memberships of multiple representative (co)cycles.
Description
Multiple distance matrices with corresponding data points can contain the same topological features. Therefore, we may wish to compare many representative (co)cycles across distance matrices to decide if their topological features are the same. The 'analyze_representatives' function returns a matrix of binary data point memberships in an input list of representatives across distance matrices. Optionally, this matrix can be plotted as a heatmap with columns as data points and rows (i.e. representatives) reordered by similarity, and the contributions (i.e. percentage membership) of each point in the representatives can also be returned. The heatmap has dark red squares representing membership: location [i,j] is dark red if data point j is in representative i.
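To make the membership matrix and contributions concrete, here is a minimal base R sketch that does not call TDApplied. The object 'reps' is hypothetical input, and the contribution formula (percentage of representatives containing each point) is an assumption based on the description above, not a statement of how the package computes it.

```r
# Hypothetical input: for each diagram, a list of representatives, each given
# as the integer indices of the data points it contains.
reps <- list(
  list(c(1, 2, 3, 4)),             # diagram 1 has one representative
  list(c(2, 3, 4, 5), c(6, 7, 8))  # diagram 2 has two representatives
)
num_points <- 8

# Row labels in the 'DX[Y]' form: the Yth representative of diagram X.
labels <- unlist(lapply(seq_along(reps), function(X) {
  paste0("D", X, "[", seq_along(reps[[X]]), "]")
}))

# Binary membership matrix: entry [i, j] is 1 if data point j is in
# representative i, and 0 otherwise.
flat <- unlist(reps, recursive = FALSE)
memberships <- t(vapply(
  flat,
  function(r) as.numeric(seq_len(num_points) %in% r),
  numeric(num_points)
))
dimnames(memberships) <- list(labels, as.character(seq_len(num_points)))

# Assumed definition: percentage of representatives that contain each point.
contributions <- 100 * colMeans(memberships)
```

Here data points 2, 3 and 4 appear in two of the three representatives, so they receive the largest contributions.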
Usage
analyze_representatives(
diagrams,
dim,
num_points,
plot_heatmap = TRUE,
return_contributions = FALSE,
boxed_reps = NULL,
d = NULL,
lwd = NULL,
title = NULL,
return_clust = FALSE
)
Arguments
diagrams |
a list of persistence diagrams, either the output of persistent homology calculations like ripsDiag/ |
dim |
the integer homological dimension of representatives to consider. |
num_points |
the integer number of data points in all the original datasets (from which the diagrams were calculated). |
plot_heatmap |
a boolean representing if a heatmap of data point membership similarity of the representatives should be plotted, default 'TRUE'. A dendrogram of hierarchical clustering is plotted, and rows (representatives) are sorted according to this clustering. |
return_contributions |
a boolean indicating whether or not to return the membership contributions (i.e. percentages) of the data points (1:'num_points') across all the representatives, default 'FALSE'. |
boxed_reps |
a data frame specifying specific rows of the output heatmap which should have a box drawn around them (for highlighting), default NULL. See the details section for more information. |
d |
either NULL (default) or a "dist" object representing a distance matrix for the representatives, which must have the same number of rows and columns as cycles in the dimension 'dim'. |
lwd |
a positive number width for the lines of drawn boxes, if boxed_reps is not null. |
title |
a character string title for the plotted heatmap, default NULL. |
return_clust |
a boolean determining whether or not to return the result of the 'stats::hclust()' call when a heatmap is plotted, default 'FALSE'. |
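As an illustration of the 'd' argument, the sketch below builds a "dist" object with one entry per pair of representatives and clusters it with 'stats::hclust()'. The membership matrix is made up, and the Jaccard-type distance from 'stats::dist(method = "binary")' is only one possible choice; it is not necessarily the distance the package uses when 'd' is NULL.

```r
# Hypothetical binary membership matrix: four representatives over six points.
memberships <- rbind(
  "D1[1]" = c(1, 1, 1, 0, 0, 0),
  "D2[1]" = c(1, 1, 1, 1, 0, 0),
  "D3[1]" = c(0, 0, 0, 1, 1, 1),
  "D4[1]" = c(0, 0, 1, 1, 1, 1)
)

# Jaccard-type distance between rows (an illustrative choice of metric).
d <- stats::dist(memberships, method = "binary")

# 'd' must be a "dist" object covering exactly the representatives analyzed.
stopifnot(inherits(d, "dist"), attr(d, "Size") == nrow(memberships))

# Hierarchical clustering of the representatives, cut into two groups.
clust <- stats::hclust(d)
groups <- stats::cutree(clust, k = 2)
```

A 'd' built this way could then be passed as 'd = d' alongside the diagrams that produced these representatives.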
Details
The clustering dendrogram can be used to determine if there are any similar groups of representatives (i.e.
shared topological features across datasets) and if so how many. The row labels of the heatmap are of the form
'DX[Y]', meaning the Yth representative of diagram X, and the column labels are the data point numbers.
If diagrams are the output of the bootstrap_persistence_thresholds
function, then the subsetted_representatives (if present) will be analyzed. In that case, a row label like 'DX[Y]' in the
plotted heatmap refers to the Yth subsetted representative of diagram X. If certain representatives should be highlighted (by drawing a box around their rows)
in the heatmap, a data frame 'boxed_reps' can be supplied with two integer columns, 'diagram' and 'rep'. For example, if we wish to draw a box for DX[Y] then we
add the row (diagram = X, rep = Y) to 'boxed_reps'. If 'd' is supplied then the representatives will be clustered based on the distances in 'd'.
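The 'boxed_reps' convention can be sketched as follows. The data frame uses the two documented columns; the name 'diags' in the commented call is a hypothetical list of diagrams with representatives and is not defined here, and 'lwd = 2' is an arbitrary illustrative value.

```r
# One row per representative to highlight: box D1[2] and D3[1].
boxed_reps <- data.frame(diagram = c(1L, 3L), rep = c(2L, 1L))

# The heatmap row labels these rows correspond to, in the 'DX[Y]' form.
boxed_labels <- paste0("D", boxed_reps$diagram, "[", boxed_reps$rep, "]")

# Hypothetical call ('diags' is assumed to exist and to hold representatives):
# analyze_representatives(diagrams = diags, dim = 1, num_points = 25,
#                         boxed_reps = boxed_reps, lwd = 2)
```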
Value
either a binary matrix of data point memberships in the representatives, or a list with elements "memberships" (the matrix) and some combination of elements "contributions" (a vector of membership percentages for each data point across representatives) and "clust" (the results of 'stats::hclust()' on the membership matrix).
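Because the return value is either a bare matrix or a list, downstream code may want to handle both shapes. The helper 'get_memberships' below is hypothetical (not part of TDApplied), and the two mock objects stand in for real output; only the element name "memberships" is taken from the description above.

```r
# Hypothetical helper: extract the membership matrix from either return shape.
get_memberships <- function(res) {
  if (is.matrix(res)) {
    return(res)
  }
  if (is.list(res) && !is.null(res$memberships)) {
    return(res$memberships)
  }
  stop("unexpected return value")
}

# Mock return values (made-up numbers, for illustration only).
m <- matrix(c(1, 0, 1, 1), nrow = 2,
            dimnames = list(c("D1[1]", "D2[1]"), c("1", "2")))
as_matrix <- m
as_list <- list(memberships = m, contributions = c("1" = 50, "2" = 100))

# Both shapes yield the same matrix.
same <- identical(get_memberships(as_matrix), get_memberships(as_list))
```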
Author(s)
Shael Brown - shaelebrown@gmail.com