R: Calculate Overlap and Similarity Coefficients between Feature...

calculate_overlap_coefficients {GeneSelectR}

R Documentation

Calculate Overlap and Similarity Coefficients between Feature Lists

Description

This function calculates the Overlap, Jaccard, and Soerensen-Dice coefficients to quantify the similarity between feature lists. In addition to feature importance and permutation importance, you can provide a custom list of feature names to be included in the overlap calculation.

Usage

calculate_overlap_coefficients(pipeline_results, custom_lists = NULL)

Arguments

`pipeline_results`	A PipelineResults object containing the fitted pipelines, cross-validation results, selected features, mean performance, and mean feature importances.
`custom_lists`	An optional named list of character vectors. Each character vector should contain feature names. The names of the list will be used as names in the resulting overlap coefficient matrices.

Value

A list containing lists of matrices, where each list corresponds to a different type of feature list (inbuilt feature importance, permutation importance, and custom lists if provided). Within each of these lists, there are three matrices showing the Overlap, Jaccard, and Soerensen-Dice coefficients for the feature lists: - @field overlap: A matrix showing the Overlap coefficients. - @field jaccard: A matrix showing the Jaccard coefficients. - @field soerensen: A matrix showing the Soerensen-Dice coefficients. These matrices compare the feature lists against each other, providing a numerical measure of their similarity. Note: If permutation importance data is not present in the pipeline_results, the corresponding list entry will be absent.

Examples

# Basic Usage with Mock Data
# Create a mock PipelineResults object with minimal data
mock_pipeline_results <- new("PipelineResults",
                             inbuilt_feature_importance = list(
                             "FeatureSet1" = data.frame(feature = c("feature1", "feature2")),
                             "FeatureSet2" = data.frame(feature = c("feature2", "feature3"))),
                             permutation_importance = list(
                             "FeatureSet1" = data.frame(feature = c("feature3", "feature4")),
                             "FeatureSet2" = data.frame(feature = c("feature1", "feature4"))))

# Calculate overlap coefficients without custom lists
overlap_results <- calculate_overlap_coefficients(mock_pipeline_results)


# Including Custom Lists
# Create custom feature lists
custom_feature_lists <- list("CustomList1" = c("feature5", "feature6"),
                             "CustomList2" = c("feature6", "feature7"))

# Calculate overlap coefficients with custom lists
overlap_results_custom <- calculate_overlap_coefficients(mock_pipeline_results,
                                                         custom_feature_lists)

[Package GeneSelectR version 1.0.1 Index]