calculate_overlap_coefficients {GeneSelectR} | R Documentation |
Calculate Overlap and Similarity Coefficients between Feature Lists
Description
This function calculates the Overlap, Jaccard, and Soerensen-Dice coefficients to quantify the similarity between feature lists. In addition to the inbuilt feature importance and permutation importance lists stored in the PipelineResults object, you can supply custom lists of feature names to include in the overlap calculation.
Usage
calculate_overlap_coefficients(pipeline_results, custom_lists = NULL)
Arguments
pipeline_results |
A PipelineResults object containing the fitted pipelines, cross-validation results, selected features, mean performance, and mean feature importances. |
custom_lists |
An optional named list of character vectors. Each character vector should contain feature names. The names of the list will be used as names in the resulting overlap coefficient matrices. |
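As a minimal sketch of the expected shape of custom_lists, the following base R snippet builds a named list of character vectors and checks it before use. The check is only an illustration, not something the function is documented to perform, and the variable names are hypothetical.

```r
# Hypothetical example input: a named list of character vectors
custom_lists <- list(
  CustomList1 = c("feature5", "feature6"),
  CustomList2 = c("feature6", "feature7")
)

# TRUE only if the object is a list, every element has a non-empty name,
# and every element is a character vector
is_valid <- is.list(custom_lists) &&
  !is.null(names(custom_lists)) &&
  all(nzchar(names(custom_lists))) &&
  all(vapply(custom_lists, is.character, logical(1)))
is_valid
```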
Value
A list containing lists of matrices, where each list corresponds to a different type of feature list (inbuilt feature importance, permutation importance, and custom lists if provided).
Within each of these lists, there are three matrices showing the Overlap, Jaccard, and Soerensen-Dice coefficients for the feature lists:
- overlap: A matrix showing the Overlap coefficients.
- jaccard: A matrix showing the Jaccard coefficients.
- soerensen: A matrix showing the Soerensen-Dice coefficients.
These matrices compare the feature lists against each other, providing a numerical measure of their similarity.
Note: If permutation importance data is not present in pipeline_results, the corresponding list entry will be absent.
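For reference, the three coefficients have standard set-based definitions: Overlap = |A ∩ B| / min(|A|, |B|), Jaccard = |A ∩ B| / |A ∪ B|, and Soerensen-Dice = 2|A ∩ B| / (|A| + |B|). The sketch below computes them in base R for two feature lists and assembles a pairwise matrix. The helper names (overlap_coef, jaccard_coef, dice_coef, pairwise_matrix) are hypothetical and are not part of GeneSelectR; the package's internal implementation may differ, for example in how it handles duplicates or empty lists.

```r
# Standard set-similarity definitions (illustrative helpers, not package code).
# unique() guards against duplicated feature names; empty inputs yield NaN.
overlap_coef <- function(a, b) {
  length(intersect(a, b)) / min(length(unique(a)), length(unique(b)))
}
jaccard_coef <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}
dice_coef <- function(a, b) {
  2 * length(intersect(a, b)) / (length(unique(a)) + length(unique(b)))
}

# Build a symmetric matrix of pairwise coefficients for a named list
pairwise_matrix <- function(lists, fun) {
  n <- length(lists)
  m <- matrix(NA_real_, n, n, dimnames = list(names(lists), names(lists)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- fun(lists[[i]], lists[[j]])
    }
  }
  m
}

feature_lists <- list(
  FeatureSet1 = c("feature1", "feature2"),
  FeatureSet2 = c("feature2", "feature3")
)

# The two sets share one feature out of three distinct features
overlap_coef(feature_lists[[1]], feature_lists[[2]])  # 1 / 2 = 0.5
jaccard_coef(feature_lists[[1]], feature_lists[[2]])  # 1 / 3
dice_coef(feature_lists[[1]], feature_lists[[2]])     # 2 * 1 / 4 = 0.5

# Diagonal entries are 1 because each list is identical to itself
jaccard_matrix <- pairwise_matrix(feature_lists, jaccard_coef)
jaccard_matrix
```

All three coefficients range from 0 (no shared features) to 1. The Overlap coefficient reaches 1 whenever the smaller list is a subset of the larger one, whereas Jaccard and Soerensen-Dice reach 1 only when the two lists contain the same features.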
Examples
# Basic Usage with Mock Data
# Create a mock PipelineResults object with minimal data
mock_pipeline_results <- new("PipelineResults",
  inbuilt_feature_importance = list(
    "FeatureSet1" = data.frame(feature = c("feature1", "feature2")),
    "FeatureSet2" = data.frame(feature = c("feature2", "feature3"))),
  permutation_importance = list(
    "FeatureSet1" = data.frame(feature = c("feature3", "feature4")),
    "FeatureSet2" = data.frame(feature = c("feature1", "feature4"))))
# Calculate overlap coefficients without custom lists
overlap_results <- calculate_overlap_coefficients(mock_pipeline_results)
# Including Custom Lists
# Create custom feature lists
custom_feature_lists <- list("CustomList1" = c("feature5", "feature6"),
                             "CustomList2" = c("feature6", "feature7"))
# Calculate overlap coefficients with custom lists
overlap_results_custom <- calculate_overlap_coefficients(mock_pipeline_results,
                                                         custom_feature_lists)