perform_clustering {clap}    R Documentation

Perform clustering based on nearest neighbor distances

Description

Clusters data points using a radius derived from their nearest neighbor distances.

Usage

perform_clustering(data, class_column = NULL)

Arguments

data

A numeric matrix or data frame of data points.

class_column

A character string or unquoted name giving the column of data that contains the class labels. Defaults to NULL.

Details

The function removes the specified class column from the data, computes each point's nearest neighbor distance, and then clusters the points using a radius based on the maximum of those distances.
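
The steps described above can be sketched in base R. This is an illustration only, not the package's implementation: the Euclidean metric, the use of the maximum nearest neighbor distance itself as the radius, and the rule that groups every point with all points inside that radius are assumptions, and the names toy, features, nn_dist, radius and members are hypothetical.

```r
set.seed(1)  # reproducible toy data

# Hypothetical toy data set: two numeric features plus a class column
toy <- data.frame(
  X1 = rnorm(20),
  X2 = rnorm(20),
  class = factor(rep(1:2, each = 10))
)

# Step 1: drop the class column so only the numeric features remain
features <- as.matrix(toy[, setdiff(names(toy), "class"), drop = FALSE])

# Step 2: nearest neighbor distance of each point
# (Euclidean distance is an assumption; self-distances are masked with Inf)
d <- as.matrix(dist(features))
diag(d) <- Inf
nn_dist <- apply(d, 1, min)

# Step 3: radius taken as the maximum nearest neighbor distance (assumption)
radius <- max(nn_dist)

# Step 4: assumed grouping rule, one neighborhood per point containing
# every point (itself included) that lies within the radius
diag(d) <- 0
members <- lapply(seq_len(nrow(d)), function(i) unname(which(d[i, ] <= radius)))
```

Because the radius is at least as large as every point's nearest neighbor distance, each neighborhood in this sketch contains the point itself and at least one other point.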

Value

An object of class 'clap' containing:

members

A list of clusters with their respective data point IDs.

cluster_df

A data frame with cluster assignments for each data point.

data

The original dataset.
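
The components listed above could be inspected as follows. This is a hedged sketch: it assumes the clap package is installed and exports perform_clustering, and it relies only on the component names documented here; the variable names training and res are illustrative.

```r
set.seed(42)  # reproducible toy data

# Illustrative data: two overlapping classes with 50 points each
training <- data.frame(
  X1 = c(rnorm(50, mean = 1), rnorm(50, mean = -1)),
  X2 = c(rnorm(50, mean = 1), rnorm(50, mean = -1)),
  class = factor(rep(1:2, each = 50))
)

# Runs only when clap is available (assumption: perform_clustering is exported)
if (requireNamespace("clap", quietly = TRUE)) {
  res <- clap::perform_clustering(training, class_column = "class")

  print(inherits(res, "clap"))                     # class of the returned object
  str(res$members, max.level = 1, list.len = 5)    # clusters and their point IDs
  print(head(res$cluster_df))                      # cluster assignment per point
  print(dim(res$data))                             # the original dataset
}
```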

Examples

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Generate dummy data (seed fixed so the example is reproducible)
  set.seed(123)
  class1 <- matrix(rnorm(100, mean = 0, sd = 1), ncol = 2) +
    matrix(rep(c(1, 1), each = 50), ncol = 2)
  class2 <- matrix(rnorm(100, mean = 0, sd = 1), ncol = 2) +
    matrix(rep(c(-1, -1), each = 50), ncol = 2)
  datanew <- rbind(class1, class2)
  training <- data.frame(datanew, class = factor(c(rep(1, 50), rep(2, 50))))

  # Plot the dummy data to visualize overlaps
  p <- ggplot2::ggplot(training, ggplot2::aes(x = X1, y = X2, color = class)) +
    ggplot2::geom_point() +
    ggplot2::labs(title = "Dummy Data with Overlapping Classes")
  print(p)

  # Perform clustering
  cluster_result <- perform_clustering(training, class_column = class)
}


[Package clap version 0.1.0 Index]