cluster_events {beadplexr} | R Documentation |
Clustering with trimming
Description
Cluster identification with various algorithms and subsequent trimming of each cluster
Usage
bp_kmeans(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)
bp_clara(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)
bp_dbscan(
df,
.parameter,
.column_name,
.eps = 0.2,
.MinPts = 50,
.data = NULL,
...
)
bp_mclust(
df,
.parameter,
.column_name,
.k,
.trim = 0,
.sample_frac = 0.05,
.max_subset = 500,
.data = NULL,
...
)
bp_density_cut(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)
Arguments
df |
A tidy data.frame. |
.parameter |
A character giving the name of column(s) where populations are identified. |
.column_name |
A character giving the name of the column to store the population information. |
.k |
Numeric giving the number of expected clusters, or a set of initial cluster centers. |
.trim |
A numeric between 0 and 1, giving the fraction of points to remove by marking them NA. |
.data |
Deprecated. Use |
... |
Additional arguments passed to appropriate methods, see below. |
.eps |
Reachability distance, see |
.MinPts |
Reachability minimum no. of points, see |
.sample_frac |
A numeric between 0 and 1 giving the fraction of points
to use in initialisation of |
.max_subset |
A numeric giving the maximum of events to use in
initialisation of |
Value
The data.frame in df with the cluster classification added in the column given by .column_name.
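As a rough illustration of what trimming means for the returned column, the sketch below clusters two synthetic populations with stats::kmeans() and then sets the label to NA for the fraction of events lying furthest from their own cluster centre. The helper trim_labels(), the synthetic data and the Euclidean distance criterion are assumptions made for illustration only; they are not taken from the package, and trim_population() may well use a different criterion.

```r
set.seed(1)
# Two synthetic, well-separated populations (hypothetical data)
x <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 6), ncol = 2)
)
km <- stats::kmeans(x, centers = 2)

# Hypothetical helper: mark the `trim` fraction of events furthest from
# their own cluster centre as NA
trim_labels <- function(x, cluster, centers, trim) {
  labels <- as.character(cluster)
  # Euclidean distance of every event to the centre of its cluster
  d <- sqrt(rowSums((x - centers[cluster, , drop = FALSE])^2))
  for (cl in unique(cluster)) {
    idx <- which(cluster == cl)
    n_drop <- floor(length(idx) * trim)
    if (n_drop > 0) {
      far <- idx[order(d[idx], decreasing = TRUE)[seq_len(n_drop)]]
      labels[far] <- NA_character_
    }
  }
  labels
}

population <- trim_labels(x, km$cluster, km$centers, trim = 0.1)
table(population, useNA = "ifany")
```

With .trim = 0 no event would be marked, so the column would contain cluster labels only.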
Additional parameters
Information on the additional arguments that are passed on can be found here:
- clara
- kmeans
- dbscan
- mclust
- density_cut
Default parameters to clara()
cluster::clara() is by default called with the following parameters:
- samples: 100
- pamLike: TRUE
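For orientation, a direct call to cluster::clara() with the defaults listed above might look like the following. The synthetic data are hypothetical, and this is only a sketch of the underlying call, not the code used inside bp_clara().

```r
library(cluster)

set.seed(1)
# Two synthetic populations of 1000 events each (hypothetical data)
x <- rbind(
  matrix(rnorm(2000, mean = 0), ncol = 2),
  matrix(rnorm(2000, mean = 6), ncol = 2)
)

# k = 2 clusters; samples and pamLike set to the defaults listed above
cl <- cluster::clara(x, k = 2, samples = 100, pamLike = TRUE)

table(cl$clustering)  # cluster membership of every event
cl$medoids            # one representative event per cluster
```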
Parameters to dbscan
Getting the right parameters for the density-based clustering requires some trial and error, but the parameters usually stay stable throughout an entire experiment and over time (assuming that there is only little drift in the flow cytometer). There is no guarantee that the correct number of clusters is returned, and it might be better to use this on the forward-side scatter discrimination.
Scaling of the parameters seems to be appropriate in most cases for the forward-side scatter discrimination and is performed automatically.
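The effect of scaling can be sketched as follows: two parameters recorded on very different ranges are centred and divided by their standard deviation, so that a single reachability distance is meaningful for both. The synthetic values are hypothetical, and the call to fpc::dbscan() is an assumption based on the argument names used on this page; it only runs if the fpc package is installed.

```r
set.seed(1)
# Hypothetical forward/side scatter values on very different ranges
fsc <- c(rnorm(300, mean = 2e5, sd = 1e4), rnorm(300, mean = 4e5, sd = 1e4))
ssc <- c(rnorm(300, mean = 50, sd = 3), rnorm(300, mean = 120, sd = 3))
x <- cbind(fsc, ssc)

# Centre each column and divide by its standard deviation, so that one
# reachability distance applies to both parameters
x_scaled <- scale(x)
apply(x_scaled, 2, sd)

# Assumes the fpc package is installed; cluster 0 denotes noise
if (requireNamespace("fpc", quietly = TRUE)) {
  db <- fpc::dbscan(x_scaled, eps = 0.2, MinPts = 50)
  print(table(db$cluster))
}
```

Without scaling, a reachability distance of 0.2 would be far too small for the first column and the events would be treated as noise.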
Parameters to mclust
Mclust is slow and memory hungry on large datasets. Using a subset of the data to initialise the clustering greatly improves the speed. I have found that a subset of just 500 events works well: a subset of 5000 events gives no markedly better clustering, but initialisation with 500 events makes the clustering complete about 12 times faster than with 5000 events.
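A minimal sketch of initialisation on a subset is given below. The helper init_subset_size() and the way it combines a sampling fraction with an upper limit are assumptions for illustration; how bp_mclust() combines .sample_frac and .max_subset internally is not stated on this page. The initialization = list(subset = ...) argument is part of mclust::Mclust(); that part only runs if the mclust package is installed.

```r
# Hypothetical helper: number of events used to initialise the clustering,
# a fraction of all events capped at a maximum
init_subset_size <- function(n, sample_frac = 0.05, max_subset = 500) {
  min(round(n * sample_frac), max_subset)
}

init_subset_size(4000)   # 200 events
init_subset_size(50000)  # capped at 500 events

set.seed(1)
# Two synthetic populations of 2000 events each (hypothetical data)
x <- rbind(
  matrix(rnorm(4000, mean = 0), ncol = 2),
  matrix(rnorm(4000, mean = 6), ncol = 2)
)

# Assumes the mclust package is installed
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)
  sub <- sample(nrow(x), init_subset_size(nrow(x)))
  # Only the rows in `sub` are used for the hierarchical initialisation;
  # all events are classified by the fitted model
  fit <- Mclust(x, G = 2, initialization = list(subset = sub))
  print(table(fit$classification))
}
```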
Parameters to density_cut
This simple function works by smoothing a density function until the desired number of clusters is found. The clusters are then separated at the lowest point between two adjacent clusters.
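The idea can be sketched in base R as follows. The function density_cut_sketch(), its starting bandwidth adjustment and the 10 % increase per step are all hypothetical choices for illustration; this is not the implementation used by bp_density_cut().

```r
# Hypothetical helper, not the package implementation
density_cut_sketch <- function(x, k, adjust = 0.1, max_iter = 200) {
  for (i in seq_len(max_iter)) {
    dens <- stats::density(x, adjust = adjust)
    # A peak is where the slope of the density changes from rising to falling
    peaks <- which(diff(sign(diff(dens$y))) == -2) + 1L
    if (length(peaks) <= k) break
    adjust <- adjust * 1.1  # smooth a little more and try again
  }
  if (length(peaks) != k) stop("could not find the requested number of clusters")
  # Lowest point of the density between each pair of adjacent peaks
  valleys <- vapply(seq_len(k - 1), function(j) {
    between <- peaks[j]:peaks[j + 1]
    between[which.min(dens$y[between])]
  }, integer(1))
  # Assign each event to the interval between two cut points
  cut(x, breaks = c(-Inf, dens$x[valleys], Inf), labels = FALSE)
}

set.seed(1)
# Two synthetic populations in a single parameter (hypothetical data)
x <- c(rnorm(500, mean = 0), rnorm(500, mean = 5))
population <- density_cut_sketch(x, k = 2)
table(population)
```

Because the cut is made along a single axis, this approach only applies to one parameter at a time.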
See Also
trim_population(), identify_analyte().
Mclust and dbscan seem to do an excellent job of separating on the forward and side scatter parameters. Mclust and clara both perform well at separating beads in the APC channel, but clara is about 3 times faster than Mclust.
Examples
library(beadplexr)
library(dplyr)
library(ggplot2)
data("lplex")
lplex[[1]] |>
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) |>
  bp_kmeans(.parameter = c("FSC-A", "SSC-A"),
            .column_name = "population", .trim = 0.1, .k = 2) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()
library(beadplexr)
library(dplyr)
library(ggplot2)
data("lplex")
lplex[[1]] |>
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) |>
  bp_clara(.parameter = c("FSC-A", "SSC-A"),
           .column_name = "population", .trim = 0.1, .k = 2) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()
lplex[[1]] |>
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) |>
  bp_clara(.parameter = c("FSC-A", "SSC-A"),
           .column_name = "population", .trim = 0, .k = 2) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()
## Not run:
library(beadplexr)
library(dplyr)
library(ggplot2)
data("lplex")
lplex[[1]] |>
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) |>
  bp_dbscan(.parameter = c("FSC-A", "SSC-A"), .column_name = "population",
            .eps = 0.2, .MinPts = 50) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()
pop1 <- lplex[[1]] |>
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) |>
  bp_dbscan(.parameter = c("FSC-A", "SSC-A"), .column_name = "population",
            .eps = 0.2, .MinPts = 50) |>
  dplyr::filter(population == "1")
pop1 |>
  bp_dbscan(.parameter = c("FL6-H", "FL2-H"), .column_name = "population",
            .eps = 0.2, .MinPts = 50) |>
  pull(population) |>
  unique()
pop1 |>
  bp_dbscan(.parameter = c("FL6-H", "FL2-H"), .column_name = "population",
            .eps = 0.2, .MinPts = 50, scale = FALSE) |>
  pull(population) |>
  unique()
## End(Not run)
library(beadplexr)
library(ggplot2)
data("lplex")
lplex[[1]] |>
  bp_mclust(.parameter = c("FSC-A", "SSC-A"),
            .column_name = "population", .trim = 0, .k = 2) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()
library(beadplexr)
library(ggplot2)
data("lplex")
lplex[[1]] |>
  bp_density_cut(.parameter = c("FSC-A"),
                 .column_name = "population", .trim = 0, .k = 2) |>
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()