clugen {clugenr} | R Documentation
Generate multidimensional clusters
Description
This is the main function of clugenr, and possibly the only function most users will need.
Usage
clugen(
  num_dims,
  num_clusters,
  num_points,
  direction,
  angle_disp,
  cluster_sep,
  llength,
  llength_disp,
  lateral_disp,
  allow_empty = FALSE,
  cluster_offset = NA,
  proj_dist_fn = "norm",
  point_dist_fn = "n-1",
  clusizes_fn = clusizes,
  clucenters_fn = clucenters,
  llengths_fn = llengths,
  angle_deltas_fn = angle_deltas,
  seed = NA
)
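The required arguments can also be passed by name, which makes long calls easier to read. The sketch below assumes clugenr is installed and reuses the parameter values of the 2D example in the Examples section. The value `seed = 42` is an arbitrary choice for reproducibility.

```r
library(clugenr)

# Named-argument form of the 2D example; optional arguments keep their defaults
x <- clugen(
  num_dims = 2,             # points live in 2D space
  num_clusters = 5,         # number of clusters
  num_points = 1000,        # total number of points
  direction = c(1, 3),      # average direction of the cluster-supporting lines
  angle_disp = 0.5,         # angle dispersion (radians)
  cluster_sep = c(10, 10),  # average cluster separation along x and y
  llength = 8,              # average line length
  llength_disp = 1.5,       # line length dispersion
  lateral_disp = 2,         # dispersion of points around their projections
  seed = 42                 # arbitrary seed, for reproducible output
)
```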
Arguments
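num_dims: Number of dimensions.
num_clusters: Number of clusters to generate.
num_points: Total number of points to generate.
direction: Average direction of the cluster-supporting lines. Can be a vector of length num_dims (same direction for all clusters) or a matrix of size num_clusters x num_dims (one direction per cluster).
angle_disp: Angle dispersion of cluster-supporting lines (radians).
cluster_sep: Average cluster separation in each dimension (vector of length num_dims).
llength: Average length of cluster-supporting lines.
llength_disp: Length dispersion of cluster-supporting lines.
lateral_disp: Cluster lateral dispersion, i.e., dispersion of points from their projection on the cluster-supporting line.
allow_empty: Allow empty clusters? FALSE by default.
cluster_offset: Offset to add to all cluster centers (vector of length num_dims). By default, the offset is a vector of num_dims zeros.
proj_dist_fn: Distribution of point projections along cluster-supporting lines, with three possible values:
  - "norm" (default): Distribute point projections along lines using a normal distribution (\(\mu=\) line center, \(\sigma=\) llength/6).
  - "unif": Distribute point projections uniformly along lines.
  - User-defined function, which accepts two parameters, line length (double) and number of points (integer), and returns a vector containing the distance of each point projection to the center of the line.
point_dist_fn: Controls how the final points are created from their projections on the cluster-supporting lines, with three possible values:
  - "n-1" (default): Final points are placed on a hyperplane orthogonal to the cluster-supporting line, centered at each point's projection, using the normal distribution (\(\mu=0\), \(\sigma=\) lateral_disp). This is done by the clupoints_n_1 function.
  - "n": Final points are placed around their projection on the cluster-supporting line using the normal distribution (\(\mu=0\), \(\sigma=\) lateral_disp). This is done by the clupoints_n function.
  - User-defined function, which must follow the signature of clupoints_n_1 and clupoints_n.
clusizes_fn: Distribution of cluster sizes. By default, cluster sizes are determined by the clusizes function, which uses the normal distribution (\(\mu=\) num_points/num_clusters, \(\sigma=\mu/3\)) and ensures that the final cluster sizes add up to num_points. A custom function can be given instead; it must follow the clusizes signature and behavior.
clucenters_fn: Distribution of cluster centers. By default, cluster centers are determined by the clucenters function, which uses the uniform distribution and takes into account the num_clusters and cluster_sep parameters to generate well-distributed cluster centers. A custom function can be given instead; it must follow the clucenters signature and behavior.
llengths_fn: Distribution of line lengths. By default, the lengths of cluster-supporting lines are determined by the llengths function, which uses the folded normal distribution (\(\mu=\) llength, \(\sigma=\) llength_disp). A custom function can be given instead; it must follow the llengths signature and behavior.
angle_deltas_fn: Distribution of line angle differences with respect to direction. By default, the angles between the main direction and the directions of the cluster-supporting lines are determined by the angle_deltas function, which uses the wrapped normal distribution (\(\mu=0\), \(\sigma=\) angle_disp) with support in the interval \([-\pi/2, \pi/2]\). A custom function can be given instead; it must follow the angle_deltas signature and behavior.
seed: An integer used to initialize the PRNG, allowing for reproducible results. If specified, seed is passed to set.seed to seed the PRNG.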
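The default distributions behind clusizes_fn, llengths_fn and angle_deltas_fn, and the "norm"/"unif" choices for proj_dist_fn, can be approximated in a few lines of base R. This is an illustrative sketch only: the helper names (sketch_clusizes, sketch_llengths, sketch_angle_deltas, sketch_proj_norm, sketch_proj_unif) are hypothetical, and the size-correction and angle-folding steps are simplified assumptions, not clugenr's internal code.

```r
# Hypothetical base-R approximations of the default distributions.
# Illustrative only; not clugenr's internal implementation.

# Cluster sizes: normal (mu = num_points / num_clusters, sd = mu / 3),
# negatives clamped to 0, then nudged one point at a time until the sizes
# add up to num_points (a simplified correction scheme)
sketch_clusizes <- function(num_clusters, num_points) {
  mu <- num_points / num_clusters
  sizes <- pmax(0, round(stats::rnorm(num_clusters, mean = mu, sd = mu / 3)))
  while (sum(sizes) < num_points) {
    i <- which.min(sizes)
    sizes[i] <- sizes[i] + 1
  }
  while (sum(sizes) > num_points) {
    i <- which.max(sizes)
    sizes[i] <- sizes[i] - 1
  }
  as.integer(sizes)
}

# Line lengths: folded normal, i.e. |X| with X ~ N(llength, llength_disp)
sketch_llengths <- function(num_clusters, llength, llength_disp) {
  abs(stats::rnorm(num_clusters, mean = llength, sd = llength_disp))
}

# Angle deltas: normal(0, angle_disp), wrapped to [-pi, pi], then folded
# into [-pi/2, pi/2], since a line at angle a is the same line at a -/+ pi
sketch_angle_deltas <- function(num_clusters, angle_disp) {
  a <- stats::rnorm(num_clusters, mean = 0, sd = angle_disp)
  a <- atan2(sin(a), cos(a))
  ifelse(abs(a) > pi / 2, a - sign(a) * pi, a)
}

# Projections along a line of length l, measured from the line center
sketch_proj_norm <- function(l, n) stats::rnorm(n, mean = 0, sd = l / 6)
sketch_proj_unif <- function(l, n) stats::runif(n, min = -l / 2, max = l / 2)

set.seed(1)
sizes <- sketch_clusizes(5, 1000)
lens <- sketch_llengths(5, 8, 1.5)
angs <- sketch_angle_deltas(5, 0.5)
proj <- sketch_proj_norm(lens[1], sizes[1])
```

These helpers are not drop-in replacements: a function passed as clusizes_fn, llengths_fn or angle_deltas_fn must follow the signature and behavior of the corresponding default function.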
Details
If a custom function is given in the clusizes_fn parameter, the total number of generated points may differ from the value specified in the num_points parameter.
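The custom clusizes_fn case can be seen with a hypothetical size function, fixed_sizes, that ignores num_points and assigns 100 points to every cluster. This sketch assumes clugenr is installed and that clugen calls clusizes_fn with the same three arguments as the default clusizes (num_clusters, num_points, allow_empty); the other argument values are arbitrary.

```r
library(clugenr)

# Hypothetical size function: ignores num_points and gives every cluster
# exactly 100 points; argument names mirror the default clusizes function
fixed_sizes <- function(num_clusters, num_points, allow_empty) {
  rep(100L, num_clusters)
}

x <- clugen(2, 4, 1000, c(1, 0), 0.2, c(10, 10), 6, 1, 1,
            clusizes_fn = fixed_sizes, seed = 1)

# 4 clusters x 100 points: the result has 400 rows, not the 1000 requested
nrow(x$points)
```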
The terms "average" and "dispersion" refer to measures of central tendency and statistical dispersion, respectively. Their exact meaning depends on the optional arguments.
Value
A named list with the following elements:
- points: A num_points x num_dims matrix with the generated points for all clusters.
- clusters: A factor vector of length num_points indicating which cluster each point in points belongs to.
- projections: A num_points x num_dims matrix with the point projections on the cluster-supporting lines.
- sizes: A num_clusters x 1 vector with the number of points in each cluster.
- centers: A num_clusters x num_dims matrix with the coordinates of the cluster centers.
- directions: A num_clusters x num_dims matrix with the final direction of each cluster-supporting line.
- angles: A num_clusters x 1 vector with the angles between the cluster-supporting lines and the main direction.
- lengths: A num_clusters x 1 vector with the lengths of the cluster-supporting lines.
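A quick way to inspect the elements of the returned list and their shapes, assuming clugenr is installed. The argument values below are arbitrary.

```r
library(clugenr)

# 3D data: 4 clusters, 500 points (argument values chosen arbitrarily)
x <- clugen(3, 4, 500, c(1, 1, 0), 0.3, c(12, 12, 12), 6, 1, 1, seed = 7)

dim(x$points)        # num_points x num_dims
dim(x$projections)   # num_points x num_dims
is.factor(x$clusters)
dim(x$centers)       # num_clusters x num_dims
dim(x$directions)    # num_clusters x num_dims
length(x$angles)     # num_clusters
length(x$lengths)    # num_clusters

# Count the points in each cluster and compare with the reported sizes
counts <- as.vector(table(x$clusters))
all(counts == as.vector(x$sizes))
```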
Note
This function is stochastic. For reproducibility, set a PRNG seed with set.seed or use the seed parameter.
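Reproducible results can be obtained either through the seed argument or by calling set.seed before clugen. A sketch assuming clugenr is installed; the argument values are arbitrary.

```r
library(clugenr)

# Option 1: pass the same seed argument to both calls
a <- clugen(2, 3, 300, c(1, 1), 0.4, c(8, 8), 5, 1, 1, seed = 123)
b <- clugen(2, 3, 300, c(1, 1), 0.4, c(8, 8), 5, 1, 1, seed = 123)
identical(a$points, b$points)   # TRUE

# Option 2: reset the global PRNG state before each call
set.seed(123)
c1 <- clugen(2, 3, 300, c(1, 1), 0.4, c(8, 8), 5, 1, 1)
set.seed(123)
c2 <- clugen(2, 3, 300, c(1, 1), 0.4, c(8, 8), 5, 1, 1)
identical(c1$points, c2$points) # TRUE
```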
Examples
# 2D example
x <- clugen(2, 5, 1000, c(1, 3), 0.5, c(10, 10), 8, 1.5, 2)
graphics::plot(x$points, col = x$clusters, xlab = "x", ylab = "y", asp = 1)
# 3D example
x <- clugen(3, 5, 1000, c(2, 3, 4), 0.5, c(15, 13, 14), 7, 1, 2)