DependencyBasedStrategy {D2MCS} | R Documentation
Clustering strategy based on dependency between features.
Description
Features are distributed according to their independence values. The strategy has two steps. The first step forms groups of the features that are most dependent on each other, and also identifies the features that are independent of all the others in the group. The second step tries different numbers of clusters until the best one is found. Each candidate clustering includes the independent features identified previously and spreads the features of each group formed in the first step across separate clusters. In this way, the strategy seeks to make the features within a cluster as independent of each other as possible.
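The two steps described above can be illustrated with a small base R sketch. It is not the D2MCS implementation: the synthetic data, the Spearman-correlation dependency test, the 0.6 cutoff, the round-robin placement, the copying of independent features into every cluster, and the "collisions" score are all assumptions chosen only to make the idea concrete.

```r
# Illustrative sketch only; NOT the algorithm used internally by D2MCS.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
features <- data.frame(
  a1 = a, a2 = a + rnorm(n, sd = 0.1), a3 = a + rnorm(n, sd = 0.1),
  b1 = b, b2 = b + rnorm(n, sd = 0.1),
  free = rnorm(n)
)
cutoff <- 0.6  # hypothetical dependency cutoff

# Step 1: two features are "dependent" when |Spearman correlation| >= cutoff.
dep <- abs(cor(features, method = "spearman")) >= cutoff
diag(dep) <- FALSE

# Group dependent features: connected components via label propagation.
group.id <- as.numeric(seq_len(ncol(features)))
repeat {
  updated <- vapply(seq_along(group.id),
                    function(i) min(group.id[i], group.id[dep[i, ]]),
                    numeric(1))
  if (identical(updated, group.id)) break
  group.id <- updated
}
groups <- split(names(features), group.id)
independent <- unlist(groups[lengths(groups) == 1], use.names = FALSE)
dependent.groups <- unname(groups[lengths(groups) > 1])

# Step 2: for a given number of clusters k, copy the independent features
# into every cluster (an assumption) and spread each dependent group
# round-robin across the clusters.
distribute <- function(k) {
  clusters <- rep(list(independent), k)
  for (g in dependent.groups) {
    slot <- rep_len(seq_len(k), length(g))
    for (j in seq_along(g)) {
      clusters[[slot[j]]] <- c(clusters[[slot[j]]], g[j])
    }
  }
  clusters
}

# Stand-in quality score: number of dependent pairs sharing a cluster.
collisions <- function(clusters) {
  sum(vapply(clusters, function(f) sum(dep[f, f]) / 2, numeric(1)))
}

# Try several cluster counts and keep the one with the fewest collisions.
scores <- vapply(1:3, function(k) collisions(distribute(k)), numeric(1))
best.k <- which.min(scores)
print(distribute(best.k))
```

In the class itself, the quality measure and the stopping rule come from the configuration object (qualityOfCluster() and isImprovingClustering()), so the score used here is only a placeholder.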
Details
The strategy is suitable only for binary and real features. Other
features are automatically grouped into a specific cluster named
'unclustered'. This class requires that the StrategyConfiguration
object implement the following methods:
- getBinaryCutoff(): The function defines the interval used to consider the dependency between binary features.
- getRealCutoff(): The function defines the cutoff used to consider the dependency between real features.
- tiebreak(feature, clus.candidates, fea.dep.dist.clus, corpus, heuristic, class, class.name): The function resolves ties between two (or more) features.
- qualityOfCluster(clusters, metrics): The function determines the quality of a cluster.
- isImprovingClustering(clusters.deltha): The function indicates whether the clustering improves as the number of clusters increases.
An example of implementation with the description of each parameter is the DependencyBasedStrategyConfiguration class.
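As a hedged illustration of customizing the configuration, the sketch below subclasses DependencyBasedStrategyConfiguration and overrides only the two cutoff getters. The class name CustomCutoffConfiguration and the value 0.8 are hypothetical, and the sketch assumes the default configuration can be built with no arguments, as the default value in the Usage of new() suggests. The code is guarded so that it runs only when the R6 and D2MCS packages are installed.

```r
# Sketch: override the cutoff getters of the default configuration.
# Assumes the R6 and D2MCS packages are installed; skipped otherwise.
if (requireNamespace("R6", quietly = TRUE) &&
    requireNamespace("D2MCS", quietly = TRUE)) {
  CustomCutoffConfiguration <- R6::R6Class(  # hypothetical class name
    classname = "CustomCutoffConfiguration",
    inherit = D2MCS::DependencyBasedStrategyConfiguration,
    public = list(
      # Illustrative values; choose cutoffs that suit your data.
      getBinaryCutoff = function() 0.8,
      getRealCutoff = function() 0.8
    )
  )
  configuration <- CustomCutoffConfiguration$new()
  print(configuration$getRealCutoff())
}
```

The remaining methods (tiebreak(), qualityOfCluster(), isImprovingClustering()) are inherited unchanged, which keeps the example limited to behavior stated in this page.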
Super class
D2MCS::GenericClusteringStrategy -> DependencyBasedStrategy
Methods
Method new()
Method for initializing the object parameters during runtime.
Usage
DependencyBasedStrategy$new(subset, heuristic, configuration = DependencyBasedStrategyConfiguration$new())
Arguments
subset
The Subset used to apply the feature-clustering strategy.
heuristic
The heuristic used to compute the relevance of each feature. Must inherit from the GenericHeuristic abstract class.
configuration
Optional parameter to customize configuration parameters for the strategy. Must inherit from the StrategyConfiguration abstract class.
Method execute()
Function responsible for performing the dependency-based feature clustering strategy over the defined Subset.
Usage
DependencyBasedStrategy$execute(verbose = TRUE)
Arguments
verbose
A logical value to specify if more verbosity is needed.
Method getDistribution()
Function used to obtain a specific cluster distribution.
Usage
DependencyBasedStrategy$getDistribution(num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE)
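Arguments
num.clusters
A numeric value to select the number of clusters (define the distribution).
num.groups
A single value or a numeric vector identifying the specific group(s) that form the clustering distribution.
include.unclustered
A logical value to determine if unclustered features should be included.
Returns
A list with the features comprising a specific clustering distribution.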
Method createTrain()
The function is used to create a Trainset object from a specific clustering distribution.
Usage
DependencyBasedStrategy$createTrain(subset, num.clusters = NULL, num.groups = NULL, include.unclustered = FALSE)
Arguments
subset
The Subset object used as a basis to create the train set (see Trainset class).
num.clusters
A numeric value to select the number of clusters (define the distribution).
num.groups
A single value or a numeric vector identifying the specific group(s) that form the clustering distribution.
include.unclustered
A logical value to determine if unclustered features should be included.
Details
If num.clusters and num.groups are not defined, the best clustering distribution is used to create the train set.
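The methods documented so far are typically called in sequence. The helper below is a minimal sketch of that sequence using only the signatures listed on this page; the function name run_dependency_strategy is hypothetical, and it assumes the caller already holds a Subset object and a suitable heuristic, whose construction is not covered here.

```r
# Sketch: build the strategy, run it, and derive a train set from the
# best distribution. `subset` and `heuristic` must be created by the caller.
run_dependency_strategy <- function(subset, heuristic) {
  strategy <- D2MCS::DependencyBasedStrategy$new(subset = subset,
                                                 heuristic = heuristic)
  strategy$execute(verbose = FALSE)
  list(
    strategy = strategy,
    # With num.clusters and num.groups left as NULL, the best clustering
    # distribution is used (see Details above).
    distribution = strategy$getDistribution(),
    train.set = strategy$createTrain(subset = subset)
  )
}
```

Returning the strategy object alongside the results lets the caller go on to plot() or saveCSV() without re-running execute().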
Method plot()
The function is responsible for creating a plot to visualize the clustering distribution.
Usage
DependencyBasedStrategy$plot(dir.path = NULL, file.name = NULL)
Arguments
dir.path
An optional argument to define the name of the directory where the exported plot will be saved. If not defined, the file path will be automatically assigned to the current working directory, 'getwd()'.
file.name
A character to define the name of the PDF file where the plot is exported.
Method saveCSV()
The function is used to save the clustering distribution to a CSV file.
Usage
DependencyBasedStrategy$saveCSV(dir.path = NULL, name = NULL, num.clusters = NULL)
Arguments
dir.path
The name of the directory to save the CSV file.
name
Defines the name of the CSV file.
num.clusters
An optional parameter to select the number of clusters to be saved. If not defined, all cluster distributions will be saved.
Method clone()
The objects of this class are cloneable with this method.
Usage
DependencyBasedStrategy$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
GenericClusteringStrategy, StrategyConfiguration, DependencyBasedStrategyConfiguration