vscc {vscc} | R Documentation |
Variable Selection for Clustering and Classification
Description
Performs variable selection under a clustering or classification framework. The automated implementation using model-based clustering is based on teigen version 2.0 and mclust version 4.0; issues *may* arise when using different versions.
Usage
vscc(x, G=1:9, automate = "mclust", initial = NULL, initunc=NULL, train = NULL,
forcereduction = FALSE)
Arguments
x |
Data frame or matrix to perform variable selection on |
G |
Vector giving the numbers of groups to consider during initialization and/or post-selection analysis. Default is 1:9. |
automate |
Character string ( |
initial |
Optional vector giving the initial clustering. |
initunc |
Optional scalar indicating the total uncertainty of the initial clustering solution. Only used when |
train |
Optional vector of training data (for classification framework). |
forcereduction |
Logical indicating if the full data set should be considered (FALSE) when selecting the ‘best’ variable subset via total model uncertainty. Not used if |
Value
selected |
A list containing the subsets of variables selected for each relation. Each set is numbered according to the exponent in the relationship. For instance, |
family |
The family used for initialization and/or post-selection analysis. (Same as user input |
wss |
The within-group variance associated with each variable from the full data set. |
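The following base R sketch illustrates how the wss and selected values could relate to each other. It is not the package's internal code. It assumes that the within-group variance is computed on standardized data as the squared deviations from the group means, summed and divided by n, and that relation m keeps a variable when its absolute correlation with every already-kept variable is below 1 - wss^m. The helper select_vars and the use of iris with its species labels as a stand-in clustering are hypothetical, chosen only for illustration.

```r
# Illustrative sketch only; the assumptions are stated in the lead-in above.
x   <- scale(as.matrix(iris[, 1:4]))   # standardize each variable
cls <- as.integer(iris$Species)        # stand-in for an initial clustering
n   <- nrow(x)

# Within-group variance per variable (assumed form): squared deviations
# from the group mean, summed over all observations and divided by n.
group_means <- apply(x, 2, function(col) ave(col, cls))
wss <- colSums((x - group_means)^2) / n

# Assumed selection rule for the relation with exponent m: visit variables
# in order of increasing wss and keep variable j if
# |cor(j, r)| < 1 - wss[j]^m for every already-kept variable r.
select_vars <- function(x, wss, m) {
  ord  <- order(wss)
  keep <- ord[1]                       # lowest-variance variable is always kept
  rho  <- abs(cor(x))
  for (j in ord[-1]) {
    if (all(rho[j, keep] < 1 - wss[j]^m)) keep <- c(keep, j)
  }
  colnames(x)[keep]
}

# One subset per relation; the list index is the exponent m.
selected <- lapply(1:5, function(m) select_vars(x, wss, m))

print(round(wss, 3))
print(selected)
```

Under these assumptions a larger exponent gives a looser correlation threshold, so higher-order relations tend to admit more variables.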
The remaining values are provided as long as automate is not NULL:
topselected |
The best variable subset according to the total model uncertainty. |
initialrun |
Results from the initialization; an object of class |
bestmodel |
Results from the best model on the selected variable subset; an object of class |
chosenrelation |
Numeric indicator of the relationship chosen according to total model uncertainty. The number corresponds to the exponent in the relationship: for instance, a value of 4 indicates the quartic relationship. If the value |
uncertainty |
Total model uncertainty associated with the best relationship. |
allmodelfit |
List containing the results ( |
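As a small illustration of the total model uncertainty used to rank variable subsets, the sketch below assumes it is the sum, over observations, of one minus the largest posterior membership probability. The matrix z is made-up data, not output from vscc.

```r
# Hypothetical posterior membership probabilities: 4 observations, 2 groups.
z <- matrix(c(0.95, 0.05,
              0.60, 0.40,
              0.10, 0.90,
              0.50, 0.50), ncol = 2, byrow = TRUE)

# Per-observation uncertainty: one minus the largest membership probability.
per_obs <- 1 - apply(z, 1, max)

# Total uncertainty (assumed definition): sum over all observations.
total_uncertainty <- sum(per_obs)

print(per_obs)
print(total_uncertainty)
```

For a fit returned by Mclust, the per-observation values are available in the uncertainty component, so the same total could be obtained by summing that vector.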
Author(s)
Jeffrey L. Andrews, Paul D. McNicholas
References
See citation("vscc") for the variable selection references. See also citation("teigen") and citation("mclust") if using those families of models via the automate call.
See Also
Examples
library(mclust) # Provides the banknote data
library(vscc)
data(banknote)
head(banknote)
# Column 1 holds the known Status labels, so it is dropped before selection
bankrun <- vscc(banknote[,-1])
head(bankrun$topselected) # Preview of the selected variables
table(banknote[,1], bankrun$initialrun$classification) # Clustering results on full data set
table(banknote[,1], bankrun$bestmodel$classification) # Clustering results on reduced data set