cv.test {monoClust} | R Documentation |
Cross-Validation Test on MonoClust
Description
Performs a cross-validation test for different numbers of clusters in Monothetic Clustering.
Usage
cv.test(data, fold = 10L, minnodes = 2L, maxnodes = 10L, ncores = 1L, ...)
Arguments
data | Data set to be partitioned. |
fold | Number of folds (k). |
minnodes | Minimum number of clusters to be checked. |
maxnodes | Maximum number of clusters to be checked. |
ncores | Number of CPU cores on the current host. When set to NULL, all available cores are used. |
... | Other parameters transferred to |
Details
The $k$-fold cross-validation randomly partitions the data into $k$ subsets of equal (or nearly equal) size. $k - 1$ subsets are used as the training data set to create a tree with a desired number of leaves, and the remaining subset is used as the validation data set to evaluate the predictive performance of the trained tree. The process repeats with each subset in turn as the validation set ($m = 1, \ldots, k$), and the mean squared difference,

$$MSE_m = \frac{1}{n_m} \sum_{q=1}^{Q} \sum_{i \in m} d^2_{euc}(y_{iq}, \hat{y}_{(-i)q}),$$

is calculated, where $n_m$ is the number of observations in subset $m$, $Q$ is the number of variables, $\hat{y}_{(-i)q}$ is the mean on variable $q$ of the training-data cluster into which the validation observation with observed value $y_{iq}$ falls, and $d^2_{euc}(y_{iq}, \hat{y}_{(-i)q})$ is the squared Euclidean distance (dissimilarity) between the two values on variable $q$. The average of these $k$ test errors is the cross-validation-based estimate of the mean squared error of predicting a new observation,

$$CV_k = \overline{MSE} = \frac{1}{k} \sum_{m=1}^{k} MSE_m.$$
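The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: stats::kmeans() stands in for the monothetic clustering tree, assigning a validation observation to its nearest cluster centre stands in for dropping it down the tree, and the names dat, fold_id, mse and cv_k are hypothetical.

```r
# Minimal k-fold sketch. ASSUMPTION: stats::kmeans() replaces MonoClust()
# so that the example runs with base R only.
set.seed(1)
dat <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),  # 30 observations around (0, 0)
  matrix(rnorm(60, mean = 5), ncol = 2)   # 30 observations around (5, 5)
)
k <- 5L           # number of folds
n_clusters <- 2L  # number of clusters being evaluated

# Randomly assign each observation to one of k folds of (near) equal size
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

mse <- vapply(seq_len(k), function(m) {
  train <- dat[fold_id != m, , drop = FALSE]
  valid <- dat[fold_id == m, , drop = FALSE]
  fit <- kmeans(train, centers = n_clusters, nstart = 10)

  # Squared Euclidean distance from every validation row to every
  # training cluster mean (one column per cluster)
  d2 <- sapply(seq_len(n_clusters), function(j) {
    rowSums(sweep(valid, 2, fit$centers[j, ])^2)
  })

  # Each validation row is matched to its nearest cluster mean;
  # averaging over the n_m rows in fold m gives MSE_m
  mean(apply(d2, 1, min))
}, numeric(1))

# Cross-validation estimate: the average of the k fold-wise errors
cv_k <- mean(mse)
```

Repeating this for each candidate number of clusters gives one cross-validation estimate per candidate, which is the kind of comparison the function is described as performing.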
Value
An object of class MonoClust.cv containing a data frame of the mean sum of squared errors and its standard deviation.
Note
This function supports parallel processing with foreach::foreach(). It distributes MonoClust calls to processes.
See Also
plot.cv.MonoClust(), MonoClust(), predict.MonoClust()
Examples
library(cluster)
data(ruspini)
# Leave-one-out cross-validation (requested here with fold = 1)
cv.test(ruspini, fold = 1, minnodes = 2, maxnodes = 4)
# 5-fold cross-validation
cv.test(ruspini, fold = 5, minnodes = 2, maxnodes = 4)