validClimR {HiClimR} | R Documentation |
Validation of Hierarchical Climate Regionalization
Description
validClimR computes indices for cluster validation and an objective tree cut
for the regional linkage clustering method.
Usage
validClimR(y = NULL, k = NULL, minSize = 1, alpha = 0.05, verbose = TRUE,
plot = FALSE, colPalette = NULL, pch = 15, cex = 1)
Arguments
y |
a dendrogram tree produced by |
k |
|
minSize |
minimum cluster size. The |
alpha |
confidence level: the default is |
verbose |
logical to print processing information if |
plot |
logical to call the plotting method if |
colPalette |
a color palette or a list of colors such as that generated
by |
pch |
Either an integer specifying a symbol or a single character to
be used as the default in plotting points. See |
cex |
A numerical value giving the amount by which plotting symbols should
be magnified relative to the |
Details
The validClimR function validates a dendrogram tree produced by HiClimR by computing
detailed statistical information for each cluster: cluster means, sizes, intra- and
inter-cluster correlations, and an overall summary. It requires the preprocessed data
matrix and the tree from the HiClimR function as inputs. The optional parameter k
validates the clustering for a selected number of clusters. If k = NULL, the default,
which is supported only for the regional linkage method, the tree is cut objectively to
find the optimal number of clusters based on a user-specified significance level (the
alpha parameter). In the regional linkage method, noisy spatial elements are isolated in
very small clusters or as individuals, since they do not correlate well with any other
elements. They can be excluded from the validation indices (interCor, intraCor, diffCor,
and statSum) based on the minSize minimum cluster size. The excluded clusters are
identified in the clustFlag component of the validClimR output, which takes a value of 1
for selected clusters or 0 for excluded clusters. The sum of the clustFlag elements is
the number of selected clusters. This should be followed by a quality control step
before repeating the analysis.
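The indices described above can be illustrated with a minimal base-R sketch. It is not the HiClimR implementation: the synthetic data, the use of hclust with average linkage, and the assumption that clusters smaller than minSize are the ones excluded are all choices made here for illustration. Only the component names (clustMean, clustSize, clustFlag, intraCor, interCor, diffCor) follow this page.

```r
## Illustrative sketch only; variable names mirror the output components,
## but the computations are assumptions, not HiClimR internals.
set.seed(42)

nT <- 60                                    # number of time steps
signal <- matrix(rnorm(3 * nT), nrow = 3)   # three synthetic regional signals
truth <- rep(1:3, times = c(10, 10, 2))     # third group is deliberately tiny
x <- signal[truth, ] + matrix(rnorm(length(truth) * nT, sd = 0.5), ncol = nT)

## Cluster the spatial elements (rows) on correlation distance, then cut into k groups
tree <- hclust(as.dist(1 - cor(t(x))), method = "average")
k <- 3
cl <- cutree(tree, k = k)

## Cluster means: one mean time series per cluster (k x nT matrix)
clustMean <- t(sapply(seq_len(k), function(i) colMeans(x[cl == i, , drop = FALSE])))
clustSize <- as.vector(table(factor(cl, levels = seq_len(k))))

## Flag clusters by size: 1 = selected, 0 = excluded
## (assumes clusters with fewer than minSize members are excluded)
minSize <- 5
clustFlag <- as.integer(clustSize >= minSize)
sel <- which(clustFlag == 1)

## Intra-cluster correlation: mean correlation of members with their cluster mean
intraCor <- sapply(seq_len(k), function(i) {
  mean(cor(t(x[cl == i, , drop = FALSE]), clustMean[i, ]))
})

## Inter-cluster correlations between the means of the selected clusters
interCor <- cor(t(clustMean[sel, , drop = FALSE]))
diag(interCor) <- NA
maxInter <- apply(interCor, 1, max, na.rm = TRUE)

## Homogeneity minus separation for each selected cluster
diffCor <- intraCor[sel] - maxInter

## Size-weighted average intra-cluster correlation over selected clusters
avgIntra <- weighted.mean(intraCor[sel], clustSize[sel])
```

In this sketch, sum(clustFlag) gives the number of selected clusters and length(clustFlag) the total number, matching how the two counts are read from the output in the Examples section.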
Value
An object of class HiClimR containing indices for validating the tree produced by the
clustering process.
The object is a list with the following components:
cutLevel |
the minimum significant correlation used for objective tree cut together with the corresponding confidence level. |
clustMean |
the cluster means, i.e., the mean time series of each selected region. |
clustSize |
cluster sizes for all selected regions. |
clustFlag |
a flag |
interCor |
inter-cluster correlations for all selected regions, i.e., the correlations between cluster means. The maximum inter-cluster correlation is a measure of separation or contiguity, and it is used for the objective tree cut (to find the "optimal" number of clusters). |
intraCor |
intra-cluster correlations for all selected regions, i.e., the correlations between the mean of each cluster and its members. The average intra-cluster correlation is a weighted average over all clusters, and it is a measure of homogeneity. |
diffCor |
difference between intra-cluster correlation and maximum inter-cluster correlation for all selected regions. |
statSum |
overall statistical summary for i |
region |
ordered regions vector of size |
regionID |
ordered region ID vector whose length equals the number of selected
clusters, after excluding the small clusters defined by |
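The cutLevel component pairs a minimum significant correlation with a confidence level. As a rough illustration of how such a threshold can be derived, the sketch below uses the standard two-sided t-test on Pearson's correlation coefficient. The helper name min_sig_cor is hypothetical, and the package's own minSigCor function may differ in arguments and details.

```r
## Hypothetical helper: smallest |r| that differs significantly from zero
## for n samples, using a two-sided t-test with n - 2 degrees of freedom.
## This is an assumption for illustration, not the package's minSigCor.
min_sig_cor <- function(n, alpha = 0.05) {
  stopifnot(n > 2, alpha > 0, alpha < 1)
  tcrit <- qt(1 - alpha / 2, df = n - 2)   # critical t value
  tcrit / sqrt(n - 2 + tcrit^2)            # invert t = r * sqrt((n - 2) / (1 - r^2))
}

## The threshold shrinks as the sample size grows and rises for a stricter alpha
r_small  <- min_sig_cor(41,  alpha = 0.05)
r_large  <- min_sig_cor(401, alpha = 0.05)
r_strict <- min_sig_cor(41,  alpha = 0.01)
```

Under this assumption, inter-cluster correlations below the threshold would be treated as not significantly different from zero at the chosen alpha.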
Author(s)
Hamada S. Badr <badr@jhu.edu>, Benjamin F. Zaitchik <zaitchik@jhu.edu>,
and Amin K. Dezfuli <amin.dezfuli@nasa.gov>. HiClimR is a modification of the hclust
function, which is based on Fortran code contributed to STATLIB by F. Murtagh.
References
Badr, H. S., Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, doi: 10.1007/s12145-015-0221-7.
Badr, H. S., Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
See Also
HiClimR, HiClimR2nc, validClimR, geogMask, coarseR, fastCor, grid2D and minSigCor.
Examples
require(HiClimR)
## Load test case data
x <- TestCase$x
## Generate longitude and latitude mesh vectors
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
lon <- c(xGrid$lon)
lat <- c(xGrid$lat)
## Hierarchical Climate Regionalization
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
             continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
             standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE,
             kH = NULL, members = NULL, validClimR = TRUE, k = 12, minSize = 1,
             alpha = 0.01, plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Validation of Hierarchical Climate Regionalization
z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE)
## Use a specified number of clusters (k = 12)
z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE)
## Apply minimum cluster size (minSize = 25)
z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE)
## The optimal number of clusters, including small clusters
k <- length(z$clustFlag)
## The selected number of clusters, after excluding small clusters (if minSize > 1)
ks <- sum(z$clustFlag)