getClusters {Rnmr1D}    R Documentation
getClusters
Description
Starting from the data matrix generated by integrating all bucket zones (columns) for each spectrum (rows), we can take advantage of the concentration variability of each compound across a series of samples by clustering the buckets according to the significant correlations that link them together. Bucket clustering is based either on a lower threshold applied to the correlations or on a cut value applied to a hierarchical tree of the variables (buckets) generated by a Hierarchical Clustering Analysis (HCA).
Usage
getClusters(data, method = "hca", ...)
Arguments
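- data: the matrix containing the integrations of the areas defined by the buckets (columns) on each spectrum (rows).
- method: the bucket clustering method, either 'corr' (correlation) or 'hca' (hierarchical clustering analysis). Default: 'hca'.
- ...: further parameters, depending on the chosen method:
    - 'corr': cval (correlation threshold), dC (tolerance interval of the correlation threshold), ncpu (number of CPU cores, as used in the Examples);
    - 'hca': vcutusr (cut value specified by the user).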
Details
At the bucketing step, we chose intelligent bucketing, which means that each bucket matches exactly one resonance peak. As a result, the buckets carry a strong chemical meaning, since resonance peaks are the fingerprints of chemical compounds. However, several resonance peaks are generally required to assign a chemical compound in 1D 1H-NMR metabolic profiling. To generate relevant clusters (i.e. clusters that may correspond to chemical compounds), two approaches have been implemented:
Bucket Clustering based on a lower threshold applied to correlations
In this approach, an appropriate threshold is applied to the correlation matrix before it is decomposed into clusters. The result can be improved by searching for a trade-off over a tolerance interval around the threshold: starting from a fixed correlation threshold (cval), the clustering is computed for the three values cval - dC, cval and cval + dC, where dC is the tolerance interval of the correlation threshold. These three sets of clusters are then merged according to the following rules: 1) if a large cluster is broken up, the two resulting clusters are kept; 2) if a small cluster disappears, the initial cluster is kept. In general, a tolerance interval dC between 0.002 and 0.01 gives a good trade-off.
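The thresholding step can be sketched in base R as follows. This is a minimal illustration, not the Rnmr1D implementation: it assumes that two buckets are linked when their Pearson correlation is at least the threshold, and that a cluster is a connected component of two or more linked buckets. The helper `corr_clusters()`, the toy matrix and its bucket labels are hypothetical, and the merging rules over cval - dC, cval and cval + dC are not reproduced; the loop only shows how the clustering is recomputed at the three thresholds.

```r
# Hypothetical helper (not part of Rnmr1D): cluster the buckets (columns of X)
# as connected components of the graph that links buckets whose correlation
# is >= cval. Buckets left on their own (singletons) are discarded.
corr_clusters <- function(X, cval) {
  A <- cor(X) >= cval              # adjacency matrix after thresholding
  diag(A) <- FALSE
  n <- ncol(X)
  memb <- integer(n)               # component label, 0 = not yet assigned
  k <- 0L
  for (i in seq_len(n)) {
    if (memb[i] != 0L) next
    k <- k + 1L
    memb[i] <- k
    queue <- i
    while (length(queue) > 0) {    # breadth-first search from bucket i
      j <- queue[1]
      queue <- queue[-1]
      nb <- which(A[j, ] & memb == 0L)
      memb[nb] <- k
      queue <- c(queue, nb)
    }
  }
  groups <- split(colnames(X), memb)
  unname(groups[lengths(groups) >= 2])   # keep clusters of 2+ buckets
}

# Toy data: 20 spectra, two "compounds" each spread over 3 buckets,
# plus 2 uncorrelated noise buckets (labels are illustrative only).
set.seed(1)
conc1 <- runif(20)
conc2 <- runif(20)
X <- cbind(conc1 %o% c(1, 0.5, 0.8), conc2 %o% c(1, 0.3, 0.6),
           matrix(runif(40), 20, 2))
X <- X + matrix(rnorm(160, sd = 0.001), 20, 8)
colnames(X) <- c("B1.33", "B1.35", "B4.11", "B2.05", "B2.07", "B3.21",
                 "B6.50", "B7.20")

# Recompute the clustering at cval - dC, cval and cval + dC
cval <- 0.95
dC <- 0.003
for (thr in c(cval - dC, cval, cval + dC)) {
  cl <- corr_clusters(X, thr)
  cat(sprintf("threshold %.3f: %d clusters\n", thr, length(cl)))
}
```

A graph search is used here because a hard threshold on the correlation matrix naturally defines an undirected graph over the buckets; how Rnmr1D itself decomposes the thresholded matrix may differ.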
Bucket Clustering based on a hierarchical tree of the variables (buckets) generated by a Hierarchical Clustering Analysis (HCA)
In this approach, a Hierarchical Clustering Analysis (HCA, hclust) is applied to the data after computing a distance matrix ("euclidean" by default). The tree resulting from hclust is then cut into several groups (cutree) by specifying the cut height(s). To find the best cut value, the cut height is chosen by 1) testing several equally spaced values within a given range of cut heights, then 2) keeping the one that yields the most clusters while including the most bucket variables. Alternatively, the user can specify a cut value (vcutusr).
Value
getClusters returns a list containing the following components:
- vstats: statistics used to find the best value of the criterion (matrix).
- clusters: list of the ppm values corresponding to each cluster; the length of the list equals the number of clusters.
- clustertab: the association matrix giving, for each cluster (column 2), the corresponding buckets (column 1).
- params: list of the parameters of the chosen method with which the clustering was performed.
- vcrit: value of the (best or user-defined) criterion, i.e. the correlation threshold for the 'corr' method or the cut value for the 'hca' method.
- indxopt: index within the vstats matrix corresponding to the criterion value (vcrit).
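The clustertab component can be regrouped by cluster with base R. The matrix below is a toy stand-in with made-up bucket and cluster labels; only the column roles (buckets in column 1, clusters in column 2) come from the description above, and the storage types and label formats actually returned by Rnmr1D may differ.

```r
# Toy stand-in for the 'clustertab' component (hypothetical values):
# column 1 = bucket identifier, column 2 = cluster identifier.
clustertab <- cbind(bucket  = c("B1.33", "B1.35", "B4.11", "B2.05", "B2.07"),
                    cluster = c("C1", "C1", "C1", "C2", "C2"))

# List the buckets belonging to each cluster, then count them
buckets_by_cluster <- split(clustertab[, 1], clustertab[, 2])
sizes <- lengths(buckets_by_cluster)
print(buckets_by_cluster)
print(sizes)
```

With real output, assuming clustertab is a two-column matrix as described, the toy matrix would be replaced by, for example, clustcorr$clustertab from the Examples.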
References
Jacob D., Deborde C. and Moing A. (2013) An efficient spectra processing method for metabolite identification from 1H-NMR metabolomics data. Analytical and Bioanalytical Chemistry 405(15) 5049-5061 doi: 10.1007/s00216-013-6852-y
Examples
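library(Rnmr1D)
# Locate the example spectra, macro command file and sample file shipped with the package
data_dir <- system.file("extra", package = "Rnmr1D")
cmdfile <- file.path(data_dir, "NP_macro_cmd.txt")
samplefile <- file.path(data_dir, "Samples.txt")
# Process the spectra according to the macro command file
out <- Rnmr1D::doProcessing(data_dir, cmdfile = cmdfile,
                            samplefile = samplefile, ncpu = 2)
# Build the bucket data matrix (spectra in rows, buckets in columns)
outMat <- getBucketsDataset(out, norm_meth = 'CSN')
# Cluster the buckets with each method
clustcorr <- getClusters(outMat, method = 'corr', cval = 0, dC = 0.003, ncpu = 2)
clusthca <- getClusters(outMat, method = 'hca', vcutusr = 0)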