dissimilarity {resemble}    R Documentation
Dissimilarity computation between matrices
Description
This is a wrapper that integrates the different dissimilarity functions
offered by the package. It computes the dissimilarities between observations in
numerical matrices using a specified dissimilarity measure.
Usage
dissimilarity(Xr, Xu = NULL,
diss_method = c("pca", "pca.nipals", "pls", "mpls",
"cor", "euclid", "cosine", "sid"),
Yr = NULL, gh = FALSE, pc_selection = list("var", 0.01),
return_projection = FALSE, ws = NULL,
center = TRUE, scale = FALSE, documentation = character(),
...)
Arguments
Xr |
a matrix containing n observations/rows and p
variables/columns.
|
Xu |
an optional matrix containing data of a second set of observations
with p variables/columns.
|
diss_method |
a character string indicating the method to be used to
compute the dissimilarities between observations. Options are:
"pca" : Mahalanobis distance
computed on the matrix of scores of a Principal Component (PC)
projection of Xr (and Xu if provided). PC projection is
done using the singular value decomposition (SVD) algorithm.
See ortho_diss function.
"pca.nipals" : Mahalanobis distance
computed on the matrix of scores of a Principal Component (PC)
projection of Xr (and Xu if provided). PC projection is
done using the non-linear iterative partial least squares (nipals)
algorithm. See ortho_diss function.
"pls" : Mahalanobis distance
computed on the matrix of scores of a partial least squares projection
of Xr (and Xu if provided). In this case, Yr is
always required. See ortho_diss function.
"mpls" : Mahalanobis distance
computed on the matrix of scores of a modified partial least squares
projection (Shenk and Westerhaus, 1991; Westerhaus, 2014)
of Xr (and Xu if provided). In this case, Yr is
always required. See ortho_diss function.
"cor" : based on the correlation coefficient
between observations. See cor_diss function.
"euclid" : Euclidean distance
between observations. See f_diss function.
"cosine" : Cosine distance
between observations. See f_diss function.
"sid" : spectral information divergence between
observations. See sid function.
|
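The projection-based options compute a Mahalanobis distance on a score matrix. Because principal component scores are mutually uncorrelated, that distance equals the Euclidean distance between score vectors once each score column is divided by its standard deviation. The base-R sketch below illustrates the idea for "pca" with a fixed number of components; the helper name pca_mahalanobis and the simulated data are hypothetical, and this is not the ortho_diss implementation (which also handles Xu and component selection).

```r
# Hypothetical helper (not resemble code): Mahalanobis distance
# computed on the scores of an SVD-based PCA projection.
pca_mahalanobis <- function(X, ncomp) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center the columns
  sv <- svd(Xc, nu = ncomp, nv = ncomp)         # SVD of the centered data
  scores <- Xc %*% sv$v                         # n x ncomp score matrix
  # PC scores are uncorrelated, so dividing each score column by its
  # standard deviation turns Euclidean distance into Mahalanobis distance.
  scores_std <- sweep(scores, 2, apply(scores, 2, sd), "/")
  as.matrix(dist(scores_std))
}

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)  # 20 observations, 6 variables
D <- pca_mahalanobis(X, ncomp = 3)
dim(D)  # 20 20: one row and one column per observation
```

The result is a symmetric matrix with a zero diagonal, the same shape as the dissimilarity component returned by this function for Xr alone.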
Yr |
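a numeric matrix of n observations used as side information of
Xr for the ortho_diss methods (i.e. "pca", "pca.nipals", "pls" or
"mpls"). It is required when diss_method = "pls" or diss_method = "mpls",
or when "opc" is used as the method in the pc_selection argument.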
|
gh |
a logical indicating if the Mahalanobis distance (in the pls score
space) between each observation and the pls centre/mean must be
computed.
|
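The GH distance can be pictured as a Mahalanobis distance between each observation's pls scores and the centre of the score space. The sketch below assumes a single response and builds the scores with a minimal NIPALS-style PLS1 written in base R; the helper pls1_scores is hypothetical, and the projection and any additional GH scaling used by the package may differ.

```r
# Hypothetical helper (not resemble code): PLS1 scores via NIPALS-style
# deflation for a single response y.
pls1_scores <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
  scores <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(X, y)                   # weights proportional to X'y
    w <- w / sqrt(sum(w^2))                # unit-length weights
    t_a <- X %*% w                         # scores of component a
    p_a <- crossprod(X, t_a) / sum(t_a^2)  # X loadings
    q_a <- sum(y * t_a) / sum(t_a^2)       # y loading
    X <- X - t_a %*% t(p_a)                # deflate X
    y <- y - q_a * t_a                     # deflate y
    scores[, a] <- t_a
  }
  scores
}

set.seed(2)
X <- matrix(rnorm(30 * 8), nrow = 30)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(30, sd = 0.1)
S <- pls1_scores(X, y, ncomp = 3)
# mahalanobis() returns squared distances, hence the square root.
gh <- sqrt(mahalanobis(S, center = colMeans(S), cov = cov(S)))
```

Large values of gh flag observations that lie far from the bulk of the data in the pls score space.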
pc_selection |
a list of length 2 to be passed onto the
ortho_diss methods. It is required if the method selected in
diss_method is any of "pca" , "pca.nipals" or
"pls" or if gh = TRUE . This argument is used for
optimizing the number of components (principal components or pls factors)
to be retained. This list must contain two elements in the following order:
method (a character indicating the method for selecting the number of
components) and value (a numerical value that complements the selected
method). The methods available are:
"opc" : optimized principal component selection based on
Ramirez-Lopez et al. (2013a, 2013b). The optimal number of components
(for a given set of observations) is the one for which its distance matrix
minimizes the differences between the Yr value of each
observation and the Yr value of its closest observation. In this
case, value must be a value (larger than 0 and
below the minimum dimension of Xr or Xr and Xu
combined) indicating the maximum
number of principal components to be tested. See the
ortho_projection function for more details.
"cumvar" : selection of the principal components based
on a given cumulative amount of explained variance. In this case,
value must be a value (larger than 0 and below or equal to 1)
indicating the minimum amount of cumulative variance that the
combination of retained components should explain.
"var" : selection of the principal components based
on a given amount of explained variance. In this case,
value must be a value (larger than 0 and below or equal to 1)
indicating the minimum amount of variance that a single component
should explain in order to be retained.
"manual" : for manually specifying a fixed number of
principal components. In this case, value must be a value
(larger than 0 and
below the minimum dimension of Xr or Xr and Xu
combined) indicating the number of components to be retained.
The default is list(method = "var", value = 0.01) .
Optionally, the pc_selection argument admits "opc" or
"cumvar" or "var" or "manual" as a single character
string. In such a case the default "value" when either "opc" or
"manual" is used is 40. When "cumvar" is used the default
"value" is set to 0.99 and when "var" is used, the default
"value" is set to 0.01.
|
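The "var", "cumvar" and "manual" rules can be expressed in terms of the proportion of variance explained by each principal component. The helper n_components below is a hypothetical sketch of those rules, assuming that "var" keeps every component whose proportion is at least value and that "cumvar" keeps the smallest number of components whose cumulative proportion reaches value; "opc" is omitted because it depends on Yr and a nearest-neighbour search.

```r
# Hypothetical helper (not resemble code): number of principal components
# retained under the "var", "cumvar" and "manual" selection rules.
n_components <- function(X, method = c("var", "cumvar", "manual"), value) {
  method <- match.arg(method)
  d <- svd(scale(X, center = TRUE, scale = FALSE))$d
  explained <- d^2 / sum(d^2)  # proportion of variance per component
  switch(method,
    # components are sorted by decreasing variance, so counting the
    # ones above the threshold gives the leading components to keep
    var    = sum(explained >= value),
    # first component count whose cumulative proportion reaches value
    cumvar = which(cumsum(explained) >= value)[1],
    manual = as.integer(value)
  )
}

set.seed(3)
X <- matrix(rnorm(50 * 10), nrow = 50) %*% diag(10:1)
n_components(X, "var", 0.01)
n_components(X, "cumvar", 0.99)
n_components(X, "manual", 5)
```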
return_projection |
a logical indicating if the projection(s) must be
returned. Projections are used if the ortho_diss methods are
called (i.e. diss_method = "pca", diss_method = "pca.nipals",
diss_method = "pls" or diss_method = "mpls") or when gh = TRUE.
In case gh = TRUE and an ortho_diss method is used (in the
diss_method argument), both projections are returned.
|
ws |
an odd integer value which specifies the window size for the moving
correlation dissimilarity when diss_method = "cor" (cor_diss method).
If ws = NULL (default), the window size is equal to the number of
variables (columns), i.e. the standard correlation is used instead of
the moving correlation. See cor_diss function.
|
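A moving correlation dissimilarity computes a correlation-based dissimilarity within each window of ws consecutive variables and summarizes the windows. The sketch below assumes the form 0.5 * (1 - r) for a single correlation r and averages it over all full windows; the helper cor_dissim is hypothetical, and the exact expression used by cor_diss may differ.

```r
# Hypothetical helper (not resemble code): correlation dissimilarity
# between two spectra, averaged over moving windows of odd width ws.
# Assumed per-window form: 0.5 * (1 - r).
cor_dissim <- function(x, y, ws = NULL) {
  p <- length(x)
  if (is.null(ws)) ws <- p  # one window spanning all variables
  stopifnot(ws %% 2 == 1 || ws == p, ws <= p)
  starts <- seq_len(p - ws + 1)  # start index of every full window
  d <- vapply(starts, function(k) {
    idx <- k:(k + ws - 1)
    0.5 * (1 - cor(x[idx], y[idx]))
  }, numeric(1))
  mean(d)
}

x <- sin(seq(0, 3 * pi, length.out = 101))
y <- x + 0.2 * cos(seq(0, 3 * pi, length.out = 101))
cor_dissim(x, y)           # whole-spectrum correlation
cor_dissim(x, y, ws = 11)  # moving-window version
```

Narrow windows make the measure sensitive to local differences in shape that a single global correlation can average away.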
center |
a logical indicating if Xr (and Xu if provided)
must be centered. If Xu is provided the data is centered around the
mean of the pooled Xr and Xu matrices (\(Xr \cup Xu\)). For
dissimilarity computations based on diss_method = pls , the data is
always centered.
|
scale |
a logical indicating if Xr (and Xu if
provided) must be scaled. If Xu is provided the data is scaled based
on the standard deviation of the pooled Xr and Xu matrices
(\(Xr \cup Xu\)). If center = TRUE , scaling is applied after
centering.
|
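Centering and scaling with the pooled statistics means that the column means and standard deviations are computed once on rbind(Xr, Xu) and then applied to each matrix. The helper pooled_center_scale below is a hypothetical base-R sketch of that step, not the code used by the package.

```r
# Hypothetical helper (not resemble code): center and/or scale Xr and Xu
# using the column means and standard deviations of rbind(Xr, Xu).
pooled_center_scale <- function(Xr, Xu, center = TRUE, scale = FALSE) {
  pooled <- rbind(Xr, Xu)
  mu <- if (center) colMeans(pooled) else rep(0, ncol(pooled))
  s  <- if (scale) apply(pooled, 2, sd) else rep(1, ncol(pooled))
  # subtract the pooled means first, then divide by the pooled sd
  f <- function(X) sweep(sweep(X, 2, mu, "-"), 2, s, "/")
  list(Xr = f(Xr), Xu = f(Xu))
}

set.seed(4)
Xr <- matrix(rnorm(15 * 4, mean = 5), nrow = 15)
Xu <- matrix(rnorm(5 * 4, mean = 5), nrow = 5)
res <- pooled_center_scale(Xr, Xu, center = TRUE, scale = TRUE)
```

Only the pooled matrix ends up with zero column means and unit standard deviations; Xr and Xu taken separately generally do not.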
documentation |
an optional character string that can be used to
describe anything related to the dissimilarity call (e.g. description of the
input data). Default: character(). NOTE: this is an experimental
argument.
|
... |
other arguments passed to the dissimilarity functions
(ortho_diss , cor_diss , f_diss or
sid ).
|
Details
This function is a wrapper for ortho_diss, cor_diss, f_diss and sid.
Check the documentation of these functions for further details.
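For orientation, the measures delegated to f_diss and sid can be sketched in a few lines of base R. The helpers euclid_diss, cosine_diss and sid_diss are hypothetical; cosine_diss assumes the cosine distance is the angle between two spectra, and sid_diss assumes strictly positive spectra that are normalized to sum to one before the symmetric Kullback-Leibler divergence is taken. The package functions may apply different scalings, so use their documentation for the exact definitions.

```r
# Hypothetical helpers (not resemble code) for the non-projection measures.
euclid_diss <- function(X) as.matrix(dist(X, method = "euclidean"))

# Cosine distance taken here as the angle (radians) between row vectors.
cosine_diss <- function(X) {
  Xn <- X / sqrt(rowSums(X^2))              # unit-length rows
  sim <- pmin(pmax(tcrossprod(Xn), -1), 1)  # clamp rounding error
  acos(sim)
}

# Spectral information divergence for strictly positive spectra: each row
# becomes a probability vector, then the symmetric Kullback-Leibler
# divergence is computed for every pair of rows.
sid_diss <- function(X) {
  stopifnot(all(X > 0))
  P <- X / rowSums(X)
  n <- nrow(P)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    D[i, j] <- sum(P[i, ] * log(P[i, ] / P[j, ])) +
               sum(P[j, ] * log(P[j, ] / P[i, ]))
  }
  D
}

set.seed(5)
X <- matrix(runif(4 * 10, min = 0.1, max = 1), nrow = 4)
euclid_diss(X)
cosine_diss(X)
sid_diss(X)
```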
Value
A list with the following components:
dissimilarity: the resulting dissimilarity matrix.
projection: an ortho_projection object. Only output if
return_projection = TRUE and if diss_method = "pca",
diss_method = "pca.nipals", diss_method = "pls" or diss_method = "mpls".
This object contains the projection used to compute
the dissimilarity matrix. In case of local dissimilarity matrices,
the projection corresponds to the global projection used to select the
neighborhoods (see ortho_diss function for further details).
gh: a list containing the GH distances as well as the
pls projection used to compute the GH.
Author(s)
Leonardo Ramirez-Lopez
References
Shenk, J., Westerhaus, M., and Berzaghi, P. 1997. Investigation of a LOCAL
calibration procedure for near infrared instruments. Journal of Near Infrared
Spectroscopy, 5, 223-232.
Westerhaus, M. 2014. Eastern Analytical Symposium Award for outstanding
achievements in near infrared spectroscopy: my contributions to
near infrared spectroscopy. NIR news, 25(8), 16-20.
See Also
ortho_diss, cor_diss, f_diss, sid
Examples
library(prospectr)
data(NIRsoil)
# Filter the data using the first derivative with the Savitzky-Golay
# smoothing filter, a window size of 15 spectral variables and a
# polynomial order of 4
sg <- savitzkyGolay(NIRsoil$spc, m = 1, p = 4, w = 15)
# Replace the original spectra with the filtered ones
NIRsoil$spc <- sg
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Xr <- Xr[!is.na(Yr), ]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]
dsm_pca <- dissimilarity(
Xr = Xr, Xu = Xu,
diss_method = c("pca"),
Yr = Yr, gh = TRUE,
pc_selection = list("opc", 30),
return_projection = TRUE
)
[Package resemble version 2.2.3 Index]