ortho_diss {resemble} | R Documentation |
A function for computing dissimilarity matrices from orthogonal projections (ortho_diss)
Description
This function computes dissimilarities (in an orthogonal space) either between observations in a given set or between observations in two different sets. The dissimilarities are computed based on either a principal component projection or a partial least squares projection of the data. After projecting the data, the Mahalanobis distance is applied.
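The idea of projecting first and applying the Mahalanobis distance afterwards can be sketched in base R. This is only an illustrative sketch, not the resemble implementation: the data, the fixed number of components (n_comp) and the choice to pool both sets before the projection are assumptions made here, and the package may scale the resulting values differently.

```r
# Minimal base-R sketch: PCA projection followed by Mahalanobis distance.
# Not the resemble implementation; data and n_comp are illustrative.
set.seed(1)
Xr <- matrix(rnorm(20 * 6), nrow = 20)  # hypothetical reference set
Xu <- matrix(rnorm(5 * 6), nrow = 5)    # hypothetical second set

n_comp <- 3  # number of components retained (chosen arbitrarily here)

# Project the pooled data onto its principal components
# (centered, not scaled; pooling is an assumption of this sketch)
X <- rbind(Xr, Xu)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

# PC scores are uncorrelated, so the Mahalanobis distance in score space
# equals the Euclidean distance between scores divided by their
# standard deviations.
z <- sweep(scores, 2, pca$sdev[seq_len(n_comp)], "/")
d_all <- as.matrix(dist(z))

# Rows: observations in Xr; columns: observations in Xu
d_xr_xu <- d_all[seq_len(nrow(Xr)), nrow(Xr) + seq_len(nrow(Xu)), drop = FALSE]
```

Because every pair of observations is measured in the same projected space, d_all is symmetric, which is the property that the local variant described in Details gives up.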
Usage
ortho_diss(Xr, Xu = NULL,
           Yr = NULL,
           pc_selection = list(method = "var", value = 0.01),
           diss_method = "pca",
           .local = FALSE,
           pre_k,
           center = TRUE,
           scale = FALSE,
           compute_all = FALSE,
           return_projection = FALSE,
           allow_parallel = TRUE, ...)
Arguments
Xr |
a matrix containing |
Xu |
an optional matrix containing data of a second set of observations
with |
Yr |
a matrix of
|
pc_selection |
a list of length 2 which specifies the method to be used
for optimizing the number of components (principal components or pls factors)
to be retained. This list must contain two elements (in the following order):
Default is Optionally, the |
diss_method |
a character value indicating the type of projection on which
the dissimilarities must be computed. This argument is equivalent to
See the |
.local |
a logical indicating whether or not to compute the dissimilarities
locally (i.e. projecting locally the data) by using the |
pre_k |
if |
center |
a logical indicating if the |
scale |
a logical indicating if the |
compute_all |
a logical. In case |
return_projection |
a logical. If |
allow_parallel |
a logical (default TRUE). It allows parallel computing
of the local distance matrices (i.e. when |
... |
additional arguments to be passed to the
|
Details
When .local = TRUE, a global dissimilarity matrix is first computed based on the parameters specified. Then, using this matrix, the pre_k nearest neighbors of each target observation are identified. These neighbors (together with the target observation) are projected (from the original data space) onto a (local) orthogonal space (using the same parameters specified in the function). In this projected space, the Mahalanobis distance between the target observation and its neighbors is recomputed. A missing value is assigned to the observations that do not belong to this set of neighbors (non-neighbor observations).
In this case, the dissimilarity matrix cannot be considered a distance metric, since it does not necessarily satisfy the symmetry condition for distance matrices (i.e. given two observations \(x_i\) and \(x_j\), the local dissimilarity (\(d\)) between them is relative, since generally \(d(x_i, x_j) \neq d(x_j, x_i)\)). On the other hand, when .local = FALSE, the dissimilarity matrix obtained can be considered a distance matrix.
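The local procedure and the resulting loss of symmetry can be illustrated with a small base-R sketch. It follows the steps described above under simplifying assumptions: a single set of observations, PCA only, and a fixed number of components. The helper pc_mahalanobis() and all objects are hypothetical and are not part of resemble.

```r
# Base-R sketch of the local procedure (illustrative only, not the
# resemble implementation). X, pre_k and n_comp are hypothetical.
set.seed(2)
X <- matrix(rnorm(30 * 5), nrow = 30)
pre_k <- 10
n_comp <- 2

# Mahalanobis distance computed on the first n_comp PC scores
pc_mahalanobis <- function(x, n_comp) {
  pca <- prcomp(x, center = TRUE, scale. = FALSE)
  idx <- seq_len(n_comp)
  as.matrix(dist(sweep(pca$x[, idx, drop = FALSE], 2, pca$sdev[idx], "/")))
}

# Step 1: global dissimilarity matrix
d_global <- pc_mahalanobis(X, n_comp)

# Steps 2 to 4: for each target (one column per target), re-project its
# neighborhood and recompute the distances; non-neighbors stay missing
d_local <- matrix(NA_real_, nrow(X), nrow(X))
for (j in seq_len(nrow(X))) {
  # order() places the target itself first (distance 0), then its neighbors
  nn <- order(d_global[, j])[seq_len(pre_k + 1)]
  d_nn <- pc_mahalanobis(X[nn, , drop = FALSE], n_comp)
  d_local[nn, j] <- d_nn[, 1]
}

# d(x_i, x_j) and d(x_j, x_i) come from different local projections,
# so the matrix is generally not symmetric
isSymmetric(d_local)
```

Each column of d_local is computed in its own local space, so its values are only comparable within that column.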
In cases where "Yr" is required to compute the dissimilarities and .local = TRUE, care must be taken, as some neighborhoods might not have enough observations with non-missing "Yr" values, which might yield unreliable dissimilarity computations.
If "opc" or "manual" is used in pc_selection$method and .local = TRUE, the minimum number of observations with non-missing "Yr" values in each neighborhood is determined by pc_selection$value (i.e. the maximum number of components to compute).
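One way to anticipate this problem is to count the non-missing "Yr" values in each neighborhood before computing local dissimilarities. The base-R sketch below shows such a check under stated assumptions: the neighbor indices, the response vector and max_comp (standing in for pc_selection$value) are all hypothetical, and the sketch does not reproduce how resemble itself handles these neighborhoods.

```r
# Base-R sketch: count non-missing Yr values per neighborhood.
# All objects here are hypothetical stand-ins.
set.seed(3)
n <- 30
pre_k <- 8
max_comp <- 5  # plays the role of pc_selection$value

Yr <- rnorm(n)
Yr[sample(n, 12)] <- NA  # some observations lack a response value

# Hypothetical neighbor indices: one column per target observation
# (position 1 of order() is the target itself, so it is skipped)
d <- as.matrix(dist(matrix(rnorm(n * 4), nrow = n)))
neighbors <- apply(d, 2, function(dj) order(dj)[2:(pre_k + 1)])

# Number of neighbors with a usable Yr value in each neighborhood
n_valid <- colSums(!is.na(matrix(Yr[neighbors], nrow = pre_k)))

# Neighborhoods with fewer usable observations than components requested
too_few <- which(n_valid < max_comp)
```

Targets listed in too_few are the ones whose local dissimilarities would deserve closer inspection, or a larger pre_k.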
Value
a list of class ortho_diss with the following elements:
n_components: the number of components (either principal components or partial least squares components) used for computing the global dissimilarities.
global_variance_info: the information about the explained variance(s) of the projection. When .local = TRUE, the information corresponds to the global projection done prior to computing the local projections.
local_n_components: if .local = TRUE, a data.table which specifies the number of local components (either principal components or partial least squares components) used for computing the dissimilarity between each target observation and its neighbor observations.
dissimilarity: the computed dissimilarity matrix. If .local = FALSE, a distance matrix. If .local = TRUE, a matrix of class local_ortho_diss. In this case, each column represents the dissimilarity between a target observation and its neighbor observations.
projection: if return_projection = TRUE, an ortho_projection object.
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
Examples
library(resemble)
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil[!as.logical(NIRsoil$train), "CEC", drop = FALSE]
Yr <- NIRsoil[as.logical(NIRsoil$train), "CEC", drop = FALSE]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu), , drop = FALSE]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr), , drop = FALSE]
# Computation of the orthogonal dissimilarity matrix using the
# default parameters
pca_diss <- ortho_diss(Xr, Xu)
# Computation of a principal component dissimilarity matrix using
# the "opc" method for the selection of the principal components
pca_diss_optim <- ortho_diss(
  Xr, Xu, Yr,
  pc_selection = list("opc", 40),
  compute_all = TRUE
)
# Computation of a partial least squares (PLS) dissimilarity
# matrix using the "opc" method for the selection of the PLS
# components
pls_diss_optim <- ortho_diss(
  Xr = Xr, Xu = Xu,
  Yr = Yr,
  pc_selection = list("opc", 40),
  diss_method = "pls"
)