search_neighbors {resemble} | R Documentation |
Search a reference set for the neighbors of another given set of observations (search_neighbors)
Description
This function searches a reference set for the neighbors of the observations provided in another set.
Usage
search_neighbors(Xr, Xu, diss_method = c("pca", "pca.nipals", "pls", "mpls",
                                         "cor", "euclid", "cosine", "sid"),
                 Yr = NULL, k, k_diss, k_range, spike = NULL,
                 pc_selection = list("var", 0.01),
                 return_projection = FALSE, return_dissimilarity = FALSE,
                 ws = NULL,
                 center = TRUE, scale = FALSE,
                 documentation = character(), ...)
Arguments
Xr |
a matrix of reference (spectral) observations where the neighbor search is to be conducted. See details. |
Xu |
an optional matrix of (spectral) observations whose
neighbors are to be searched in |
diss_method |
a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbors of each observation.
|
Yr |
a numeric matrix of
|
k |
an integer value indicating the k-nearest neighbors of each
observation in |
k_diss |
an integer value indicating a dissimilarity threshold.
For each observation in |
k_range |
an integer vector of length 2 which specifies the minimum
(first value) and the maximum (second value) number of neighbors to be
retained when the |
spike |
a vector of integers (with positive and/or negative values)
indicating what observations in |
pc_selection |
a list of length 2 to be passed onto the
The default is Optionally, the |
return_projection |
a logical indicating if the projection(s) must be
returned. Projections are used if the |
return_dissimilarity |
a logical indicating if the dissimilarity matrix used for neighbor search must be returned. |
ws |
an odd integer value which specifies the window size, when
|
center |
a logical indicating if the |
scale |
a logical indicating if the |
documentation |
an optional character string that can be used to
describe anything related to the |
... |
further arguments to be passed to the |
Details
This function may be especially useful when the reference set (Xr) is
very large. In some cases the number of observations in the reference set
can be reduced by removing irrelevant observations (i.e. observations that are not
neighbors of a particular target set). For example, this function can be
used to reduce the size of the reference set before running the mbl
function.
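The reduction step described above can be sketched in base R. This is only an illustration under stated assumptions: the toy matrix, the response vector and the hand-written index vector are hypothetical, and the index vector merely stands in for the unique_neighbors element of the list returned by search_neighbors.

```r
# Toy reference set: 6 observations (rows) by 3 variables (columns)
Xr <- matrix(seq_len(18), nrow = 6,
             dimnames = list(paste0("obs", 1:6), paste0("v", 1:3)))
Yr <- c(10, 20, 30, 40, 50, 60)

# Hypothetical stand-in for the `unique_neighbors` element returned by
# search_neighbors(); written by hand here so the sketch runs on its own
unique_neighbors <- c(2L, 5L, 3L)

# Keep only the reference observations that are neighbors of the target set
keep <- sort(unique_neighbors)
Xr_reduced <- Xr[keep, , drop = FALSE]
Yr_reduced <- Yr[keep]

# Reference observations that were discarded
dropped <- setdiff(seq_len(nrow(Xr)), keep)
```

The reduced Xr_reduced and Yr_reduced objects would then take the place of the full reference set in a later modeling step.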
This function uses the dissimilarity function to compute the
dissimilarities between Xr and Xu. Arguments to dissimilarity, as well as
further arguments to the functions used inside dissimilarity
(i.e. ortho_diss, cor_diss, f_diss and sid), can be passed as additional
arguments (i.e. ...).
If no matrix is passed to Xu, the neighbor search is conducted for the
observations in Xr, and their neighbors are searched within that same
matrix. If a matrix is passed to Xu, the neighbors of the Xu observations
are searched in the Xr matrix.
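The case where Xu is supplied can be illustrated with a minimal base R sketch. It assumes a plain Euclidean distance as a stand-in for the dissimilarity function and uses hypothetical toy data; it is not the code used by the package.

```r
# Toy data: 5 reference observations and 2 target observations
Xr <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5), c(6, 5))
Xu <- rbind(c(0.1, 0), c(5.2, 5))
k <- 2

# Plain Euclidean dissimilarity between every row of Xr and every row of Xu
# (rows index Xr, columns index Xu); an assumption made for illustration
diss <- apply(Xu, 1, function(u) sqrt(rowSums(sweep(Xr, 2, u)^2)))

# For each Xu observation (column), keep the indices of the k closest
# rows of Xr, sorted by ascending dissimilarity
neighbors <- apply(diss, 2, function(d) order(d)[seq_len(k)])
```

Each column of neighbors refers to one observation in Xu and each entry is a row index of Xr, which mirrors the column-wise layout described for the returned neighbors matrix.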
Value
a list containing the following elements:
neighbors_diss: a matrix of the Xr dissimilarity scores corresponding to the neighbors of each Xr observation (or Xu observation, in case Xu was supplied). The neighbor dissimilarity scores are organized by columns and are sorted in ascending order.
neighbors: a matrix of the Xr indices corresponding to the neighbors of each observation in Xu. The neighbor indices are organized by columns and are sorted in ascending order by their dissimilarity score.
unique_neighbors: a vector of the indices in Xr identified as neighbors of any observation in Xr (or in Xu, in case it was supplied). This is obtained by converting the neighbors matrix into a vector and applying the unique function.
k_diss_info: a data.table that is returned only if the k_diss argument was used. It comprises three columns: the first one (Xr_index or Xu_index) indicates the index of the observations in Xr (or in Xu, in case it was supplied), the second column (n_k) indicates the number of neighbors found in Xr and the third column (final_n_k) indicates the final number of neighbors selected, bounded by the k_range argument.
dissimilarity: if return_dissimilarity = TRUE, the dissimilarity object used (as computed by the dissimilarity function).
projection: an ortho_projection object. Only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals" or diss_method = "pls". This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see the ortho_diss function for further details).
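The relationship between n_k, final_n_k and unique_neighbors can be sketched in base R. The toy dissimilarity matrix below is hypothetical, and the strict comparison against the threshold is an assumption made for illustration; the package may treat ties at the threshold differently.

```r
# Toy dissimilarity matrix: rows index Xr (6 observations),
# columns index Xu (3 observations)
diss <- cbind(
  c(0.05, 0.10, 0.40, 0.50, 0.60, 0.90),
  c(0.70, 0.02, 0.03, 0.04, 0.05, 0.06),
  c(0.80, 0.90, 0.70, 0.60, 0.95, 0.85)
)
k_diss <- 0.2       # dissimilarity threshold
k_range <- c(2, 4)  # minimum and maximum number of neighbors to retain

# Number of reference observations below the threshold, per column
# (assumption: strict inequality)
n_k <- colSums(diss < k_diss)

# Final number of neighbors, bounded by k_range
final_n_k <- pmin(pmax(n_k, k_range[1]), k_range[2])

# Indices of the retained neighbors, sorted by ascending dissimilarity
neighbor_list <- lapply(seq_len(ncol(diss)), function(j) {
  order(diss[, j])[seq_len(final_n_k[j])]
})

# All reference observations that are a neighbor of at least one column
unique_neighbors <- unique(unlist(neighbor_list))
```

In this toy case the third observation has no reference observation below the threshold, so the lower bound of k_range still assigns it two neighbors, while the second one is capped at four.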
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
dissimilarity
ortho_diss
cor_diss
f_diss
sid
mbl
Examples
library(resemble)
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
# Identify the neighbor observations using the correlation dissimilarity and
# default parameters
# (ex1$unique_neighbors holds the observations in Xr that are among the
# k = 40 nearest neighbors of at least one observation in Xu)
ex1 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "cor",
  k = 40
)
# Identify the neighbor observations using principal component (PC)
# and partial least squares (PLS) dissimilarities, and using the "opc"
# approach for selecting the number of components
ex2 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pca",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
setdiff(seq_len(nrow(Xr)), ex2$unique_neighbors)
# Same search using the PLS dissimilarity
ex3 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
setdiff(seq_len(nrow(Xr)), ex3$unique_neighbors)
# Identify the neighbor observations using local dissimilarities
# (computed here with diss_method = "pls")
# Here, 150 neighbors are used to compute a local dissimilarity matrix
# and then this matrix is used to select 50 neighbors
ex4 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE,
  .local = TRUE,
  pre_k = 150
)