| search_neighbors {resemble} | R Documentation | 
A function for searching a given reference set for the neighbors of another given set of observations (search_neighbors)
Description
This function searches a reference set for the neighbors of the observations provided in another set.
Usage
search_neighbors(Xr, Xu, diss_method = c("pca", "pca.nipals", "pls", "mpls",
                                         "cor", "euclid", "cosine", "sid"),
                 Yr = NULL, k, k_diss, k_range, spike = NULL,
                 pc_selection = list("var", 0.01),
                 return_projection = FALSE, return_dissimilarity = FALSE,
                 ws = NULL,
                 center = TRUE, scale = FALSE,
                 documentation = character(), ...)
Arguments
| Xr | a matrix of reference (spectral) observations where the neighbor search is to be conducted. See details. | 
| Xu | an optional matrix of (spectral) observations whose
neighbors are to be searched in Xr. See details. | 
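| diss_method | a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbors of each observation. Options are "pca", "pca.nipals", "pls" and "mpls" (dissimilarities computed on the scores of a projection, see ortho_diss), "cor" (correlation dissimilarity, see cor_diss), "euclid" and "cosine" (see f_diss) and "sid" (spectral information divergence, see sid). | 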
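| Yr | a numeric matrix of side information (e.g. response values) corresponding to the observations in Xr, used by the projection-based methods (see ortho_diss). It is required when diss_method = "pls" and when "opc" is the method used in pc_selection. | 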
| k | an integer value indicating the number of nearest neighbors of each
observation in Xu (or in Xr, if Xu is not supplied) that must be selected from Xr. | 
| k_diss | a numeric value indicating a dissimilarity threshold.
For each observation in Xu, its neighbors are the observations in Xr whose dissimilarity to it is below this threshold. | 
| k_range | an integer vector of length 2 which specifies the minimum
(first value) and the maximum (second value) number of neighbors to be
retained when k_diss is used. | 
| spike | a vector of integers (with positive and/or negative values)
indicating what observations in Xr must be forced into (positive values) or excluded from (negative values) the neighborhoods. | 
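| pc_selection | a list of length 2 to be passed onto the ortho_diss methods (i.e. when diss_method is "pca", "pca.nipals" or "pls") for selecting the number of components to retain. Its first element is the selection method ("opc", "cumvar", "var" or "manual") and its second element is the numerical value that complements that method. The "opc" method is the optimized component selection described in Ramirez-Lopez et al. (2013a, 2013b). The default is list("var", 0.01). Optionally, the method can be passed as a single character string, in which case a default value is used (see ortho_diss). | 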
| return_projection | a logical indicating if the projection(s) must be
returned. Projections are used if the ortho_diss methods are called (i.e. diss_method = "pca", "pca.nipals" or "pls"). | 
| return_dissimilarity | a logical indicating if the dissimilarity matrix used for neighbor search must be returned. | 
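| ws | an odd integer value which specifies the window size when
diss_method = "cor" is computed as a moving-window correlation (see cor_diss). Default is NULL (no moving window). | 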
| center | a logical indicating if Xr and Xu must be centered. | 
| scale | a logical indicating if Xr and Xu must be scaled. | 
| documentation | an optional character string that can be used to
describe anything related to the search_neighbors call (e.g. a description of the input data). | 
| ... | further arguments to be passed to the dissimilarity function. See details. | 
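The default pc_selection = list("var", 0.01) shown under Usage selects the number of principal components from their explained variance. The sketch below illustrates that idea, together with a cumulative-variance rule, using base R's prcomp. It assumes that "var" keeps every component explaining at least value of the total variance and that "cumvar" keeps the smallest number of components whose cumulative explained variance reaches value. The helper n_components() is hypothetical; this is not resemble's implementation, so see ortho_diss for the exact rules.

```r
# Hypothetical helper: number of principal components retained under
# "var"- or "cumvar"-style rules (illustrative sketch, not resemble's code)
n_components <- function(X, method = c("var", "cumvar"), value,
                         center = TRUE, scale = FALSE) {
  method <- match.arg(method)
  pca <- prcomp(X, center = center, scale. = scale)
  # proportion of the total variance explained by each component
  # (prcomp returns components in decreasing order of variance)
  explained <- pca$sdev^2 / sum(pca$sdev^2)
  if (method == "var") {
    # assumed rule: keep every component explaining at least `value`
    sum(explained >= value)
  } else {
    # assumed rule: smallest number of components whose cumulative
    # explained variance reaches `value`
    which(cumsum(explained) >= value)[1]
  }
}

set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)
X[, 1:3] <- X[, 1:3] * 5   # three high-variance variables
n_components(X, "var", 0.01)
n_components(X, "cumvar", 0.99)
```

With "var", lowering value retains more components. With "cumvar", raising value toward 1 has the same effect.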
Details
This function may be especially useful when the reference set (Xr) is
very large. In some cases the number of observations in the reference set
can be reduced by removing irrelevant observations (i.e. observations that are not
neighbors of any observation in a particular target set). For example, this function can be
used to reduce the size of the reference set before running the
mbl function.
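The idea of shrinking Xr to the observations that belong to at least one neighborhood can be sketched in base R. This toy example assumes a plain Euclidean dissimilarity and uses the hypothetical helper knn_indices(). It mimics the neighbors and unique_neighbors outputs described under Value (one column per target observation), but it is not how search_neighbors is implemented.

```r
# Hypothetical helper: indices of the k nearest rows of Xr for each row of Xu,
# one column per Xu observation, sorted by increasing Euclidean distance
knn_indices <- function(Xr, Xu, k) {
  # squared Euclidean distances between every row of Xr and every row of Xu
  d2 <- outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu)
  d <- sqrt(pmax(d2, 0))  # guard against tiny negative rounding errors
  apply(d, 2, function(col) order(col)[seq_len(k)])
}

set.seed(42)
Xr <- matrix(rnorm(100 * 5), nrow = 100)  # toy reference set
Xu <- matrix(rnorm(10 * 5), nrow = 10)    # toy target set

neighbors <- knn_indices(Xr, Xu, k = 5)   # 5 x 10 matrix of Xr indices
unique_neighbors <- unique(as.vector(neighbors))

# reduced reference set: only observations that are neighbors of some Xu row
Xr_reduced <- Xr[unique_neighbors, , drop = FALSE]
dim(Xr_reduced)
```

At most k * nrow(Xu) observations survive, which is where the savings come from when Xr is much larger than the target set.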
This function uses the dissimilarity function to compute the
dissimilarities between Xr and Xu. Arguments to
dissimilarity, as well as further arguments to the functions
used inside dissimilarity (i.e. ortho_diss, cor_diss,
f_diss and sid), can be passed to
those functions as additional arguments (i.e. ...).
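To illustrate the kind of metric behind diss_method = "cor", the sketch below computes a correlation-based dissimilarity between every observation in Xr and every observation in Xu in base R. The transformation sqrt(0.5 * (1 - r)) is an assumption here. It is a common way of turning a Pearson correlation r into a dissimilarity bounded in [0, 1]. See cor_diss for the exact definition used by the package. The helper cor_dissimilarity() is hypothetical.

```r
# Hypothetical helper (not resemble::cor_diss): correlation dissimilarity
# between observations (rows), returned as an nrow(Xr) x nrow(Xu) matrix
cor_dissimilarity <- function(Xr, Xu) {
  # cor() correlates columns, so transpose to correlate observations (rows)
  r <- cor(t(Xr), t(Xu))
  # assumed transformation: 0 for r = 1, 1 for r = -1;
  # pmax() keeps rounding errors from producing sqrt() of a negative number
  sqrt(0.5 * pmax(1 - r, 0))
}

set.seed(7)
Xr <- matrix(runif(20 * 50), nrow = 20)  # 20 toy "spectra" with 50 bands
Xu <- rbind(Xr[3, ] * 2 + 1,             # perfectly correlated with Xr[3, ]
            runif(50))
d <- cor_dissimilarity(Xr, Xu)
dim(d)             # 20 2: rows are Xr observations, columns are Xu observations
which.min(d[, 1])  # 3: observation 3 of Xr is the nearest to Xu[1, ]
```

Because correlation ignores offsets and scale, Xu[1, ] is at (near) zero dissimilarity from Xr[3, ] even though their values differ.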
If no matrix is passed to Xu, the neighbors of each observation in Xr
are searched within Xr itself. If a matrix is
passed to Xu, the neighbors of the observations in Xu are searched in the Xr
matrix.
Value
a list containing the following elements:
- neighbors_diss: a matrix of the Xr dissimilarity scores corresponding to the neighbors of each Xr observation (or Xu observation, in case Xu was supplied). The neighbor dissimilarity scores are organized by columns and are sorted in ascending order.
- neighbors: a matrix of the Xr indices corresponding to the neighbors of each observation in Xu (or in Xr, in case Xu was not supplied). The neighbor indices are organized by columns and are sorted in ascending order by their dissimilarity score.
- unique_neighbors: a vector of the indices in Xr identified as neighbors of any observation in Xr (or in Xu, in case it was supplied). This is obtained by converting the neighbors matrix into a vector and applying the unique function.
- k_diss_info: a data.table that is returned only if the k_diss argument was used. It comprises three columns: the first one (Xr_index or Xu_index) indicates the index of the observations in Xr (or in Xu, in case it was supplied), the second column (n_k) indicates the number of neighbors found in Xr, and the third column (final_n_k) indicates the final number of neighbors selected, bounded by the k_range argument.
- dissimilarity: if return_dissimilarity = TRUE, the dissimilarity object used (as computed by the dissimilarity function).
- projection: an ortho_projection object. Only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals" or diss_method = "pls".
 This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see the ortho_diss function for further details).
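The relationship between k_diss, k_range and the n_k and final_n_k columns of k_diss_info can be sketched for a single target observation. The sketch reads the Arguments and Value descriptions as follows: n_k counts the Xr observations whose dissimilarity is below k_diss, and final_n_k clamps that count to the interval given by k_range. The helper select_by_threshold() is hypothetical. The package's exact handling may differ, for example in whether the comparison with the threshold is strict.

```r
# Hypothetical helper: neighbor selection for one target observation using a
# dissimilarity threshold (k_diss) bounded by a minimum/maximum (k_range)
select_by_threshold <- function(diss, k_diss, k_range) {
  ranked <- order(diss)                              # Xr indices, nearest first
  n_k <- sum(diss < k_diss)                          # neighbors below the threshold
  final_n_k <- min(max(n_k, k_range[1]), k_range[2]) # clamp to k_range
  selected <- ranked[seq_len(final_n_k)]
  list(n_k = n_k,
       final_n_k = final_n_k,
       neighbors = selected,
       neighbors_diss = diss[selected])
}

# toy dissimilarities between 8 Xr observations and one target observation
diss <- c(0.30, 0.05, 0.90, 0.12, 0.45, 0.08, 0.60, 0.20)

select_by_threshold(diss, k_diss = 0.10, k_range = c(3, 5))$final_n_k  # 3 (n_k = 2, raised to the minimum)
select_by_threshold(diss, k_diss = 0.50, k_range = c(3, 5))$final_n_k  # 5 (n_k = 6, capped at the maximum)
```

The lower bound guarantees that every target gets enough neighbors, even in sparse regions of Xr. The upper bound keeps neighborhoods in dense regions from growing without limit.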
Author(s)
References
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
See Also
dissimilarity, ortho_diss, cor_diss, f_diss, sid, mbl
Examples
library(resemble)
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
# Identify the 40 nearest neighbors in Xr of each observation in Xu using the
# correlation dissimilarity (all other arguments keep their default values)
ex1 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "cor",
  k = 40
)
# Identify the neighbor observations using principal component (PC)
# and partial least squares (PLS) dissimilarities, and using the "opc"
# approach for selecting the number of components
ex2 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pca",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
seq(1, nrow(Xr))[!seq(1, nrow(Xr)) %in% ex2$unique_neighbors]
ex3 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE
)
# Observations that do not belong to any neighborhood
seq(1, nrow(Xr))[!seq(1, nrow(Xr)) %in% ex3$unique_neighbors]
# Identify the neighbor observations using local PLS dissimilarities
# Here, 150 neighbors are used to compute a local dissimilarity matrix
# and then this matrix is used to select 50 neighbors
ex4 <- search_neighbors(
  Xr = Xr, Xu = Xu,
  diss_method = "pls",
  Yr = Yr, k = 50,
  pc_selection = list("opc", 40),
  scale = TRUE,
  .local = TRUE,
  pre_k = 150
)