splitSample {analogue} | R Documentation |
Select samples from along an environmental gradient
Description
Select samples from along an environmental gradient by splitting the gradient into discrete chunks and sampling within each chunk. This allows, for example, a test set to be selected that covers the environmental gradient of the training set.
Usage
splitSample(env, chunk = 10, take, nchunk,
fill = c("head", "tail", "random"),
maxit = 1000)
Arguments
env |
numeric; vector of samples representing the gradient values. |
chunk |
numeric; number of chunks to split the gradient into. |
take |
numeric; how many samples to take from the gradient. Cannot be missing. |
nchunk |
numeric; number of samples per chunk. Must be a vector
of length |
fill |
character; the type of filling of chunks to perform. See Details. |
maxit |
numeric; maximum number of iterations in which to try to
sample |
Details
The gradient is split into chunk sections and samples are selected from each chunk to give a sample of length take. If take is divisible by chunk without remainder then an equal number of samples is selected from each chunk. Where take is not a multiple of chunk and nchunk is not specified, extra samples have to be allocated to some of the chunks to reach the required number of samples.
An additional complication is that some chunks of the gradient may have fewer than nchunk samples, and therefore more samples need to be selected from the remaining chunks until take samples are chosen.
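The uneven availability described above can be illustrated with a short sketch. It assumes the chunks are equal-width intervals of the gradient, which this page does not state explicitly; env here is a made-up gradient, not a data set from the package.

```r
## Sketch only: assumes chunks are equal-width intervals of the gradient
set.seed(1)
env <- c(rnorm(40, mean = 5, sd = 0.3), runif(5, 6.5, 8))  # hypothetical gradient
chunk <- 5

## assign each sample to one of `chunk` intervals
grp <- cut(env, breaks = chunk, labels = FALSE)

## number of samples available in each chunk (zero for empty chunks)
avail <- tabulate(grp, nbins = chunk)
avail
```

Chunks at the sparse end of such a gradient hold only a few samples, so they cannot always supply an equal share of take.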
If nchunk is supplied, it must be a vector stating exactly how many samples to select from each chunk. If nchunk is not supplied, then the number of samples per chunk is determined as follows:
1. An initial allocation of floor(take / chunk) is assigned to each chunk.
2. If any chunks have fewer samples than this initial allocation, these elements of nchunk are reset to the number of samples in those chunks.
3. Sequentially, an extra sample is allocated to each chunk with sufficient available samples until take samples are selected.
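The allocation rule above can be sketched in a few lines of R. The helper allocate() and the avail counts are hypothetical and written only for illustration; this is not the package's own code, and it always fills from the low end of the gradient.

```r
## Hypothetical helper, not part of analogue
allocate <- function(avail, take) {
    stopifnot(take <= sum(avail))          # cannot take more than exist
    chunk <- length(avail)
    ## equal initial allocation, capped by what each chunk can supply
    nchunk <- pmin(rep(floor(take / chunk), chunk), avail)
    ## one extra sample per pass to each chunk that still has samples left
    while (sum(nchunk) < take) {
        for (i in seq_len(chunk)) {
            if (sum(nchunk) < take && nchunk[i] < avail[i]) {
                nchunk[i] <- nchunk[i] + 1
            }
        }
    }
    nchunk
}

avail <- c(2, 10, 10, 1, 10)   # made-up counts of samples per chunk
allocate(avail, take = 20)     # 2 6 6 1 5
```

With these counts the initial allocation of 4 per chunk is cut to 2 and 1 in the sparse chunks, and two passes of extra samples bring the total up to 20.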
Argument fill controls the order in which the chunks are filled. fill = "head" fills from the low to the high end of the gradient, whilst fill = "tail" fills in the opposite direction. Chunks are filled in random order if fill = "random". In all cases no chunk is filled by more than one extra sample until all chunks that can supply one extra sample are filled. In the case of fill = "head" or fill = "tail" this entails moving along the gradient from one end to the other, allocating an extra sample to available chunks before starting along the gradient again. For fill = "random", a random order of chunks to fill is determined. If an extra sample has been allocated to each chunk in the random order and take samples are still not selected, filling begins again using the same random ordering. In other words, the random order of chunks to fill is chosen only once.
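As a rough illustration of the three filling orders, the sketch below maps each fill value to an order in which chunks would be visited. The function fill_order() is a hypothetical name used here for illustration and is not part of the package.

```r
## Hypothetical sketch of how `fill` might translate into a visiting order
fill_order <- function(chunk, fill = c("head", "tail", "random")) {
    fill <- match.arg(fill)
    switch(fill,
           head   = seq_len(chunk),        # low end to high end
           tail   = rev(seq_len(chunk)),   # high end to low end
           random = sample(chunk))         # drawn once, then reused
}

fill_order(5, "head")
fill_order(5, "tail")

set.seed(42)
ord <- fill_order(5, "random")
## successive passes revisit the chunks in the same random order
rep(ord, times = 2)
```

Drawing the random order once and repeating it means every eligible chunk receives an extra sample before any chunk receives a second one.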
Value
A numeric vector of indices of selected samples. This vector has attribute lengths, which indicates how many samples were actually chosen from each chunk.
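The attribute can be read with attr(). The sketch below uses a hand-built stand-in object rather than a real splitSample() result, so the index values and counts are invented for illustration.

```r
## Hypothetical stand-in for a splitSample() result
res <- structure(c(3L, 7L, 12L, 20L), lengths = c(2L, 1L, 1L))

## how many samples came from each chunk
attr(res, "lengths")

## the per-chunk counts add up to the number of selected indices
sum(attr(res, "lengths")) == length(res)
```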
Author(s)
Gavin L. Simpson
Examples
data(swappH)
## take a test set of 20 samples along the pH gradient
test1 <- splitSample(swappH, chunk = 10, take = 20)
test1
swappH[test1]
## take a larger sample where some chunks don't have many samples
## do random filling
set.seed(3)
test2 <- splitSample(swappH, chunk = 10, take = 70, fill = "random")
test2
swappH[test2]