bin.sample {npreg} | R Documentation |
Bin Sample a Vector, Matrix, or Data Frame
Description
Bin elements of a vector (or rows of a matrix/data frame) and randomly sample a specified number of elements from each bin. Returns sampled data and (optionally) indices of sampled data and/or breaks for defining bins.
Usage
bin.sample(x, nbin = 5, size = 1, equidistant = FALSE,
index.return = FALSE, breaks.return = FALSE)
Arguments
x |
Vector, matrix, or data frame to bin sample. Factors are allowed. |
nbin |
Number of bins for each variable (defaults to 5 bins for each dimension of x). |
size |
Size of sample to randomly draw from each bin (defaults to 1). |
equidistant |
Should bins be defined equidistantly for each predictor? Defaults to FALSE. |
index.return |
If TRUE, the row indices of the bin sampled observations are also returned (defaults to FALSE). |
breaks.return |
If TRUE, the lower bounds of the breaks defining the bins are also returned (defaults to FALSE). |
Details
For a single variable, the unidimensional bins are defined using the .bincode function. For multiple variables, the multidimensional bins are defined using the algorithm described in the appendix of Helwig et al. (2015), which combines the unidimensional bins (calculated via .bincode) into a multidimensional bin code.
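The idea of combining unidimensional bin codes into one multidimensional code can be sketched in base R. This is only an illustration under stated assumptions, not the npreg source: the equidistant breaks, the mixed-radix encoding, and the helper names make_breaks and pick_one are hypothetical choices made for this sketch, and the algorithm in Helwig et al. (2015) may differ in its details.

```r
# Sketch only: base R, not the npreg implementation.
set.seed(1)
x1 <- runif(200)
x2 <- runif(200)
nbin <- 5L

# Hypothetical helper: equidistant breaks over the observed range of v.
make_breaks <- function(v, nbin) seq(min(v), max(v), length.out = nbin + 1L)

# Unidimensional bin codes in 1..nbin, computed with .bincode().
b1 <- .bincode(x1, make_breaks(x1, nbin), right = TRUE, include.lowest = TRUE)
b2 <- .bincode(x2, make_breaks(x2, nbin), right = TRUE, include.lowest = TRUE)

# Combine into a single multidimensional code in 1..nbin^2
# (mixed-radix encoding, assumed here for illustration).
code <- (b1 - 1L) + nbin * (b2 - 1L) + 1L

# Hypothetical helper: draw one index from a vector of candidate indices.
pick_one <- function(idx) idx[sample.int(length(idx), 1L)]

# Sample one row index from each non-empty bin, then extract those rows.
ix <- vapply(split(seq_along(code), code), pick_one, integer(1))
sampled <- cbind(x1 = x1[ix], x2 = x2[ix])
```

Using sample.int on the index positions, rather than calling sample on the indices directly, avoids the surprise that sample(n, 1) draws from 1:n when a bin holds a single observation.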
Value
If index.return = FALSE and breaks.return = FALSE, returns the bin sampled x observations.
If index.return = TRUE and/or breaks.return = TRUE, returns a list with elements:
x |
bin sampled x observations |
ix |
row indices of bin sampled observations (if index.return = TRUE) |
bx |
lower bounds of breaks defining bins (if breaks.return = TRUE) |
Note
For factors, the number of bins is automatically defined to be the number of levels.
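The note on factors can be illustrated with a small base R sketch in which each level acts as its own bin. This is an assumption-laden illustration of the idea, not the npreg code; the helper pick_one is hypothetical.

```r
# Sketch only: base R, not the npreg implementation.
set.seed(1)
f <- factor(rep(c("a", "b", "c"), times = c(5, 10, 15)))

nbin <- nlevels(f)      # one bin per factor level
code <- as.integer(f)   # bin code is the level index

# Hypothetical helper: draw one index from a vector of candidate indices.
pick_one <- function(idx) idx[sample.int(length(idx), 1L)]

# Sample one observation from each level.
ix <- vapply(split(seq_along(code), code), pick_one, integer(1))
f[ix]
```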
Author(s)
Nathaniel E. Helwig <helwig@umn.edu>
References
Helwig, N. E., Gao, Y., Wang, S., & Ma, P. (2015). Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics, 14(C), 491-504. doi:10.1016/j.spasta.2015.09.002
See Also
.bincode for binning a numeric vector
Examples
########## EXAMPLE 1 ##########
### unidimensional binning
# generate data
x <- seq(0, 1, length.out = 101)
# bin sample (default)
set.seed(1)
bin.sample(x)
# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x # sampled data
x[xs$ix] # indexing sampled data
# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x # sampled data
x[xs$ix] # indexing sampled data
xs$bx # breaks
########## EXAMPLE 2 ##########
### bidimensional binning
# generate data
x <- expand.grid(x1 = seq(0, 1, length.out = 101),
                 x2 = seq(0, 1, length.out = 101))
# bin sample (default)
set.seed(1)
bin.sample(x)
# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x # sampled data
x[xs$ix,] # indexing sampled data
# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x # sampled data
x[xs$ix,] # indexing sampled data
xs$bx # breaks
# plot breaks and 25 bins
plot(xs$bx, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x1", ylab = "x2", main = "25 bidimensional bins")
grid()
text(xs$bx + 0.1, labels = 1:25)