R: Bin Sample a Vector, Matrix, or Data Frame

bin.sample {npreg}

R Documentation

Bin Sample a Vector, Matrix, or Data Frame

Description

Bin elements of a vector (or rows of a matrix/data frame) and randomly sample a specified number of elements from each bin. Returns sampled data and (optionally) indices of sampled data and/or breaks for defining bins.

Usage

bin.sample(x, nbin = 5, size = 1, equidistant = FALSE, 
           index.return = FALSE, breaks.return = FALSE)

Arguments

`x`	Vector, matrix, or data frame to bin sample. Factors are allowed.
`nbin`	Number of bins for each variable (defaults to 5 bins for each dimension of `x`). If `length(nbin) != ncol(x)`, then `nbin[1]` is used for each variable.
`size`	Size of sample to randomly draw from each bin (defaults to 1).
`equidistant`	Should bins be defined equidistantly for each predictor? If `FALSE` (default), sample quantiles define bins for each predictor. If `length(equidistant) != ncol(x)`, then `equidistant[1]` is used for each variable.
`index.return`	If `TRUE`, returns the (row) indices of the bin sampled observations.
`breaks.return`	If `TRUE`, returns the (lower bounds of the) breaks for the binning.
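
The effect of `equidistant` can be illustrated in base R without calling `bin.sample` itself. The sketch below is an assumption about how the two kinds of breaks behave, not the npreg source code; the names `eq_breaks`, `qt_breaks`, `eq_bins`, and `qt_bins` are hypothetical. It compares equally spaced breaks with sample-quantile breaks on skewed data.

```r
# Illustrative sketch only (base R); not the npreg implementation.
set.seed(1)
x <- rexp(200)          # right-skewed data
nbin <- 5

# equally spaced breaks over the range of x (cf. equidistant = TRUE)
eq_breaks <- seq(min(x), max(x), length.out = nbin + 1)

# sample-quantile breaks (cf. equidistant = FALSE, the default)
qt_breaks <- quantile(x, probs = seq(0, 1, length.out = nbin + 1),
                      names = FALSE)

# assign each observation to a bin; include.lowest keeps min(x) in bin 1
eq_bins <- .bincode(x, eq_breaks, include.lowest = TRUE)
qt_bins <- .bincode(x, qt_breaks, include.lowest = TRUE)

table(eq_bins)          # counts concentrate in the low bins
table(qt_bins)          # counts are (nearly) balanced across bins
```

With skewed data, quantile breaks yield bins of roughly equal occupancy, whereas equidistant breaks yield bins of equal width whose occupancy can differ sharply.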

Details

For a single variable, the unidimensional bins are defined using the `.bincode` function. For multiple variables, the multidimensional bins are defined using the algorithm described in the appendix of Helwig et al. (2015), which combines the unidimensional bins (calculated via `.bincode`) into a single multidimensional bin code.
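
The following base R sketch illustrates the general idea and is not the npreg source code. The helper `bin1d` and the combination rule `b1 + nbin[1] * (b2 - 1)` are assumptions chosen for illustration; the exact coding used in Helwig et al. (2015) may differ.

```r
# Illustrative sketch only; bin1d() is a hypothetical helper.
bin1d <- function(v, k) {
  # quantile-based breaks, then unidimensional bin codes via .bincode
  br <- quantile(v, probs = seq(0, 1, length.out = k + 1), names = FALSE)
  .bincode(v, breaks = br, include.lowest = TRUE)
}

## one variable: bin, then draw one index at random from each bin
set.seed(1)
x <- seq(0, 1, length.out = 101)
bins <- bin1d(x, 5)
ix <- vapply(split(seq_along(x), bins),
             function(i) i[sample.int(length(i), 1)], integer(1))
x[ix]                   # one sampled value per bin

## two variables: combine the unidimensional codes into one code
g <- expand.grid(x1 = seq(0, 1, length.out = 11),
                 x2 = seq(0, 1, length.out = 11))
nbin <- c(5, 5)
b1 <- bin1d(g$x1, nbin[1])
b2 <- bin1d(g$x2, nbin[2])
code <- b1 + nbin[1] * (b2 - 1)   # assumed coding: values in 1..25
table(code)
```

Once every row carries a single bin code, sampling from each multidimensional bin reduces to the same split-and-sample step used in the one-variable case.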

Value

If `index.return = FALSE` and `breaks.return = FALSE`, returns the bin sampled `x` observations.

If `index.return = TRUE` and/or `breaks.return = TRUE`, returns a list with elements:

`x`	bin sampled `x` observations.
`ix`	row indices of bin sampled observations (if `index.return = TRUE`).
`bx`	lower bounds of breaks defining bins (if `breaks.return = TRUE`).

Note

For factors, the number of bins is automatically defined to be the number of levels.
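
One plausible reading of this note is that each factor level acts as its own bin. The sketch below assumes that reading and uses the integer level codes as bin codes; it is an illustration in base R, not the npreg source code.

```r
# Illustrative sketch only; assumes each factor level forms one bin.
f <- factor(c("a", "b", "c", "a", "b", "c", "a"))
nbin <- nlevels(f)      # number of bins = number of levels
bins <- as.integer(f)   # bin code = level index

# draw one observation at random from each level
set.seed(1)
ix <- vapply(split(seq_along(f), bins),
             function(i) i[sample.int(length(i), 1)], integer(1))
f[ix]                   # one sampled element per level
```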

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Helwig, N. E., Gao, Y., Wang, S., & Ma, P. (2015). Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics, 14(C), 491-504. doi:10.1016/j.spasta.2015.09.002

Examples

##########   EXAMPLE 1   ##########
### unidimensional binning

# generate data
x <- seq(0, 1, length.out = 101)

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data
xs$bx            # breaks



##########   EXAMPLE 2   ##########
### bidimensional binning

# generate data
x <- expand.grid(x1 = seq(0, 1, length.out = 101),
                 x2 = seq(0, 1, length.out = 101))

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix,]        # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix,]        # indexing sampled data
xs$bx            # breaks

# plot breaks and 25 bins
plot(xs$bx, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x1", ylab = "x2", main = "25 bidimensional bins")
grid()
text(xs$bx + 0.1, labels = 1:25)

[Package npreg version 1.1.0 Index]