grouped_resample {archetypal} | R Documentation |
Function for performing simple or Dirichlet resampling
Description
The function can be used for standard bootstrapping or for subsampling [1]. Samples can be drawn with or without replacement, by group, and with or without Dirichlet weights [2]. This gives researchers several options for correcting sample biases, estimating empirical confidence intervals, and/or subsampling large data sets.
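The idea of drawing a fixed number of rows from each group can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the toy data frame, the helper name sample_by_group and the size table are hypothetical, and only the column names Group_ID and Resample_Size are taken from the Examples on this page.

```r
## Toy data: 12 rows in three groups (hypothetical, not from the package)
set.seed(1)
toy <- data.frame(x = rnorm(12), group = rep(1:3, times = c(6, 4, 2)))

## Requested resample size per group, mirroring the grp_matrix layout
sizes <- data.frame(Group_ID = 1:3, Resample_Size = c(3, 2, 1))

## Hypothetical helper: draw rows group by group with base R
sample_by_group <- function(dat, grp, sizes, replace = FALSE) {
  parts <- lapply(seq_len(nrow(sizes)), function(i) {
    # Row positions that belong to the i-th group
    rows <- which(dat[[grp]] == sizes$Group_ID[i])
    # Sample positions within the group, then map back to row numbers
    idx <- sample.int(length(rows), sizes$Resample_Size[i], replace = replace)
    dat[rows[idx], , drop = FALSE]
  })
  do.call(rbind, parts)
}

res <- sample_by_group(toy, "group", sizes, replace = FALSE)
table(res$group)  # one count per group, matching Resample_Size
```

With replace = FALSE the requested size of a group cannot exceed the number of rows in that group; with replace = TRUE the same helper gives a bootstrap-style draw within each group.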
Usage
grouped_resample(in_data = NULL, grp_vector = NULL, grp_matrix = NULL,
replace = FALSE, option = "Simple", number_samples = 1,
nworkers = NULL, rseed = NULL)
Arguments
in_data |
The initial data frame that must be re-sampled. It must contain:
|
grp_vector |
The name of the grouping variable in the data frame, for example 'group' |
grp_matrix |
A matrix that contains
|
replace |
A logical input: TRUE to sample with replacement, FALSE to sample without replacement |
option |
A character input. The values used on this page are "Simple" (the default) and "Dirichlet"
|
number_samples |
The number of samples to be created. If it is greater than one, then parallel processing is used. |
nworkers |
The number of logical processors to use for parallel computing (usually twice the number of available physical cores) |
rseed |
The random seed that will be used for sampling. Useful for reproducible results |
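For readers unfamiliar with the Dirichlet weights behind option = "Dirichlet", the sketch below shows one common construction from the Bayesian bootstrap [2]: flat Dirichlet weights obtained by normalising exponential draws, then used as selection probabilities. Whether grouped_resample builds or applies its weights in exactly this way is an assumption; the helper name rdirichlet_flat is hypothetical.

```r
## Hypothetical helper: flat Dirichlet(1, ..., 1) weights,
## obtained by normalising independent Exp(1) draws
rdirichlet_flat <- function(n) {
  g <- rexp(n, rate = 1)
  g / sum(g)
}

set.seed(20191220)  # fixed seed so the draw is reproducible
n <- 10
w <- rdirichlet_flat(n)

## Use the weights as unequal selection probabilities (assumption:
## this is one plausible use, not necessarily the package's)
picked <- sample.int(n, size = 4, replace = FALSE, prob = w)
picked
```

Setting the seed before the draw plays the same role as the rseed argument: repeating the code with the same seed gives the same weights and the same selected rows.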
Value
It returns a list of number_samples data frames with exactly the same
variables as the initial one, except that the group variable now has only the given
value from the input data frame.
Author(s)
David Midgley
References
[1] D. N. Politis, J. P. Romano, M. Wolf, Subsampling (Springer-Verlag, New York, 1999).
[2] Baath R (2018). bayesboot: An Implementation of Rubin's (1981) Bayesian Bootstrap. R package version 0.2.2, URL https://CRAN.R-project.org/package=bayesboot
See Also
Examples
## Load absolute temperature data set:
data("AbsoluteTemperature")
df <- AbsoluteTemperature
## Find proportions for climate zones
pcs <- table(df$z)/dim(df)[1]
## Choose the approximate size of the new sample and compute resample sizes
N <- round(sqrt(nrow(AbsoluteTemperature)))
resamplesizes=as.integer(round(N*pcs))
sum(resamplesizes)
## Create the grouping matrix
groupmat <- data.frame("Group_ID"=1:4,"Resample_Size"=resamplesizes)
groupmat
## Simple resampling:
resample_simple <- grouped_resample(in_data = df, grp_vector = "z",
grp_matrix = groupmat, replace = FALSE, option = "Simple",
number_samples = 1, nworkers = NULL, rseed = 20191220)
cat(dim(resample_simple[[1]]),"\n")
## Dirichlet resampling:
resample_dirichlet <- grouped_resample(in_data = df, grp_vector = "z",
grp_matrix = groupmat, replace = FALSE, option = "Dirichlet",
number_samples = 1, nworkers = NULL, rseed = 20191220)
cat(dim(resample_dirichlet[[1]]),"\n")
##
# ## Work in parallel and create many samples
# ## Choose a random seed
# nseed <- 20191119
# ## Simple
# reslist1 <- grouped_resample(in_data = df, grp_vector = "z", grp_matrix = groupmat,
# replace = FALSE, option = "Simple",
# number_samples = 10, nworkers = NULL,
# rseed = nseed)
# sapply(reslist1, dim)
# ## Dirichlet
# reslist2 <- grouped_resample(in_data = df, grp_vector = "z", grp_matrix = groupmat,
# replace = FALSE, option = "Dirichlet",
# number_samples = 10, nworkers = NULL,
# rseed = nseed)
# sapply(reslist2, dim)
# ## Count shared rows between each 'Simple' sample and the corresponding 'Dirichlet' sample
# mapply(function(x,y){sum(rownames(x)%in%rownames(y))},reslist1,reslist2)
#