R: Bootstrap methods for two-stage sampling designs

boot.twostage {bootsurv}

R Documentation

Bootstrap methods for two-stage sampling designs

Description

The function boot.twostage applies one of the following bootstrap methods to complete (full-response) survey data selected under stratified two-stage cluster sampling SRSWOR/SRSWOR: Rao and Wu (1988), Rao, Wu and Yue (1992), the modified version of Sitter (1992, CJS) (see Chen, Haziza and Mashreghi, 2022), Funaoka, Saigo, Sitter and Toida (2006), Chauvet (2007) or Preston (2009). It also applies the method of Rao, Wu and Yue (1992) to complete survey data selected under stratified two-stage cluster sampling IPPSWOR/SRSWOR, and the method of Chauvet (2007) to complete survey data selected under stratified two-stage cluster sampling CPS/SRSWOR.
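
For orientation, the SRSWOR/SRSWOR design draws clusters by simple random sampling without replacement and then draws elements the same way within each selected cluster. The following is a minimal base-R sketch of that selection on a made-up population. The object names (`pop`, `n_clusters`, `frac2`, `sel`, `samp`) are hypothetical and this is not the code used to create the package data sets.

```r
set.seed(42)

# Hypothetical population: 10 clusters of unequal size.
sizes <- c(8, 12, 10, 9, 11, 7, 13, 10, 9, 11)
pop <- data.frame(cluster = rep(1:10, times = sizes))
pop$study.variable <- rnorm(nrow(pop), mean = 50, sd = 10)

n_clusters <- 3    # first-stage sample size (assumed for illustration)
frac2      <- 0.5  # second-stage sampling fraction (assumed for illustration)

# Stage 1: SRSWOR of clusters.
sel <- sample(unique(pop$cluster), n_clusters)

# Stage 2: SRSWOR of elements within each selected cluster.
samp <- do.call(rbind, lapply(sel, function(cl) {
  rows <- which(pop$cluster == cl)
  m    <- max(1, round(frac2 * length(rows)))
  pop[rows[sample.int(length(rows), m)], ]
}))

table(samp$cluster)  # number of sampled elements per selected cluster
```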

Usage

boot.twostage(
  data,
  no.cluster,
  cluster.size,
  R,
  parameter = "total",
  bootstrap.method = "Rao.Wu.Yue",
  survey.design = "SRSWOR",
  population.size = NULL,
  boot.sample.size = NULL
)

Arguments

`data`	A vector, matrix or data frame. The study variable must be a numeric column named `study.variable`, and a column named `cluster` identifying the clusters must be included. If the population is stratified, a column named `stratum` identifying the strata must also be included. If an IPPS design is used at the first stage, a column of first-stage inclusion probabilities named `Pi1` must be included.
`no.cluster`	A vector of the number of clusters within strata.
`cluster.size`	The number of elements within the selected clusters within each stratum. The length of this vector must be the same as the number of all selected clusters in all strata.
`R`	The number of bootstrap replicates. For the Chauvet (2007) method, `R` is a vector with two values, `(R.pop, R.samp)`, giving the number of pseudo-populations and the number of bootstrap samples drawn from each pseudo-population.
`parameter`	The population parameter to estimate, one of: `"total"` (population total), `"mean"` (population mean), `"quartile.25"` (population 1st quartile), `"quartile.50"` or `"median"` (population median) or `"quartile.75"` (population 3rd quartile). If the parameter of interest is the population mean or total, the HT-estimator is applied. If the parameter of interest is a population quartile, the estimator in Särndal, Swensson and Wretman (1992, Chapter 5) is applied. The default is the population total.
`bootstrap.method`	One of the following bootstrap methods can be applied in the case of stratified SRSWOR/SRSWOR: `"Rao.Wu"` (Rao and Wu, 1988), `"Rao.Wu.Yue"` (Rao, Wu and Yue, 1992), `"Modified.Sitter"` (the modified version of Sitter 1992 discussed in Chen, Haziza and Mashreghi, 2022), `"Funaoka.etal"` (Funaoka, Saigo, Sitter and Toida, 2006), `"Chauvet"` (Chauvet, 2007) or `"Preston"` (Preston, 2009). The default is `"Rao.Wu.Yue"`.
`survey.design`	One of `"SRSWOR"`, `"IPPS"` (allowed only with the method of Rao, Wu and Yue, 1992) or `"CPS"` (allowed only with the method of Chauvet, 2007). The default is `"SRSWOR"`.
`population.size`	A vector of stratum population sizes.
`boot.sample.size`	A vector of bootstrap sample sizes within strata. The bootstrap sample size is required only for the method of Rao, Wu and Yue (1992). If it is not specified, the bootstrap sample size will be `nh-1` within each stratum, where `nh` is the original sample size within stratum `h`.
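
The following base-R sketch illustrates how the arguments above fit together on a toy sample: it lays out `data` with the required column names, computes the HT-estimator of the total under SRSWOR/SRSWOR by hand, and derives the default bootstrap sample size `nh-1`. It rests on two assumptions that are not stated above: that `no.cluster` holds the number of clusters in the population of each stratum, and that `nh` counts the sampled clusters in stratum `h`. All values are made up, and the helper names (`samp`, `key_f`, `cl`, `w1`, `ht_total`, `default_boot_size`) are hypothetical.

```r
# Toy sample in the column layout described above (all values are made up).
samp <- data.frame(
  stratum        = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  cluster        = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  study.variable = c(10, 12, 8, 9, 11, 13, 20, 22, 18, 21)
)
no.cluster   <- c(6, 4)            # assumed: population clusters N_h, by stratum
cluster.size <- c(6, 4, 5, 10, 8)  # population size M_hi of each sampled cluster

# One row per sampled cluster, in order of first appearance.
key   <- paste(samp$stratum, samp$cluster)
key_f <- factor(key, levels = unique(key))
cl    <- samp[!duplicated(key), c("stratum", "cluster")]

m_hi  <- as.vector(table(key_f))                             # sampled elements per cluster
y_sum <- as.vector(tapply(samp$study.variable, key_f, sum))  # sample sum per cluster
n_h   <- as.vector(table(cl$stratum))                        # sampled clusters per stratum

# Estimated cluster totals (M_hi / m_hi expansion), then first-stage expansion N_h / n_h.
cluster_total_hat <- cluster.size / m_hi * y_sum
w1 <- (no.cluster / n_h)[match(cl$stratum, sort(unique(cl$stratum)))]
ht_total <- sum(w1 * cluster_total_hat)

# Default bootstrap sample size per stratum, assuming nh counts sampled clusters.
default_boot_size <- n_h - 1
```

Here the design weight of each sampled element is `(N_h / n_h) * (M_hi / m_hi)`, so `ht_total` is simply the weighted sum of `study.variable`.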

Value

boot.statistic A vector of bootstrap statistics of size R.

boot.var The bootstrap variance estimator of the estimator of the parameter of interest.

boot.mean The average of the bootstrap estimates of the parameter of interest.

boot.sample A list of results for each iteration. Each element includes a column of original sample values, a column of cluster identifiers and a column of stratum identifiers. More information is available depending on the bootstrap method.
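
As a rough illustration of how summaries such as boot.mean and boot.var relate to boot.statistic, the sketch below uses the conventional Monte Carlo forms on a stand-in vector of replicates. Whether the package divides by R or by R - 1, and how the two-level replicates of the Chauvet (2007) method are combined, is not stated here, so the divisor R is an assumption; `replicates`, `mc_mean` and `mc_var` are hypothetical names.

```r
set.seed(1)
R <- 20

# Stand-in for boot.statistic: R simulated bootstrap estimates (made-up values).
replicates <- rnorm(R, mean = 1000, sd = 50)

# Average of the bootstrap estimates.
mc_mean <- mean(replicates)

# Monte Carlo variance of the bootstrap estimates around their mean
# (divisor R assumed; use var(replicates) for the R - 1 version).
mc_var <- mean((replicates - mc_mean)^2)
```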

References

Chauvet, G. (2007). Méthodes de bootstrap en population finie. PhD thesis, École Nationale de Statistique et Analyse de l’Information, Bruz, France.

Chen, S., Haziza, D. and Mashreghi, Z. (2022). A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs. Stats, 5(2), 521–537.

Funaoka, F., Saigo, H., Sitter, R.R., Toida, T. (2006). Bernoulli bootstrap for stratified multistage sampling. Survey Methodology, 32, 151–156.

Rao, J.N.K., Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231–241.

Rao, J.N.K., Wu, C.F.J., Yue, K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18, 209–217.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model-Assisted Survey Sampling. New York: Springer.

Sitter, R.R. (1992). Comparing three bootstrap methods for survey data. The Canadian Journal of Statistics, 20, 135–154.

Preston, J. (2009). Rescaled bootstrap for stratified multistage sampling. Survey Methodology, 35, 227–234.

Examples


R<- 20

data(data_samp_clust)
data(data_pop_clust)
no_cluster<- 200
cluster_size<- table(data_pop_clust$cluster)[unique(data_samp_clust$cluster)]

# The first stage sampling fraction is about 20% and the overall second stage sampling fraction is about 15%.
# data_samp_clust is a sample taken from data_pop_clust available in the package.

boot.RWY<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R)
boot.RWY$boot.var

boot.Pr<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R, bootstrap.method="Preston")
boot.Pr$boot.var

boot.RWY.med<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R, parameter="median")
boot.RWY.med$boot.var
boot.RWY.med$boot.sample[[5]]

boot.Ch<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R=c(5, 10),
           bootstrap.method="Chauvet")
boot.Ch$boot.mean

data(data_samp_stclust)
data(data_pop_stclust)
# The first stage sampling fraction is about 20% and the overall second stage sampling fraction is about 15%.
# data_samp_stclust is a sample taken from data_pop_stclust available in the package.

no_cluster_stclust<- c(100, 125, 65)
cluster_size_pop_st<- aggregate(data_pop_stclust$cluster,
 by=list(data_pop_stclust$stratum), table)[[2]]
L<- length(unique(data_samp_stclust$stratum))
cluster_size_st<- NULL
for(h in 1:L) cluster_size_st<- c(cluster_size_st,
 cluster_size_pop_st[[h]][unique(data_samp_stclust$cluster[data_samp_stclust$stratum==h])])

boot.RWY.st<- boot.twostage(data_samp_stclust, no_cluster_stclust, cluster_size_st, R)
boot.RWY.st$boot.statistic

[Package bootsurv version 0.0.1 Index]