Bootstrap methods for two-stage sampling designs


The function boot.twostage applies one of the following bootstrap methods to complete (full response) survey data selected under stratified two-stage cluster sampling with SRSWOR at both stages (SRSWOR/SRSWOR): Rao and Wu (1988), Rao, Wu and Yue (1992), the modified version of Sitter (1992, CJS) (see Chen, Haziza and Mashreghi, 2022), Funaoka, Saigo, Sitter and Toida (2006), Chauvet (2007) or Preston (2009). The function also applies the method of Rao, Wu and Yue (1992) to complete survey data selected under stratified two-stage cluster sampling IPPSWOR/SRSWOR, and the method of Chauvet (2007) to complete survey data selected under stratified two-stage cluster sampling CPS/SRSWOR.


  parameter = "total",
  bootstrap.method = "Rao.Wu.Yue", = "SRSWOR",
  population.size = NULL,
  boot.sample.size = NULL



A vector, matrix or data frame. The study variable must be in a numeric column named study.variable, and a column named cluster identifying the clusters must be included. If the population is stratified, a column named stratum identifying the strata must be included. If an IPPS design is used at the first stage, a column of first-stage inclusion probabilities named Pi1 must be included.
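As a minimal sketch of the layout described above, the following base R code builds a small data frame with the required column names. The object name toy_sample and all values are made up for illustration; no Pi1 column is included because an SRSWOR first stage is assumed.

```r
# Hypothetical toy sample: 2 strata, 2 sampled clusters per stratum,
# 3 sampled elements per cluster. All values are invented.
set.seed(1)
toy_sample <- data.frame(
  study.variable = round(rnorm(12, mean = 50, sd = 10), 1),
  cluster        = rep(1:4, each = 3),
  stratum        = rep(1:2, each = 6)
)

# Checks mirroring the stated requirements on column names and types.
stopifnot(is.numeric(toy_sample$study.variable))
stopifnot(all(c("study.variable", "cluster", "stratum") %in% names(toy_sample)))

# Number of distinct sampled clusters in each stratum.
clusters_per_stratum <- tapply(toy_sample$cluster, toy_sample$stratum,
                               function(x) length(unique(x)))
clusters_per_stratum
```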


A vector giving the number of clusters in the population within each stratum.


A vector giving the number of population elements within each selected cluster, across all strata. The length of this vector must equal the total number of selected clusters in all strata.


The number of bootstrap replicates. For the Chauvet (2007) method, R is a vector with two values, (R.pop, R.samp), giving the number of pseudo-populations and the number of bootstrap samples drawn from each pseudo-population.


The population parameter of interest, one of: "total" (population total), "mean" (population mean), "quartile.25" (population 1st quartile), "quartile.50" or "median" (population median) or "quartile.75" (population 3rd quartile). If the parameter of interest is the population mean or total, the Horvitz-Thompson (HT) estimator is used. If the parameter of interest is a population quartile, the estimator in Särndal, Swensson and Wretman (1992, Chapter 5) is used. The default is the population total.
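To make the HT estimator of a total concrete, here is a minimal sketch for stratified two-stage SRSWOR/SRSWOR, where each sampled element carries the weight (N_h / n_h) * (M_hi / m_hi). The helper ht_total_twostage and its arguments are hypothetical and not part of the package; the package's internal computation may differ.

```r
# Hypothetical helper (not part of the package): Horvitz-Thompson total
# under stratified two-stage SRSWOR/SRSWOR.
#   data:              data frame with columns study.variable, cluster, stratum
#   n_pop_clusters:    named vector, population number of clusters per stratum (N_h)
#   pop_cluster_sizes: named vector, population size of each sampled cluster (M_hi)
ht_total_twostage <- function(data, n_pop_clusters, pop_cluster_sizes) {
  total <- 0
  for (h in unique(data$stratum)) {
    d_h <- data[data$stratum == h, ]
    ids <- unique(d_h$cluster)
    n_h <- length(ids)                         # sampled clusters in stratum h
    N_h <- n_pop_clusters[[as.character(h)]]
    for (i in ids) {
      y    <- d_h$study.variable[d_h$cluster == i]
      M_hi <- pop_cluster_sizes[[as.character(i)]]
      # Weight of each sampled element: (N_h / n_h) * (M_hi / m_hi)
      total <- total + (N_h / n_h) * (M_hi / length(y)) * sum(y)
    }
  }
  total
}

# Invented toy data: 2 strata, 2 sampled clusters each, 2 elements per cluster.
toy <- data.frame(
  study.variable = c(1, 2, 3, 4, 5, 6, 7, 8),
  cluster        = c(1, 1, 2, 2, 3, 3, 4, 4),
  stratum        = c(1, 1, 1, 1, 2, 2, 2, 2)
)
N_h  <- c("1" = 4, "2" = 6)
M_hi <- c("1" = 4, "2" = 4, "3" = 6, "4" = 2)

ht_total_twostage(toy, N_h, M_hi)  # 184
```

Dividing this total by the population number of elements would give the corresponding estimate of the mean.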


The bootstrap method to apply. In the case of stratified SRSWOR/SRSWOR, one of: "Rao.Wu" (Rao and Wu, 1988), "Rao.Wu.Yue" (Rao, Wu and Yue, 1992), "Modified.Sitter" (the modified version of Sitter 1992 discussed in Chen, Haziza and Mashreghi, 2022), "Funaoka.etal" (Funaoka, Saigo, Sitter and Toida, 2006), "Chauvet" (Chauvet, 2007) or "Preston" (Preston, 2009).

The first-stage sampling design. It can be "IPPS" (only with the method of Rao, Wu and Yue, 1992), "CPS" (only with the method of Chauvet, 2007) or "SRSWOR". The default is "SRSWOR".


A vector of stratum population sizes.


A vector of bootstrap sample sizes within strata. The bootstrap sample size is required only for the method of Rao, Wu and Yue (1992). If it is not specified, the bootstrap sample size is set to nh-1 within each stratum, where nh is the original sample size within stratum h.
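The default nh-1 can be illustrated with a short sketch. It assumes that nh counts the sampled clusters (first-stage units) in stratum h, which is the usual convention for rescaling bootstraps but is not stated explicitly here; the data and object names are invented.

```r
# Invented toy sample: stratum 1 has 3 sampled clusters, stratum 2 has 2.
toy <- data.frame(
  cluster = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  stratum = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Assumption: n_h is the number of sampled clusters in stratum h.
n_h <- tapply(toy$cluster, toy$stratum, function(x) length(unique(x)))

# Default bootstrap sample size described above: n_h - 1 in each stratum.
default_boot_size <- n_h - 1
default_boot_size
```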


boot.statistic A vector of bootstrap statistics of size R.

boot.var The bootstrap variance estimate for the estimator of the parameter of interest.

boot.mean The average of the bootstrap estimates of the parameter of interest.

boot.sample A list with the results of each iteration, including a column of original sample values, a column of cluster identifiers and a column of stratum identifiers. More information is available depending on the bootstrap method.
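The relationship between the returned components can be sketched from a vector of bootstrap statistics. The vector below is simulated as a stand-in for boot.statistic, and the variance formula (centred at the bootstrap mean, divisor R - 1) is an assumption; the package may centre or scale differently depending on the method.

```r
# Stand-in for boot.statistic: R simulated bootstrap estimates (invented values).
set.seed(42)
R <- 200
boot_stat <- rnorm(R, mean = 1000, sd = 25)

# Average of the bootstrap estimates (the role played by boot.mean).
boot_mean <- mean(boot_stat)

# Assumption: Monte Carlo variance around the bootstrap mean, divisor R - 1.
boot_var <- var(boot_stat)

# Bootstrap standard error derived from the variance.
boot_se <- sqrt(boot_var)

c(mean = boot_mean, var = boot_var, se = boot_se)
```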


R<- 20

no_cluster<- 200
cluster_size<- table(data_pop_clust$cluster)[unique(data_samp_clust$cluster)]

# The first-stage sampling fraction is about 20% and the overall second-stage sampling fraction is about 15%.
# data_samp_clust is a sample taken from data_pop_clust available in the package.

boot.RWY<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R)

boot.Pr<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R, bootstrap.method="Preston")
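boot.Pr$boot.var

# Bootstrap variance for the median instead of the default total
boot.twostage(data_samp_clust, no_cluster, cluster_size, R, parameter="median")$boot.var

# Detailed results of the fifth bootstrap iteration
boot.Pr$boot.sample[[5]]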

boot.Ch<- boot.twostage(data_samp_clust, no_cluster, cluster_size, R=c(5, 10),
 bootstrap.method="Chauvet")

# The first-stage sampling fraction is about 20% and the overall second-stage sampling fraction is about 15%.
# data_samp_stclust is a sample taken from data_pop_stclust available in the package.

no_cluster_stclust<- c(100, 125, 65)
cluster_size_pop_st<- aggregate(data_pop_stclust$cluster,
 by=list(data_pop_stclust$stratum), table)[[2]]
L<- length(unique(data_samp_stclust$stratum))
cluster_size_st<- NULL
for(h in 1:L) cluster_size_st<- c(cluster_size_st,
 cluster_size_pop_st[[h]][unique(data_samp_stclust$cluster[data_samp_stclust$stratum==h])])

boot.twostage(data_samp_stclust, no_cluster_stclust, cluster_size_st, R)$boot.statistic

