boot.twostage {bootsurv}    R Documentation
Bootstrap methods for two-stage sampling designs
Description
The function boot.twostage applies one of the following bootstrap methods to complete (full-response) survey data selected under stratified two-stage cluster sampling with SRSWOR at both stages (SRSWOR/SRSWOR): Rao and Wu (1988), Rao, Wu and Yue (1992), the modified version of Sitter (1992, CJS) (see Chen, Haziza and Mashreghi, 2022), Funaoka, Saigo, Sitter and Toida (2006), Chauvet (2007) or Preston (2009).
It also applies the method of Rao, Wu and Yue (1992) to complete survey data selected under stratified two-stage cluster sampling IPPSWOR/SRSWOR, and the method of Chauvet (2007) to complete survey data selected under stratified two-stage cluster sampling CPS/SRSWOR.
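With parameter = "total", the quantity being bootstrapped is an estimator of the population total. The sketch below assumes the standard expansion estimator for stratified SRSWOR/SRSWOR, Y_hat = sum_h (N_h / n_h) sum_i (M_hi / m_hi) sum_k y_hik, where N_h is the number of population clusters in stratum h, n_h the number of sampled clusters, M_hi the population size of cluster i and m_hi its sample size. The function two_stage_total(), its argument layout and the "stratum.cluster" naming of M are hypothetical and not part of bootsurv; boot.twostage may compute its point estimate differently.

```r
# Hypothetical sketch (not the bootsurv code): expansion estimator of the
# population total under stratified two-stage SRSWOR/SRSWOR.
#   y, cluster, stratum - one entry per sampled element
#   N - number of population clusters per stratum, named by stratum
#   M - population size of each sampled cluster, named "stratum.cluster"
two_stage_total <- function(y, cluster, stratum, N, M) {
  total <- 0
  for (h in unique(stratum)) {
    in_h <- stratum == h
    clusters_h <- unique(cluster[in_h])
    n_h <- length(clusters_h)                   # sampled clusters in stratum h
    for (i in clusters_h) {
      y_hi <- y[in_h & cluster == i]            # sampled elements of cluster i
      M_hi <- M[[paste(h, i, sep = ".")]]       # population size of cluster i
      t_hi <- M_hi * mean(y_hi)                 # second stage: estimated cluster total
      total <- total + (N[[as.character(h)]] / n_h) * t_hi  # first-stage weight
    }
  }
  total
}

# Toy sample: two clusters from stratum 1, one cluster from stratum 2
y       <- c(1, 3, 2, 2, 5, 4)
cluster <- c("a", "a", "b", "b", "b", "c")
stratum <- c(1, 1, 1, 1, 1, 2)
N <- c("1" = 10, "2" = 5)
M <- c("1.a" = 4, "1.b" = 6, "2.c" = 3)
two_stage_total(y, cluster, stratum, N, M)   # 190
```

Here stratum 1 contributes (10/2) * (4 * 2 + 6 * 3) = 130 and stratum 2 contributes (5/1) * (3 * 4) = 60.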
Usage
boot.twostage(
data,
no.cluster,
cluster.size,
R,
parameter = "total",
bootstrap.method = "Rao.Wu.Yue",
survey.design = "SRSWOR",
population.size = NULL,
boot.sample.size = NULL
)
Arguments
data
A vector, matrix or data frame. The column of the study variable has to be a numeric column named
no.cluster
A vector giving the number of population clusters in each stratum.
cluster.size
The number of population elements in each selected cluster, listed stratum by stratum. The length of this vector must equal the total number of selected clusters over all strata.
R
The number of bootstrap replicates. For the Chauvet (2007) method,
parameter
One of the following population parameters can be applied:
bootstrap.method
One of the following bootstrap methods can be applied in the case of stratified SRS/SRS:
survey.design
It can be either
population.size
A vector of stratum population sizes.
boot.sample.size
A vector of bootstrap sample sizes within strata. The bootstrap sample size is required only for the method of Rao and Wu (1988). If it is not specified, the bootstrap sample size will be
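To make the role of the bootstrap sample size concrete, the sketch below computes Rao-Wu rescaled bootstrap weights for the first stage of a single stratum: m_h clusters are drawn with replacement from the n_h sampled clusters, and each design weight w_hi is replaced by w_hi * (1 - lambda_h + lambda_h * (n_h / m_h) * r_hi), where r_hi is the number of times cluster i is drawn, lambda_h = sqrt(m_h * (1 - f_h) / (n_h - 1)) and f_h = n_h / N_h. The helper rao_wu_weights() and the toy cluster totals are hypothetical. This single-stage view only reproduces the first-stage term (1 - f_h) N_h^2 s_h^2 / n_h of the usual variance estimator and ignores second-stage variability, so it is an illustration, not a substitute for boot.twostage.

```r
# Hypothetical sketch (not the bootsurv code): Rao-Wu rescaled bootstrap
# weights for the first stage of ONE stratum.
#   w   - design weights N_h / n_h of the n_h sampled clusters
#   N_h - number of clusters in the stratum population
#   m_h - bootstrap sample size (draws with replacement)
rao_wu_weights <- function(w, N_h, m_h) {
  n_h <- length(w)
  f_h <- n_h / N_h                                # first-stage sampling fraction
  lambda <- sqrt(m_h * (1 - f_h) / (n_h - 1))     # rescaling factor
  draws <- sample.int(n_h, m_h, replace = TRUE)   # resample clusters
  r <- tabulate(draws, nbins = n_h)               # times each cluster is drawn
  w * (1 - lambda + lambda * (n_h / m_h) * r)
}

set.seed(1)
t_hat <- c(8, 18, 12, 10)   # toy estimated cluster totals
N_h <- 20
n_h <- length(t_hat)
w <- rep(N_h / n_h, n_h)
# Bootstrap replicates of the estimated stratum total with m_h = n_h - 1
reps <- replicate(4000, sum(rao_wu_weights(w, N_h, m_h = n_h - 1) * t_hat))
var(reps)                                     # Monte Carlo bootstrap variance
(1 - n_h / N_h) * N_h^2 * var(t_hat) / n_h    # first-stage target, about 1493.3
```

Choosing m_h <= n_h - 1 keeps lambda_h <= 1, so the rescaled weights cannot become negative; m_h = n_h - 1 is a common choice for this reason.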
Value
boot.statistic
A vector of bootstrap statistics of size R.
boot.var
The bootstrap variance estimate of the estimator of the parameter of interest.
boot.mean
The average of the bootstrap estimates of the parameter of interest.
boot.sample
A list with one element per bootstrap iteration. Each element includes a column of original sample values, a column of cluster identifiers and a column of stratum identifiers. Further columns are available depending on the bootstrap method.
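Assuming boot.mean and boot.var are computed from the replicates stored in boot.statistic, the relationship can be sketched as below. The helper summarise_boot() is hypothetical, and the divisor R - 1 is an assumption: boot.twostage may use a different one.

```r
# Hypothetical helper: summarise a vector of bootstrap replicate statistics.
# Assumes divisor R - 1 for the variance; boot.twostage may differ.
summarise_boot <- function(stat) {
  R <- length(stat)
  boot_mean <- mean(stat)                            # cf. boot.mean
  boot_var  <- sum((stat - boot_mean)^2) / (R - 1)   # cf. boot.var
  list(boot.mean = boot_mean, boot.var = boot_var, se = sqrt(boot_var))
}

s <- summarise_boot(c(10, 12, 11, 13))
s$boot.mean   # 11.5
s$boot.var    # 1.666667, the same as var(c(10, 12, 11, 13))
```

Applying the same helper to boot.RWY$boot.statistic from the examples and comparing the result with boot.RWY$boot.var is a quick way to check which divisor the package actually uses.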
References
Chauvet, G. (2007). Méthodes de bootstrap en population finie. PhD thesis, École Nationale de Statistique et Analyse de l’Information, Bruz, France.
Chen, S., Haziza, D. and Mashreghi, Z. (2022). A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs. Stats, 5(2), 521–537.
Funaoka, F., Saigo, H., Sitter, R.R. and Toida, T. (2006). Bernoulli bootstrap for stratified multistage sampling. Survey Methodology, 32, 151–156.
Preston, J. (2009). Rescaled bootstrap for stratified multistage sampling. Survey Methodology, 35, 227–234.
Rao, J.N.K. and Wu, C.F.J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231–241.
Rao, J.N.K., Wu, C.F.J. and Yue, K. (1992). Some recent work on resampling methods for complex surveys. Survey Methodology, 18, 209–217.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model-Assisted Survey Sampling. New York: Springer.
Sitter, R.R. (1992). Comparing three bootstrap methods for survey data. The Canadian Journal of Statistics, 20, 135–154.
Examples
R <- 20  # number of bootstrap replicates
# data_samp_clust is a sample taken from data_pop_clust, both available in the package.
# The first-stage sampling fraction is about 20% and the overall second-stage sampling fraction is about 15%.
data(data_samp_clust)
data(data_pop_clust)
no_cluster <- 200  # number of clusters in the population
# Population size of each sampled cluster
cluster_size <- table(data_pop_clust$cluster)[unique(data_samp_clust$cluster)]
# Default method (Rao, Wu and Yue, 1992) for the population total
boot.RWY <- boot.twostage(data_samp_clust, no_cluster, cluster_size, R)
boot.RWY$boot.var
# Preston (2009) rescaled bootstrap
boot.Pr <- boot.twostage(data_samp_clust, no_cluster, cluster_size, R,
                         bootstrap.method = "Preston")
boot.Pr$boot.var
# Bootstrap variance for the median
boot.RWY.med <- boot.twostage(data_samp_clust, no_cluster, cluster_size, R,
                              parameter = "median")
boot.RWY.med$boot.var
boot.RWY.med$boot.sample[[5]]  # details of the 5th bootstrap iteration
# Chauvet (2007) method; here R is supplied as a vector of length two
boot.Ch <- boot.twostage(data_samp_clust, no_cluster, cluster_size, R = c(5, 10),
                         bootstrap.method = "Chauvet")
boot.Ch$boot.mean
# Stratified two-stage example.
# data_samp_stclust is a sample taken from data_pop_stclust, both available in the package.
# The first-stage sampling fraction is about 20% and the overall second-stage sampling fraction is about 15%.
data(data_samp_stclust)
data(data_pop_stclust)
no_cluster_stclust <- c(100, 125, 65)  # number of population clusters in each stratum
# Population cluster sizes, tabulated separately within each stratum
cluster_size_pop_st <- aggregate(data_pop_stclust$cluster,
                                 by = list(data_pop_stclust$stratum), table)[[2]]
L <- length(unique(data_samp_stclust$stratum))  # number of strata
# Sizes of the sampled clusters, listed stratum by stratum
cluster_size_st <- NULL
for (h in 1:L) cluster_size_st <- c(cluster_size_st,
  cluster_size_pop_st[[h]][unique(data_samp_stclust$cluster[data_samp_stclust$stratum == h])])
boot.RWY.st <- boot.twostage(data_samp_stclust, no_cluster_stclust, cluster_size_st, R)
boot.RWY.st$boot.statistic
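In these examples the cluster sizes are obtained by indexing a table with cluster identifiers. When the identifiers are numeric, R indexes the table by position rather than by name, which returns the intended sizes only when clusters are numbered 1, 2, ... in table order. A slightly more defensive sketch, assuming cluster identifiers are unique labels, converts them to character so the lookup is by name; the toy vectors below are hypothetical. For stratified data the same lookup would be applied within each stratum.

```r
# Toy data: cluster ids are not 1..N, so positional indexing fails
pop_cluster  <- c(5, 5, 5, 9, 9, 12, 12, 12, 12)  # one entry per population element
samp_cluster <- c(12, 12, 5)                      # one entry per sampled element
sizes <- table(pop_cluster)                       # names "5", "9", "12"
sizes[unique(samp_cluster)]                       # by position: both values NA
cluster_size <- as.vector(sizes[as.character(unique(samp_cluster))])  # by name
cluster_size                                      # 4 3
```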