strata.rule {stratification}    R Documentation
Non-Iterative Methods of Strata Construction
Description
These functions first determine boundaries to stratify a population. Then, in a second, independent step, the stratum sample sizes are calculated for a target CV, or the CV is computed for a given total sample size. The function strata.cumrootf uses the cumulative root frequency method of Dalenius and Hodges (1959), and strata.geo uses the geometric method of Gunning and Horgan (2004). A model can be specified for the relationship between the stratification variable X and the survey variable Y, but this model has no impact on the first step, the determination of the boundaries. It only influences the calculation of n or of the CV, through the use of anticipated means and variances of Y instead of the empirical means and variances of X.
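As a rough illustration of the first step, the geometric rule of Gunning and Horgan (2004) places the boundaries in a geometric progression between the smallest and largest values of X. The helper geo_boundaries below is hypothetical, not the package's internal code; it is a sketch that assumes strictly positive x and ignores certainty and take-none strata.

```r
# Hypothetical helper (not part of the stratification package): a sketch of the
# geometric rule of Gunning and Horgan (2004) for Ls strata.
geo_boundaries <- function(x, Ls) {
  a <- min(x)
  b <- max(x)
  stopifnot(a > 0)               # the progression is only defined for min(x) > 0
  r <- (b / a)^(1 / Ls)          # common ratio, chosen so that a * r^Ls equals max(x)
  a * r^seq_len(Ls - 1)          # the Ls - 1 interior boundaries
}

x <- c(1, 2, 5, 10, 50, 100, 400, 1000)
bh <- geo_boundaries(x, Ls = 3)
bh    # approximately 10 and 100
```

Only X enters this computation, which is why adding a model for Y or response rates leaves the boundaries unchanged.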
Usage
strata.cumrootf(x, n = NULL, CV = NULL, Ls = 3, certain = NULL,
                alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls),
                model = c("none", "loglinear", "linear", "random"),
                model.control = list(), nclass = NULL)
strata.geo(x, n = NULL, CV = NULL, Ls = 3, certain = NULL,
           alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls),
           model = c("none", "loglinear", "linear", "random"),
           model.control = list())
Arguments
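x
A vector containing the values of the stratification variable X.
n
A numeric: the target sample size. It has no default value. Either n or CV must be specified, but not both.
CV
A numeric: the target coefficient of variation. It has no default value. Either n or CV must be specified, but not both.
Ls
A numeric: the number of sampled strata (take-none and certain strata are not counted in Ls). The default is 3.
certain
A vector giving the position, in the vector x, of the units that must be included in the sample (the certainty stratum). By default, certain is NULL.
alloc
A list specifying the allocation scheme. The list must contain 3 numerics for the 3 exponents q1, q2 and q3 of the allocation rule. The default is list(q1 = 0.5, q2 = 0, q3 = 0.5).
rh
A vector giving the anticipated response rates in each of the Ls sampled strata. The default, rep(1, Ls), corresponds to full response.
model
A character string identifying the model used to describe the discrepancy between the stratification variable X and the survey variable Y: one of "none" (the default), "loglinear", "linear" or "random".
model.control
A list of model parameters (see …), for example beta, sig2 and ph for the loglinear model, as in the Examples. By default, it is an empty list.
nclass
A numeric for the cumulative root frequency method only: the number of classes (Dalenius and Hodges 1959). The default (see Details) is …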
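The three exponents in alloc can be read as an allocation rule. The sketch below assumes the form n_h proportional to N_h^(2 q1) * mean_h^(2 q2) * S_h^(2 q3); this assumption is consistent with the default q1 = 0.5, q2 = 0, q3 = 0.5 yielding Neyman allocation (n_h proportional to N_h S_h), but the package's exact rule, including rounding, take-all strata and the rh adjustment, is not reproduced here. The helper alloc_weights is hypothetical.

```r
# Hypothetical helper: the exponent form is an ASSUMPTION, chosen because
# q1 = 0.5, q2 = 0, q3 = 0.5 then gives Neyman allocation (nh proportional to Nh * Sh).
alloc_weights <- function(Nh, meanh, sdh, q1 = 0.5, q2 = 0, q3 = 0.5) {
  w <- Nh^(2 * q1) * meanh^(2 * q2) * sdh^(2 * q3)
  w / sum(w)                     # shares of the total sample size n
}

Nh <- c(200, 60, 40)
meanh <- c(5, 30, 150)
sdh <- c(2, 10, 30)
n <- 50
nh_neyman <- n * alloc_weights(Nh, meanh, sdh)                    # default exponents
nh_prop <- n * alloc_weights(Nh, meanh, sdh, q1 = 0.5, q2 = 0, q3 = 0)  # proportional
nh_power <- n * alloc_weights(Nh, meanh, sdh, q1 = 0.35, q2 = 0.35, q3 = 0)
```

The last line mirrors the alloc = c(0.35, 0.35, 0) setting used in the Examples; the resulting nh are non-integer, before any rounding.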
Details
The efficiency of the cumulative root frequency method depends on the number of classes, nclass (see Dalenius and Hodges (1959) for a description of these classes). However, there is no theory on how to choose the best value of nclass (Hedlin 2000). This is a limitation of the method.
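The role of nclass is easier to see in a minimal sketch of the cumulative root frequency rule: the range of X is cut into nclass equal-width classes, the square roots of the class frequencies are accumulated, and boundaries are placed where this cumulative total is split into Ls equal parts. The helper cumrootf_boundaries is hypothetical, not the package's implementation, and ignores certainty and take-none strata.

```r
# Hypothetical helper (not the package's implementation): a sketch of the
# cumulative root frequency rule of Dalenius and Hodges (1959).
cumrootf_boundaries <- function(x, Ls, nclass) {
  # Split the range of x into nclass classes of equal width
  edges <- seq(min(x), max(x), length.out = nclass + 1)
  f <- tabulate(findInterval(x, edges, rightmost.closed = TRUE), nbins = nclass)
  csf <- cumsum(sqrt(f))                          # cumulative sqrt of class frequencies
  targets <- csf[nclass] * seq_len(Ls - 1) / Ls   # equal steps on the cum sqrt(f) scale
  # For each target, keep the class whose cumulative value is closest to it;
  # with few classes, two targets may pick the same class (an empty stratum)
  idx <- vapply(targets, function(t) which.min(abs(csf - t)), integer(1))
  edges[idx + 1]                                  # boundary = upper edge of that class
}

cumrootf_boundaries(1:100, Ls = 2, nclass = 10)   # 50.5

# On skewed data, the boundaries move as nclass changes
set.seed(1)
y <- rlnorm(500)
sapply(c(10, 50, 200), function(k) cumrootf_boundaries(y, Ls = 3, nclass = k))
```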
Value
bh
A vector of the stratum boundaries.
nclassh
A vector for the cumulative root frequency method only: the number of classes in each stratum (Dalenius and Hodges 1959).
Nh
A vector of length …
nh
A vector of length …
n
The total sample size (…).
nhnonint
A vector of length …
certain.info
A vector giving statistics for the certainty stratum (see …).
opti.nh
The final value of the criterion to optimize (either the total sample size n or the CV) …
opti.nhnonint
The final value of the criterion to optimize (either the total sample size n or the CV) …
meanh
A vector of length …
varh
A vector of length …
mean
A numeric: the anticipated global mean value of Y.
stderr
A numeric: the standard error of the anticipated global mean of Y.
CV
The anticipated coefficient of variation for the mean of Y.
stratumID
A factor, having the same length as the input x, …
takeall
The number of take-all strata in the final solution. Note: it is possible that …
call
The function call (object of class "call").
date
A character string that contains the system date and time when the function ended.
args
A list of all the argument values input to the function or set by default.
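For intuition about how mean, stderr and CV relate, the sketch below uses the textbook variance of a stratified mean under simple random sampling without replacement within strata. This is an assumption: the package's anticipated values also account for the chosen model, the response rates rh and any certainty stratum, all of which the hypothetical helper strat_summary ignores. Only CV = stderr / mean follows directly from the definition of a coefficient of variation.

```r
# Hypothetical helper: textbook stratified SRSWOR formulas, full response, no model.
strat_summary <- function(Nh, nh, meanh, varh) {
  Wh <- Nh / sum(Nh)                               # stratum weights N_h / N
  m <- sum(Wh * meanh)                             # anticipated global mean
  v <- sum(Wh^2 * (1 - nh / Nh) * varh / nh)       # variance of the stratified mean
  c(mean = m, stderr = sqrt(v), CV = sqrt(v) / m)
}

s <- strat_summary(Nh = c(100, 100), nh = c(10, 10), meanh = c(10, 30), varh = c(4, 36))
s   # mean = 20, stderr = sqrt(0.9), CV = sqrt(0.9) / 20

# A take-all stratum (nh == Nh) adds no variance because of the (1 - nh/Nh) factor
t_all <- strat_summary(Nh = c(100, 5), nh = c(10, 5), meanh = c(10, 500), varh = c(4, 9e4))
```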
Author(s)
Sophie Baillargeon Sophie.Baillargeon@mat.ulaval.ca and
Louis-Paul Rivest Louis-Paul.Rivest@mat.ulaval.ca
References
Baillargeon, S. and Rivest, L.-P. (2011). The construction of stratified designs in R with the package stratification. Survey Methodology, 37(1), 53-65.
Dalenius, T. and Hodges, J.L., Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.
Gunning, P. and Horgan, J.M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology, 30(2), 159-166.
Hedlin, D. (2000). A procedure for stratification by an extended Ekman rule. Journal of Official Statistics, 16, 15-29.
See Also
print.strata, plot.strata, strata.LH
Examples
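### Example for strata.cumrootf
library(stratification)  # provides strata.cumrootf and the MRTS and Sweden data
nclass.grid <- seq(100, 2000, 100)
res <- matrix(NA, nrow = length(nclass.grid), ncol = 2)
for (i in seq_along(nclass.grid)) {
  cum <- strata.cumrootf(x = MRTS, CV = 0.01, Ls = 4, alloc = c(0.5, 0, 0.5),
                         nclass = nclass.grid[i])
  res[i, ] <- c(nclass.grid[i], cum$n)
}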
plot(res, ylab="suggested sample size n", xlab="number of classes", main=expression(
paste("Example of the effect of nclass on n for the cum",sqrt(f)," method")))
### Example for strata.geo
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="none")
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="loglinear",
model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), rh=0.85,
model="loglinear", model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
# When non-response or a model is added, the stratum boundaries do not change,
# only the nh's do.
### Example of how a certainty stratum can be useful with these methods
strata.cumrootf(x=Sweden$REV84, CV=0.05, Ls=4, alloc=c(0.35,0.35,0), model="none",
nclass=50)
strata.cumrootf(x=sort(Sweden$REV84), CV=0.05, Ls=4, alloc=c(0.35,0.35,0),
certain=282:284, model="none", nclass=50)
# The certainty stratum is used here to ensure that the three largest units in the
# Sweden$REV84 population are in the sample, since no take-all stratum can be forced
# in the stratified design with the cumulative root frequency or geometric method.
# Sorting x puts these three units at positions 282 to 284. This reduces the
# suggested sample size n by more than half (47 vs. 19). This example was presented
# in Baillargeon and Rivest (2011).
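In the last example, certain = 282:284 relies on x having been sorted first, so that the three largest values of Sweden$REV84 occupy the last three positions. Since certain takes positions in x, the same units could presumably be located without sorting. The helper largest_positions below is hypothetical; the call to strata.cumrootf is shown commented out because it needs the stratification package.

```r
# Hypothetical helper: positions of the k largest values of x, usable as 'certain'
largest_positions <- function(x, k) order(x, decreasing = TRUE)[seq_len(k)]

x <- c(12, 950, 40, 3, 1200, 75, 880)
largest_positions(x, 3)   # 5 2 7

# With the stratification package loaded (not run here):
# strata.cumrootf(x = Sweden$REV84, CV = 0.05, Ls = 4, alloc = c(0.35, 0.35, 0),
#                 certain = largest_positions(Sweden$REV84, 3), model = "none",
#                 nclass = 50)
```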