strata.rule {stratification}    R Documentation

Non-Iterative Methods of Strata Construction

Description

These functions first determine boundaries to stratify a population. Then, in a second, independent step, they compute either the sample sizes needed to reach a target CV or the CV attained with a given total sample size. The function strata.cumrootf uses the cumulative root frequency method of Dalenius and Hodges (1959), and strata.geo uses the geometric method of Gunning and Horgan (2004). A model can be specified for the relationship between the stratification variable X and the survey variable Y, but this model has no impact on the first step of boundary determination. It only influences the calculation of n or of the CV, which then uses the anticipated means and variances of Y instead of the empirical means and variances of X.
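As a rough illustration of the first step for the geometric method, the sketch below places the boundaries in geometric progression between the smallest and largest values of x, which is the rule proposed by Gunning and Horgan (2004). It is not the package's code: geo_bounds is a hypothetical helper, the population is simulated, and treating intervals as closed on the left is an assumption.

```r
# Illustrative sketch only (not the package's implementation).
# geo_bounds is a hypothetical helper name.
geo_bounds <- function(x, Ls) {
  stopifnot(all(x > 0), Ls >= 2)
  a <- min(x)
  r <- (max(x) / a)^(1 / Ls)   # common ratio of the geometric progression
  a * r^(1:(Ls - 1))           # the Ls - 1 interior boundaries
}

set.seed(1)
x <- rlnorm(1000, meanlog = 5, sdlog = 1)   # simulated skewed population
bh <- geo_bounds(x, Ls = 4)

# Assign each unit to a stratum; left-closed intervals are an assumption here
stratum <- findInterval(x, bh) + 1
table(stratum)
```

Because the boundaries depend only on min(x), max(x) and Ls, a model or a response rate cannot change them, which is consistent with the two-step description above.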

Usage

strata.cumrootf(x, n = NULL, CV = NULL, Ls = 3, certain = NULL, 
       alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls), 
       model = c("none", "loglinear", "linear", "random"),
       model.control = list(), nclass = NULL)

strata.geo(x, n = NULL, CV = NULL, Ls = 3, certain = NULL,
       alloc = list(q1 = 0.5, q2 = 0, q3 = 0.5), rh = rep(1, Ls),
       model = c("none", "loglinear", "linear", "random"),
       model.control = list())

Arguments

x

A vector containing the values of the stratification variable X for every unit in the population.

n

A numeric: the target sample size. It has no default value. Either n or CV must be supplied.

CV

A numeric: the target coefficient of variation. It has no default value. Either CV or n must be supplied.

Ls

A numeric: the number of sampled strata (take-none and certain strata are not counted in Ls, but here no take-none stratum can be added to the stratified design so Ls is in fact always equal to L). The default is 3.

certain

A vector giving the position, in the vector x, of the units that must be included in the sample (see stratification-package). By default certain is NULL, which means that no units are chosen a priori to be in the sample.

alloc

A list specifying the allocation scheme. The list must contain 3 numerics for the 3 exponents q1, q2 and q3 in the general allocation scheme (see stratification-package). The default is Neyman allocation (q1=q3=0.5 and q2=0).

rh

A vector giving the anticipated response rates in each of the Ls sampled strata. A single number can be given if the rates do not vary among strata. The default is 1 in each stratum.

model

A character string identifying the model used to describe the discrepancy between the stratification variable X and the survey variable Y. It can be "none" if one assumes Y=X, "loglinear" for the loglinear model with mortality, "linear" for the heteroscedastic linear model or "random" for the random replacement model (see stratification-package for a description of these models). The default is "none".

model.control

A list of model parameters (see stratification-package). The default values of the parameters correspond to the model Y=X.

nclass

A numeric for the cumulative root frequency method only: the number of classes (Dalenius and Hodges 1959). The default (see Details) is min(Ls*15, Nu) where Nu is the number of unique values in the x-vector from which units in the certainty stratum, if any, have been removed.
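To make the role of the exponents in alloc more concrete, here is a minimal sketch of a general allocation rule. It assumes stratum weights of the form N_h^(2*q1) * Ybar_h^(2*q2) * S_h^(2*q3); that form is our reading of Baillargeon and Rivest (2011) and is not stated on this page, so check stratification-package for the authoritative definition. alloc_shares and the stratum figures are hypothetical.

```r
# Sketch under a stated assumption: allocation weights
#   gamma_h = N_h^(2*q1) * Ybar_h^(2*q2) * S_h^(2*q3)
# alloc_shares is a hypothetical helper, not a package function.
alloc_shares <- function(Nh, meanh, varh, q1, q2, q3) {
  gamma <- Nh^(2 * q1) * meanh^(2 * q2) * sqrt(varh)^(2 * q3)
  gamma / sum(gamma)            # share of the total sample given to each stratum
}

# Made-up stratum summaries for three strata
Nh    <- c(600, 300, 100)
meanh <- c(10, 50, 400)
varh  <- c(9, 400, 40000)

# q1 = q3 = 0.5, q2 = 0: shares proportional to N_h * S_h (Neyman)
neyman <- alloc_shares(Nh, meanh, varh, q1 = 0.5, q2 = 0, q3 = 0.5)

# q1 = 0.5, q2 = q3 = 0: shares proportional to N_h
proportional <- alloc_shares(Nh, meanh, varh, q1 = 0.5, q2 = 0, q3 = 0)

rbind(neyman, proportional)
```

Under this assumed form, the default q1=q3=0.5, q2=0 reduces to shares proportional to N_h times the stratum standard deviation, which matches the Neyman allocation named as the default.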

Details

The efficiency of the cumulative root frequency method depends on the number of classes nclass (see Dalenius and Hodges (1959) for a description of these classes). However, there is no theory about how to choose the best value for nclass (Hedlin 2000). This is a limitation of the method.
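The dependence on nclass can be seen in a simplified sketch of the cumulative root frequency rule: x is grouped into nclass equal-width classes, the square roots of the class frequencies are cumulated, and boundaries are placed at roughly equal steps on that cumulative scale. This is an illustration under simplifying assumptions, not the package's algorithm; cumrootf_bounds is a hypothetical helper, the data are simulated, and the tie-breaking rule (closest class) is our choice.

```r
# Illustrative sketch of the cumulative root frequency rule
# (not the package's implementation). cumrootf_bounds is hypothetical.
cumrootf_bounds <- function(x, Ls, nclass) {
  # Equal-width classes covering the range of x
  breaks <- seq(min(x), max(x), length.out = nclass + 1)
  f <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))
  csf <- cumsum(sqrt(f))                        # cumulative sqrt(f)
  targets <- csf[nclass] * (1:(Ls - 1)) / Ls    # equal steps on the cum sqrt(f) scale
  # For each target, pick the class whose cumulative value is closest
  idx <- sapply(targets, function(t) which.min(abs(csf - t)))
  breaks[idx + 1]                               # upper limit of the chosen class
}

set.seed(1)
x <- rlnorm(1000, meanlog = 5, sdlog = 1)       # simulated skewed population
bh <- cumrootf_bounds(x, Ls = 4, nclass = 60)

# Boundaries obtained with different numbers of classes (one column per nclass)
by_nclass <- sapply(c(20, 60, 200), function(k) cumrootf_bounds(x, 4, k))
by_nclass
```

Since boundaries can only fall on class limits, changing nclass changes the set of candidate boundaries, which is one way to see why the result is sensitive to this argument.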

Value

bh

A vector of the L-1 stratum boundaries proposed by the method.

nclassh

A vector for the cumulative root frequency method only: the number of classes in each stratum (Dalenius and Hodges 1959).

Nh

A vector of length L containing the population sizes N_h, i.e. the number of units in each stratum.

nh

A vector of length L containing the sample sizes n_h, i.e. the number of units to sample in each stratum. See stratification-package for information about the rounding used to get these integer values.

n

The total sample size (sum(nh)).

nhnonint

A vector of length L containing the non-integer values of the sample sizes, obtained directly from applying the allocation rule (see stratification-package).

certain.info

A vector giving statistics for the certainty stratum (see stratification-package). It contains Nc, the number of units chosen a priori to be in the sample, and meanc, the anticipated mean of Y for these units.

opti.nh

The final value of the criterion to optimize (either the total sample size n if a target CV was given or the RRMSE if a target n was given) calculated with the integer stratum sample sizes nh.

opti.nhnonint

The final value of the criterion to optimize (either the total sample size n if a target CV was given or the RRMSE if a target n was given) calculated with the non-integer stratum sample sizes nhnonint.

meanh

A vector of length L containing the anticipated means of Y in each stratum.

varh

A vector of length L containing the anticipated variances of Y in each stratum.

mean

A numeric: the anticipated global mean value of Y.

stderr

A numeric: the standard error of the anticipated global mean of Y.

CV

The anticipated coefficient of variation for the mean of Y, i.e. stderr divided by mean.

stratumID

A factor, having the same length as the input x, whose values are either 1, 2, ..., L or "certain". The value "certain" is given to units a priori chosen to be in the sample. This factor identifies, for each observation, the stratum to which it has been assigned.

takeall

The number of take-all strata in the final solution. Note: It is possible that n_h=N_h for non take-all strata because the condition for an automatic addition of a take-all stratum is n_h>N_h.

call

The function call (object of class "call").

date

A character string that contains the system date and time when the function ended.

args

A list of all the argument values input to the function or set by default.
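The relation between Nh, nh, meanh, varh and the reported mean, stderr and CV can be sketched with the textbook variance formula for a stratified mean under simple random sampling without replacement in each stratum. The sketch assumes full response (rh = 1), no certainty stratum and no model, so it will not reproduce the package's output in other settings; strat_cv and the input figures are hypothetical.

```r
# Textbook sketch (not the package's code): mean, standard error and CV of a
# stratified estimator, assuming rh = 1 and no certainty stratum.
# strat_cv is a hypothetical helper name.
strat_cv <- function(Nh, nh, meanh, varh) {
  Wh <- Nh / sum(Nh)                                 # stratum weights
  mean_st <- sum(Wh * meanh)                         # global mean
  var_st <- sum(Wh^2 * (1 - nh / Nh) * varh / nh)    # with finite population correction
  se <- sqrt(var_st)
  c(mean = mean_st, stderr = se, CV = se / mean_st)
}

# Made-up design: the third stratum is take-all (nh = Nh), so it adds no variance
out <- strat_cv(Nh = c(600, 300, 100), nh = c(20, 30, 100),
                meanh = c(10, 50, 400), varh = c(9, 400, 40000))
out
```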

Author(s)

Sophie Baillargeon Sophie.Baillargeon@mat.ulaval.ca and
Louis-Paul Rivest Louis-Paul.Rivest@mat.ulaval.ca

References

Baillargeon, S. and Rivest, L.-P. (2011). The construction of stratified designs in R with the package stratification. Survey Methodology, 37(1), 53-65.

Dalenius, T. and Hodges, J.L., Jr. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88-101.

Gunning, P. and Horgan, J.M. (2004). A new algorithm for the construction of stratum boundaries in skewed populations. Survey Methodology, 30(2), 159-166.

Hedlin, D. (2000). A procedure for stratification by an extended Ekman rule. Journal of Official Statistics, 16, 15-29.

See Also

print.strata, plot.strata, strata.LH

Examples

### Example for strata.cumrootf
res <- matrix(NA, nrow=20, ncol=2)
i <- 1
for ( n in seq(100,2000,100)){
    cum <- strata.cumrootf(x=MRTS, CV=0.01, Ls=4, alloc=c(0.5,0,0.5), nclass=n)
    res[i,] <- c(n,cum$n)
    i <- i + 1
}
plot(res, ylab="suggested sample size n", xlab="number of classes", main=expression(
     paste("Example of the effect of nclass on n for the cum",sqrt(f)," method")))

### Example for strata.geo
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="none")
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), model="loglinear",
       model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
strata.geo(x=Sweden$REV84, CV=0.05, Ls=5, alloc=c(0.35,0.35,0), rh=0.85,
       model="loglinear", model.control=list(beta=1.058355, sig2=0.06593083, ph=1))
# When non-response or a model is added, the stratum boundaries do not change, 
# only the nh's do.

### Example of how a certainty stratum can be useful with these methods
strata.cumrootf(x=Sweden$REV84, CV=0.05, Ls=4, alloc=c(0.35,0.35,0), model="none",
                nclass=50)
strata.cumrootf(x=sort(Sweden$REV84), CV=0.05, Ls=4, alloc=c(0.35,0.35,0), 
                certain=282:284, model="none", nclass=50)
# The certainty stratum is used here to ensure that the three large units in the
# Sweden$REV84 population are in the sample, since no take-all stratum can be forced 
# in the stratified design with the cumulative root frequency or geometric method.
# We see that this reduces the suggested sample size n by more than half
# (47 vs 19). This example was presented in Baillargeon and Rivest (2011). 


[Package stratification version 2.2-7 Index]