nEdge {PracTools}

R Documentation

Compute the total sample size for a stratified, simple random sample based on an Edgeworth approximation

Description

Compute the total sample size for a stratified simple random sample, under various allocations, that is large enough to ensure adequate coverage of a normal approximation confidence interval (CI) for a population mean.

Usage

nEdge(ci.lev, side, epsilon = 0.005, dat, pop.sw = TRUE, wts = NULL, hcol=NULL, ycol,
      alloc = NULL, Ch = NULL)

Arguments

`ci.lev`	desired confidence level for a 1- or 2-sided normal approximation confidence interval based on an estimated mean; must be in the interval (0,1)
`side`	either `"two.sided"` or `"one.sided"` for type of confidence interval
`epsilon`	tolerance on coverage probability; the sample should be large enough that CI coverage is within ± `epsilon` of `ci.lev`; must be in the interval (0,1)
`dat`	either a population or sample data frame
`pop.sw`	TRUE if `dat` is for a full population; FALSE if `dat` is for a sample
`wts`	vector of weights if `dat` is a sample; if `dat` is for a population, `wts = NULL`
`hcol`	column of `dat` that contains the stratum ID; strata can be character or numeric
`ycol`	column of `dat` that contains the analysis variable; must be numeric
`alloc`	allocation to the strata; must be one of `prop`, `equal`, `neyman`, `totcost`, `totvar`, or `NULL`
`Ch`	vector of costs per unit in each stratum; these exclude fixed costs that do not vary with the sample size

Details

`nEdge` computes the total sample size needed in either a stratified or unstratified simple random sample so that the coverage probability of a confidence interval is within a specified tolerance (`epsilon`) of a nominal confidence level (`ci.lev`). The calculation assumes that a single estimated mean or total of the variable `ycol` is of key importance in the sample. Confidence intervals for the finite population mean are usually computed using the normal approximation, whose accuracy depends on the distribution of the analysis variable and on the total sample size. In some applications, assuring that CIs have near-nominal coverage is critical. For example, for some items on business tax returns the US Internal Revenue Service allows sample estimates to be used but sets precision standards based on the lower (or upper) limit of a 1-sided CI.
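As a minimal sketch of the intervals whose coverage is at issue, the normal approximation CI for a stratified mean can be computed in base R roughly as follows. The stratum counts `Nh`, sample sizes `nh`, and the simulated skewed data are hypothetical and do not come from PracTools; the finite population correction is ignored, which is reasonable only when sampling fractions are small.

```r
# Hypothetical stratified sample: 3 strata with assumed population counts
set.seed(42)
Nh <- c(A = 5000, B = 3000, C = 2000)   # assumed stratum population sizes
nh <- c(A = 50, B = 30, C = 20)         # assumed stratum sample sizes
# Skewed illustrative data, one vector of y values per stratum
y <- lapply(nh, function(n) rexp(n, rate = 1/100))

Wh   <- Nh / sum(Nh)                    # stratum population proportions
ybar <- sum(Wh * sapply(y, mean))       # stratified estimate of the mean
# Variance estimate without the finite population correction
v    <- sum(Wh^2 * sapply(y, var) / nh)
se   <- sqrt(v)

ci.lev <- 0.95
# Two-sided interval uses the (1 + ci.lev)/2 normal quantile
two.sided <- ybar + c(-1, 1) * qnorm((1 + ci.lev) / 2) * se
# One-sided lower limit uses the ci.lev normal quantile
lower.one.sided <- ybar - qnorm(ci.lev) * se
```

With skewed data such as these, the actual coverage of either interval can differ from `ci.lev` when the sample is small, which is the situation `nEdge` is meant to guard against.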

Using the Edgeworth approximation to the distribution of the estimated overall mean given in Qing & Valliant (2024), `nEdge` computes the total sample size needed so that a CI will have coverage equal to the nominal value in `ci.lev` plus or minus the tolerance `epsilon`. The calculation assumes that the sampling fraction in each stratum is negligible. The total sample size returned by `nEdge` is based on the overall Edgeworth criterion; the resulting stratum sample sizes may not be large enough for the normal approximation to be adequate for each stratum estimator.

When `dat` is a sample, the weights (`wts`) used in the estimator of the mean (or total) are assumed to be scaled for estimating population totals. They can be inverse selection probabilities, i.e., the weights used in the π-estimator, or weights that have been adjusted to account for nonresponse or coverage errors.
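The scaling requirement on `wts` can be illustrated with a small base R sketch. The stratum counts below are hypothetical, and the check shown (weights summing to the population size) is one simple way to confirm the scaling, not something `nEdge` is documented to perform itself.

```r
# Hypothetical stratified SRS design (assumed counts, not from PracTools)
Nh <- c(5000, 3000, 2000)                 # stratum population sizes
nh <- c(50, 30, 20)                       # stratum sample sizes
strat <- rep(seq_along(nh), times = nh)   # stratum ID for each sample unit

# Inverse selection probability for a unit in stratum h is Nh/nh
w <- (Nh / nh)[strat]

# Weights scaled for totals sum to the population size
isTRUE(all.equal(sum(w), sum(Nh)))

# Weights normalized to sum to the sample size are NOT on the total scale;
# they would need rescaling by N / sum(weights) before being used as wts
w.norm     <- w * length(w) / sum(w)
w.rescaled <- w.norm * sum(Nh) / sum(w.norm)
```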

The remainder term in the approximation used in `nEdge` is O(n^{-1/2}). In contrast, the function `nEdgeSRS` uses an O(n^{-1}) approximation but applies only to simple random sampling.

Value

List with values:

`CI type`	one-sided or two-sided
`epsilon`	tolerance on CI coverage
`Total sample size`	numeric sample size
`allocation`	type of allocation to strata or NULL if no strata are used
`Stratum values`	data frame with columns for stratum, number of sample units allocated to each stratum (`nh`), proportion of sample allocated to each stratum (`ph`), and skewness in each stratum (`g1h`); if no strata are used, only `g1`, the overall skewness, is returned
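Because several of the element names contain spaces, they cannot be reached with plain `$name` syntax. The sketch below shows one way to extract them, assuming the returned list carries exactly the names listed above; the object names `res`, `n.total`, and `strat.tab` are illustrative only.

```r
library(PracTools)
set.seed(1289129963)
pop <- HMT(N = 10000, H = 5)
res <- nEdge(ci.lev = 0.95, side = "one.sided", dat = pop, pop.sw = TRUE,
             wts = NULL, hcol = "strat", ycol = "y", alloc = "neyman")

names(res)                                # confirm the element names
# Names with spaces need [[ ]] (or backticks after $)
n.total   <- res[["Total sample size"]]
strat.tab <- res[["Stratum values"]]
strat.tab                                 # per-stratum allocation and skewness
```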

Author(s)

Richard Valliant, Siyu Qing

References

Qing, S. and Valliant, R. (2024). Extending Cochran's Sample Size Rule to Stratified Simple Random Sampling with Applications to Audit Sampling. Journal of Official Statistics, accepted.

U.S. Internal Revenue Service (2011). 26 CFR 601.105: Examination of returns and claims for refund, credit or abatement: determination of correct tax liability. Washington DC. https://www.irs.gov/pub/irs-drop/rp-11-42.pdf

Examples

require(PracTools)
set.seed(1289129963)
pop <- HMT(N=10000, H=5)
# run for full population
nEdge(ci.lev=0.95, side="one.sided", dat=pop, pop.sw=TRUE, wts=NULL, hcol="strat", ycol="y",
       alloc="neyman")
# run for a stratified sample
require(sampling)
sam <- strata(data=pop, stratanames="strat", size=c(30, 40, 50, 60, 70), method=c("srswor"),
              description=TRUE)
samdat <- pop[sam$ID_unit,]
# weights are inverse selection probabilities, scaled for population totals
w <- 1/sam$Prob
nEdge(ci.lev=0.95, side="two.sided", epsilon=0.02, dat=samdat, pop.sw=FALSE, wts=w,
       hcol="strat", ycol="y", alloc="equal")

[Package PracTools version 1.5 Index]