
nEdgeSRS {PracTools}

R Documentation

Compute the total sample size for a simple random sample based on an Edgeworth approximation

Description

Compute the total simple random sample size that is large enough to ensure adequate coverage of a normal approximation confidence interval (CI) for a population mean.

Usage

nEdgeSRS(ci.lev, side, epsilon = 0.005, dat, pop.sw = TRUE, wts = NULL, hcol=NULL, ycol)

Arguments

`ci.lev`	desired confidence level for a 1- or 2-sided normal approximation confidence interval based on an estimated mean; must be in the interval (0,1)
`side`	either `"two.sided"` or `"one.sided"` for type of confidence interval
`epsilon`	tolerance on coverage probability; the sample should be large enough that CI coverage is within ± `epsilon` of `ci.lev`; must be in the interval (0,1)
`dat`	either a population or sample data frame
`pop.sw`	TRUE if `dat` is for a full population; FALSE if `dat` is for a sample
`wts`	vector of weights if `dat` is a sample; if `dat` is for a population, `wts = NULL`
`hcol`	column of `dat` that contains the stratum ID; strata can be character or numeric
`ycol`	column of `dat` that contains the analysis variable; must be numeric

Details

`nEdgeSRS` computes the total sample size needed in a simple random sample so that the coverage probability of a confidence interval is within a specified tolerance (`epsilon`) of a nominal confidence level (`ci.lev`). Confidence intervals for the finite population mean are usually computed using the normal approximation, whose accuracy depends on the sample size and the underlying structure of the analysis variable. In some applications, ensuring that CIs have near-nominal coverage is critical. For example, for some items on business tax returns the US Internal Revenue Service allows sample estimates to be used but sets precision standards based on the lower (or upper) limit of a 1-sided CI.
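
The dependence of coverage on sample size can be seen in a small simulation. The following is only a sketch in base R, not part of `nEdgeSRS`: the gamma population, the two sample sizes, and the helper name `coverage` are all assumptions made for illustration. It estimates how often a 1-sided 95% normal approximation CI with an upper limit covers the true mean of a right-skewed population.

```r
# Sketch: empirical coverage of a 1-sided normal approximation CI under SRS.
# The population and sample sizes below are illustrative assumptions.
set.seed(42)
N <- 20000
y <- rgamma(N, shape = 0.5, scale = 2)   # right-skewed artificial population
ybar.pop <- mean(y)

# Coverage of the interval (-Inf, ybar + z * se] for a given sample size n
coverage <- function(n, ci.lev = 0.95, reps = 2000) {
  z <- qnorm(ci.lev)
  hits <- replicate(reps, {
    s  <- sample(y, n)                      # SRS without replacement
    se <- sqrt((1 - n / N) * var(s) / n)    # with finite population correction
    ybar.pop <= mean(s) + z * se
  })
  mean(hits)
}

cov.by.n <- sapply(c(30, 300), coverage)
names(cov.by.n) <- c("n=30", "n=300")
cov.by.n
```

With a skewed variable such as this one, the estimated coverage at the smaller sample size would be expected to fall further below 0.95 than at the larger one, which is the kind of shortfall that `epsilon` is meant to bound.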

Using the Edgeworth approximation to the distribution of the estimated mean given in Sugden, Smith, and Jones (SSJ, 2000), `nEdgeSRS` computes the total sample size needed so that a CI will have coverage equal to the nominal value in `ci.lev` plus or minus the tolerance `epsilon`. Two alternatives are given: (1) a sample size from solving quadratic equation (4.4) in SSJ and (2) a modification of a rule from Cochran (1977) given in expression (4.3) of SSJ. If `hcol` is specified, the required simple random sample size is calculated separately in each stratum; thus, each stratum sample size should be large enough for the normal approximation to hold for that stratum's estimator. The calculation assumes that the overall or stratum sampling fractions are negligible.
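
For orientation, the original rule of thumb in Cochran (1977) requires n > 25 G1^2, where G1 is the skewness coefficient of the analysis variable. The sketch below applies that classical rule stratum by stratum in base R. It is not the SSJ quadratic solution or the modified rule that `nEdgeSRS` returns, and the helper names `skew` and `cochran.n` and the simulated data frame are hypothetical.

```r
# Fisher's moment coefficient of skewness, G1 = m3 / m2^(3/2)
skew <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^1.5
}

# Cochran's (1977) original rule of thumb: smallest integer n with n > 25 * G1^2
cochran.n <- function(y) floor(25 * skew(y)^2) + 1

# Hypothetical stratified data; the distributions in each stratum are made up
set.seed(1)
dat <- data.frame(
  strat = rep(1:3, each = 500),
  y     = c(rexp(500, 1), rexp(500, 0.2), rgamma(500, shape = 4))
)

# One calculation per stratum, analogous to what happens when hcol is given
nh <- tapply(dat$y, dat$strat, cochran.n)
nh
```

More skewed strata call for larger samples under this rule, which is why a per-stratum calculation can give quite different sizes across strata.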

When `dat` is a sample, the weights (`wts`) used in the estimator of the mean (or total) are assumed to be scaled for estimating population totals. They can be inverse selection probabilities, i.e., ones used in the π-estimator, or weights that have been adjusted to account for nonresponse or coverage errors.
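
One way to check that weights are on the population-total scale is to confirm that they sum to roughly the population size. The following base R sketch assumes a stratified simple random sample; the stratum sizes `N.h` and `n.h` and the simulated `y` values are made up for illustration.

```r
# Hypothetical stratified SRS: N.h and n.h are made-up stratum sizes
N.h <- c(a = 1000, b = 3000, c = 6000)
n.h <- c(a = 30, b = 40, c = 50)

set.seed(7)
samdat <- data.frame(
  strat = rep(names(n.h), times = n.h),
  y     = rgamma(sum(n.h), shape = 2)
)

# Inverse selection probabilities: each sampled unit represents N.h / n.h units
w <- unname((N.h / n.h)[as.character(samdat$strat)])

N.hat    <- sum(w)              # estimates the population size N
tot.hat  <- sum(w * samdat$y)   # pi-estimator of the population total
mean.hat <- tot.hat / N.hat     # weighted estimator of the population mean
c(N.hat = N.hat, total = tot.hat, mean = mean.hat)
```

Weights that have been normalized to sum to the sample size would fail this check and would need to be rescaled before being passed as `wts`.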

The remainder term in the approximation used in `nEdgeSRS` is O(n^{-1}). In contrast, the function `nEdge` uses an O(n^{-1/2}) approximation but applies to an overall mean from a stratified simple random sample for which several different allocations can be specified. The total sample size returned by `nEdge` is based on the overall Edgeworth approximation for the distribution of the population mean estimator; the resulting stratum sample sizes may not be large enough for the normal approximation to be adequate for each stratum estimator.

Value

List with values:

`CI type`	one-sided or two-sided
`epsilon`	tolerance on CI coverage
`Total sample size`	vector of numeric sample sizes from (1) solving SSJ (2000) quadratic equation and (2) SSJ's modified Cochran rule
`g1`	overall skewness and kurtosis; returned if no strata are used
`Stratum values`	data frame with columns for stratum, number of sample units allocated to each stratum (`nh`) based on the SSJ quadratic rule, proportion that each quadratic-rule stratum sample is of the total sample (`ph`), modified Cochran sample size (`nh.cochran`), skewness in each stratum (`stratum.skewness`), and kurtosis in each stratum (`stratum.kurtosis`); returned if strata are used

Author(s)

Richard Valliant

References

Cochran, W.G. (1977). Sampling Techniques, 3rd edition. New York: Wiley.

Sugden, R. A., Smith, T. M. F., and Jones, R. P. (2000). Cochran's Rule for Simple Random Sampling. Journal of the Royal Statistical Society, Series B, Vol. 62, No. 4, 787-793. https://doi.org/10.1111/1467-9868.00264

U.S. Internal Revenue Service (2011). 26 CFR 601.105: Examination of returns and claims for refund, credit or abatement: determination of correct tax liability. Washington DC. https://www.irs.gov/pub/irs-drop/rp-11-42.pdf

Examples

#   test using HMT pop
require(PracTools)
set.seed(1289129963)
pop <- HMT(N=10000, H=5)
#   using pop with no strata
nEdgeSRS(ci.lev=0.95, side="one.sided", dat=pop, pop.sw=TRUE, hcol=NULL, ycol="y")
#   using a sample as input
require(sampling)
#   stratified SRS without replacement; one sample size per stratum
sam <- strata(data=pop, stratanames="strat", size=c(30, 40, 50, 60, 70), method="srswor",
              description=TRUE)
samdat <- pop[sam$ID_unit,]
#   weights are inverse selection probabilities
w <- 1/sam$Prob
nEdgeSRS(ci.lev=0.95, side="one.sided", epsilon=0.005, dat=samdat, pop.sw=FALSE, wts=w,
         hcol="strat", ycol="y")

[Package PracTools version 1.5 Index]