R: Permanent Random Number Sampling

prnsamplr-package {prnsamplr}

R Documentation

Description

Survey sampling using permanent random numbers (PRNs). PRNs address the problem of unknown overlap between survey samples, which leads to low precision in estimates when a survey is repeated or combined with other surveys. The PRN approach is to supply the U(0, 1) random numbers to the sampling procedure instead of having the procedure generate them. Lindblom (2014) <doi:10.2478/jos-2014-0047>, and the articles cited therein, show how this is carried out and how it improves the estimates. This package supports two common fixed-size sampling procedures (simple random sampling and probability-proportional-to-size sampling) and includes a function for transforming the PRNs in order to control the sample overlap.
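
The idea of supplying the random numbers can be illustrated in base R without the package. The following is a minimal sketch under assumed data: the frames, the helper take_smallest and all variable names are hypothetical. Each unit keeps its PRN between two occasions, the frame changes slightly, and the overlap between the two samples is compared with the overlap of two independent draws.

```r
set.seed(42)

# Hypothetical frame at occasion 1: 100 units, each with a permanent random number.
frame1 <- data.frame(id = 1:100, prn = runif(100))

# Occasion 2: units 1-10 leave the frame and 10 new units enter with fresh PRNs.
# Surviving units keep the PRN they already have.
frame2 <- rbind(frame1[frame1$id > 10, ],
                data.frame(id = 101:110, prn = runif(10)))

n <- 20

# Hypothetical helper: ids of the n units with the smallest PRNs.
take_smallest <- function(frame, n) frame$id[order(frame$prn)[seq_len(n)]]

s1 <- take_smallest(frame1, n)
s2 <- take_smallest(frame2, n)
overlap_prn <- length(intersect(s1, s2))

# For comparison: independent draws on each occasion.
i1 <- sample(frame1$id, n)
i2 <- sample(frame2$id, n)
overlap_indep <- length(intersect(i1, i2))
```

Because both PRN samples are determined by the same stored numbers, the overlap follows from the frame changes alone, whereas the overlap of the independent draws varies from run to run.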

Details

This package provides two functions for drawing stratified PRN-assisted samples: srs and pps. The former – simple random sampling – assumes that each unit k in a given stratum h is equally likely to be sampled, with inclusion probability

\pi_k = \frac{n_h}{N_h}

for each stratum h, where n_h is the sample size and N_h is the number of frame units in stratum h. The function then samples the n_h elements with the smallest PRNs in each stratum h.
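
The selection rule can be sketched in base R as follows. This is an illustration of the rule as described here, not the package's implementation; the toy frame and the column names sampled and pik are assumptions.

```r
set.seed(1)

# Hypothetical toy frame: two strata of five units, with PRNs stored on the frame.
frame <- data.frame(
  id      = 1:10,
  stratum = rep(c("A", "B"), each = 5),
  nsample = rep(c(2, 3), each = 5),   # n_h, repeated on every row of the stratum
  rands   = runif(10)
)

# Rank the PRNs within each stratum and flag the n_h smallest.
prn_rank      <- ave(frame$rands, frame$stratum, FUN = rank)
frame$sampled <- prn_rank <= frame$nsample

# Inclusion probability n_h / N_h, identical for all units of a stratum.
N_h       <- ave(frame$rands, frame$stratum, FUN = length)
frame$pik <- frame$nsample / N_h
```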

The latter – Pareto \pi ps sampling – assumes that large units are more likely to be sampled than small units. The exact inclusion probabilities under this design are unknown; the function approximates them by the target inclusion probability

\lambda_k = n_h \frac{x_k}{\sum_{i=1}^{N_h} x_i},

where x_k is the size measure of unit k and the sum runs over all N_h units in stratum h, and samples the n_h elements with the smallest values of

Q_k = \frac{PRN_k(1 - \lambda_k)}{\lambda_k(1 - PRN_k)},

for each stratum h.
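
A minimal base R sketch of these two steps follows, with the size total taken over all frame units in the stratum. The toy frame is hypothetical, and the sketch assumes that every \lambda_k is below 1; it does not show how the package treats units for which this fails.

```r
set.seed(2)

# Hypothetical toy frame with a size measure on every unit.
frame <- data.frame(
  id      = 1:8,
  stratum = rep(c("A", "B"), each = 4),
  nsample = rep(c(2, 1), each = 4),   # n_h, repeated on every row of the stratum
  rands   = runif(8),
  sizeM   = c(10, 20, 30, 40, 5, 5, 10, 20)
)

# lambda_k = n_h * x_k / (sum of x over all units in the stratum)
size_total   <- ave(frame$sizeM, frame$stratum, FUN = sum)
frame$lambda <- frame$nsample * frame$sizeM / size_total

# Ranking variable Q_k = PRN_k (1 - lambda_k) / (lambda_k (1 - PRN_k))
frame$Q <- frame$rands * (1 - frame$lambda) /
  (frame$lambda * (1 - frame$rands))

# Flag the n_h units with the smallest Q_k in each stratum.
frame$sampled <- ave(frame$Q, frame$stratum, FUN = rank) <= frame$nsample
```

By construction the \lambda_k sum to n_h within each stratum, and a large size measure lowers Q_k, which makes the unit more likely to be among the n_h smallest.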

These two functions can be run standalone or via the wrapper function samp. Input to the functions is the sampling frame, with stratification information and PRNs given as variables on the frame and, in the case of pps, a size measure also given as a variable on the frame. Output is a copy of the sampling frame containing sampling information and, in the case of pps, also \lambda and Q.

The package also provides the function transformprn, which makes it possible to select where to start counting, and in which direction, when the sampling routines enumerate the PRNs. This is done by specifying start and direction to transformprn and then calling srs or pps on its output.
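
One common way to realise such a transformation is to shift the PRNs on the unit circle, so that "the smallest PRNs" become "the first PRNs met when counting from start". The sketch below shows that convention only. The helper shift_prn is hypothetical and not part of the package, the direction code "D" is an assumption, and transformprn may differ in its details.

```r
# Hypothetical helper illustrating a circular shift of PRNs.
shift_prn <- function(prn, start, direction = c("U", "D")) {
  direction <- match.arg(direction)
  if (direction == "U") {
    (prn - start) %% 1   # count upward from `start`, wrapping around at 1
  } else {
    (start - prn) %% 1   # count downward from `start`, wrapping around at 0
  }
}

prn  <- c(0.05, 0.25, 0.60, 0.95)
up   <- shift_prn(prn, start = 0.2, direction = "U")
down <- shift_prn(prn, start = 0.2, direction = "D")

# Selection order after the shift: smallest transformed PRN first.
order(up)
order(down)
```

Counting upward from 0.2, the unit with PRN 0.25 comes first; counting downward, the unit with PRN 0.05 comes first. Choosing different start points or directions for two surveys is what allows their overlap to be controlled.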

Finally, an example dataset is provided that can be used to illustrate the functionality of the package.

Author(s)

Kira Coder Gylling

Maintainer: Kira Coder Gylling <kira.gylling@gmail.com>

References

Lindblom, A. (2014). "On Precision in Estimates of Change over Time where Samples are Positively Coordinated by Permanent Random Numbers." Journal of Official Statistics, vol. 30, no. 4, pp. 773-785. https://doi.org/10.2478/jos-2014-0047.

Examples

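library(prnsamplr)

# Simple random sampling: the units with the smallest PRNs in each stratum
dfSRS <- srs(df=ExampleData,
             nsamp="nsample",
             stratid="stratum",
             prn="rands")

# Pareto pps sampling: additionally requires a size measure on the frame
dfPPS <- pps(df=ExampleData,
             nsamp="nsample",
             stratid="stratum",
             prn="rands",
             size="sizeM")

# Transform the PRNs: start counting at 0.2, in direction "U"
dfPRN <- transformprn(df=ExampleData,
                      prn="rands",
                      direction="U",
                      start=0.2)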