
gen_informative_sample {growfunctions}

R Documentation

Generate a finite population and take an informative single or two-stage sample.

Description

Used to compare performance of sample design-weighted and unweighted estimation procedures.
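
As a rough illustration of why such a comparison matters, the following base R sketch builds a small population, samples it with inclusion probabilities tied to each unit's variance, and compares an unweighted mean with an inverse-probability-weighted mean. It does not call `gen_informative_sample()`. The population model, the Poisson sampling step, the target quantity (the mean of the unit variances) and all variable names are assumptions made for the illustration, not the package's internals.

```r
set.seed(1)

# Hypothetical finite population: N units, each observed at T_pts time points.
N <- 2000
T_pts <- 10
unit_sd <- rgamma(N, shape = 2, rate = 2)            # unit-specific scale
y <- matrix(rnorm(N * T_pts), N, T_pts) * unit_sd    # row i has sd unit_sd[i]

# Size measure: the variance of each unit's series.
v <- apply(y, 1, var)

# Four equal-sized strata from variance quartiles (stratum 4 = largest variance).
stratum <- cut(v, quantile(v, 0:4 / 4), labels = FALSE, include.lowest = TRUE)

# Assumed "medium"-style gradient: inclusion probability proportional to
# stratum^2, scaled so that the expected sample size is n.
n <- 200
pi_i <- n * stratum^2 / sum(stratum^2)

# Poisson sampling: unit i enters the sample with probability pi_i.
sampled <- runif(N) < pi_i
w <- 1 / pi_i[sampled]                               # design weights

pop_mean   <- mean(v)
unweighted <- mean(v[sampled])
weighted   <- sum(w * v[sampled]) / sum(w)           # weighted (Hajek) mean
c(population = pop_mean, unweighted = unweighted, weighted = weighted)
```

Because high-variance units are over-sampled, the unweighted mean overstates the population value, while the weighted mean corrects for the unequal inclusion probabilities.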

Usage

gen_informative_sample(
  clustering = TRUE,
  two_stage = FALSE,
  theta = c(0.2, 0.7, 1),
  M = 3,
  theta_star = matrix(c(0.3, 0.3, 0.3, 0.31, 0.72, 2.04, 0.58, 0.83, 1), 3, 3, byrow =
    TRUE),
  gp_type = "rq",
  N = 10000,
  T = 15,
  L = 10,
  R = 8,
  I = 4,
  n = 750,
  noise_to_signal = 0.05,
  incl_gradient = "medium"
)

Arguments

`clustering`	Boolean input indicating whether the population is generated from clusters of covariance parameters. Defaults to `clustering = TRUE`.
`two_stage`	Boolean input indicating whether to use two-stage sampling, in which the first stage defines a set of `L` blocks, with block membership determined by quantiles of the observation unit variance functions. (The blocks are structured like strata, though they are sub-sampled.) Defaults to `two_stage = FALSE`.
`theta`	A numeric vector of global covariance parameters in the case of `clustering = FALSE`. The length, `P`, of `theta` must be consistent with the selected `gp_type`. Defaults to `theta = c(0.2,0.7,1.0)` in the case of `clustering = FALSE`.
`M`	Scalar input denoting number of clusters to employ if `clustering = TRUE`. Defaults to `M = 3`
`theta_star`	A P x M matrix of cluster location values associated with the choice of `M` and the selected `gp_type`. Defaults to `matrix(c(0.3,0.3,0.3,0.31,0.72,2.04,0.58,0.83,1.00),3,3,byrow=TRUE)`.
`gp_type`	Input of choice for covariance matrix formulation to be used to generate the functions for the `N` population units. Choices are `c("se","rq")`, where `"se"` denotes the squared exponential covariance function and `"rq"` denotes the rational quadratic. Defaults to `gp_type = "rq"`.
`N`	A scalar input denoting the number of population units (or establishments). Defaults to `N = 10000`.
`T`	A scalar input denoting the number of time points in each of the `N` T x 1 functions that contribute to the N x T population data matrix, `y`. Defaults to `T = 15`.
`L`	A scalar input that denotes the number of blocks in which to assign the population units to be sub-sampled in the first stage of sampling. Defaults to `L = 10`.
`R`	A scalar input that denotes the number of blocks to sample from the `L` blocks, with probability proportional to the average variance of member functions in each block. Defaults to `R = 8`.
`I`	A scalar input denoting the number of strata to form within each block. Population units are divided into equally-sized strata based on variance quantiles. Defaults to `I = 4`.
`n`	Sample size to be generated. An informative sample is taken under either a single-stage (`two_stage = FALSE`) or two-stage (`two_stage = TRUE`) design, along with a non-informative, iid sample of the same size (`n`) from the finite population (generated with (`clustering = TRUE`) or without clustering). Defaults to `n = 750`.
`noise_to_signal`	A numeric input in the interval, `(0,1)`, denoting the ratio of noise variance to the average variance of the generated functions, `bb_i`. Defaults to `noise_to_signal = 0.05`
`incl_gradient`	A character input that sets how steeply the stratum inclusion probabilities increase from the lowest to the highest stratum. If set to `"high"`, the inclusion probabilities are proportional to the exponential of the cluster number. If set to `"medium"`, the inclusion probabilities are proportional to the square of the cluster number. Note that population units are assigned to strata according to progressively increasing variance quantiles. The `incl_gradient` setting is used for both `two_stage = TRUE`, in which case it is applied to strata within block, and `two_stage = FALSE`, in which case a simple stratified random sample is conducted. Defaults to `incl_gradient = "medium"`.
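
The two `incl_gradient` settings can be compared with a short base R sketch. It reads the "cluster number" in the description as the stratum index `1, ..., I`, which is an assumption, and it normalises the relative probabilities to sum to one purely for comparison. The package may scale them differently.

```r
I <- 4                 # number of strata, matching the default I = 4
s <- seq_len(I)        # stratum index, lowest to highest variance

# Relative inclusion probabilities by stratum under each setting.
rel <- list(medium = s^2, high = exp(s))

# Normalise each gradient to sum to one (an assumption for illustration).
grad <- sapply(rel, function(r) r / sum(r))
rownames(grad) <- paste0("stratum_", s)
round(grad, 3)

# Ratio of the highest-stratum to the lowest-stratum inclusion probability.
ratio <- apply(grad, 2, function(p) p[I] / p[1])
ratio
```

With four strata, the `"high"` setting gives the top stratum about `exp(3)` times the inclusion probability of the bottom stratum, against a factor of 16 under `"medium"`.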

Value

A list object named `dat_sim` containing objects related to the generated finite population, the informative sample and the non-informative, iid sample. Some important objects include:

`H`	A vector of length `N`, the population size, with cluster assignments for each establishment (unit) in `1, ..., M` clusters.
`map.tot`	A `data.frame` object including unit label identifiers (under `establishment`), the cluster assignment (if `clustering = TRUE`), the block (if `two_stage = TRUE`) and stratum assignments and the sample inclusion probabilities.
`map.obs`	A `data.frame` object configured the same as `map.tot`, only confined to those establishments/units selected into the informative sample of size `n`.
`map.iid`	A `data.frame` object configured the same as `map.tot`, only confined to those establishments/units selected into the non-informative, iid sample of size `n`.
`(y, bb)`	N x T `matrix` objects containing data responses and de-noised functions, respectively, for each of the `N` population units. The order of the `N` units is consistent with `map.tot`.
`(y_obs, bb_obs)`	n x T `matrix` objects containing observed responses and de-noised functions, respectively, for each of the `n` units sampled under an informative sampling design. The order of the `n` units is consistent with `map.obs`.
`(y_iid, bb_iid)`	n x T `matrix` objects containing observed responses and de-noised functions, respectively, for each of the `n` units sampled under a non-informative / iid sampling design. The order of the `n` units is consistent with `map.iid`.
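
The relationship between the responses and the de-noised functions can be sketched in base R. The code below draws functions from a Gaussian process with a squared exponential covariance and adds noise whose variance is `noise_to_signal` times the average variance of the functions, following the description of that argument. The covariance parameterisation (`scale`, `len`), the jitter term and the helper `se_cov()` are hypothetical choices for the sketch and need not match how the package generates `bb` and `y`.

```r
set.seed(2)
N <- 50
T_pts <- 15
t_grid <- seq_len(T_pts)

# Squared exponential covariance under an assumed (scale, length-scale) form.
se_cov <- function(t, scale = 1, len = 3) {
  scale * exp(-outer(t, t, "-")^2 / (2 * len^2))
}
K <- se_cov(t_grid) + diag(1e-6, T_pts)   # small jitter for numerical stability

# De-noised functions: each row is one draw from N(0, K).
bb <- matrix(rnorm(N * T_pts), N, T_pts) %*% chol(K)

# Noise variance is noise_to_signal times the average variance of the functions.
noise_to_signal <- 0.05
sigma2 <- noise_to_signal * mean(apply(bb, 1, var))
y <- bb + matrix(rnorm(N * T_pts, sd = sqrt(sigma2)), N, T_pts)

dim(y)   # same N x T shape as bb
```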

Author(s)

Terrance Savitsky tds151@gmail.com

Examples

## Not run: 
library(growfunctions)
## Use gen_informative_sample() to generate an 
## N x T population drawn from a dependent GP.
## By default, 3 clusters are used to generate 
## the population.
## A single stage stratified random sample of size n 
## is drawn from the population using I = 4 strata. 
## The resulting sample is informative in that the 
## distribution for this sample is
## different from the population from which 
## it was drawn because the strata inclusion
## probabilities are proportional to a feature 
## of the response, y (in this case, the variance.
## The stratified random sample over-samples 
## large variance strata).
## (The user may also select a 2-stage 
## sample with the first stage
## sampling "blocks" of the population and 
## the second stage sampling strata within blocks). 
dat_sim        <- gen_informative_sample(N = 10000, 
                                n = 500, T = 10,
                                noise_to_signal = 0.1)

## extract n x T observed sample under informative
## stratified sampling design.
y_obs                       <- dat_sim$y_obs
T                           <- ncol(y_obs)

## End(Not run)

[Package growfunctions version 0.16 Index]