sample_strata {optimall} | R Documentation |
Select Sampling Units based on Stratified Random Sampling
Description
Requires two dataframes or matrices: data, with a column strata that
specifies stratum membership for each unit in the population, and
design_data, with one row per stratum and two columns: design_strata,
which indicates the unique levels of strata in data, and n_allocated,
which specifies the number of units to be sampled from each stratum.
sample_strata selects the units to sample by drawing a random sample of
the desired size within each stratum. The second dataframe can be the
output of allocate_wave() or optimum_allocation().
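The selection step described above can be illustrated with a short base-R sketch. This is not the optimall implementation; the toy data frames pop and design and their column names are hypothetical, and the seed is set only to make the sketch reproducible.

```r
# Base-R sketch of stratified random sampling (illustrative only;
# not the code that sample_strata() runs internally).
set.seed(1)

# Hypothetical population: 12 units in three strata
pop <- data.frame(
  id = 1:12,
  strata = rep(c("a", "b", "c"), each = 4)
)

# Hypothetical design: one row per stratum with the number to sample
design <- data.frame(
  strata = c("a", "b", "c"),
  n_to_sample = c(2, 1, 3)
)

pop$sample_indicator <- 0
for (i in seq_len(nrow(design))) {
  # ids of the units that belong to this stratum
  ids <- pop$id[pop$strata == design$strata[i]]
  # draw the requested number of units without replacement
  chosen <- ids[sample.int(length(ids), design$n_to_sample[i])]
  pop$sample_indicator[pop$id %in% chosen] <- 1
}

# number of selected units per stratum
sampled_per_stratum <- tapply(pop$sample_indicator, pop$strata, sum)
```

Indexing with sample.int() rather than calling sample() on the ids directly avoids the surprise that sample(x, ...) treats a length-one numeric x as 1:x.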
Usage
sample_strata(
  data,
  strata,
  id,
  already_sampled = NULL,
  design_data,
  design_strata = "strata",
  n_allocated = "n_to_sample",
  probs = NULL,
  wave = NULL,
  warn_prob_overwrite = TRUE
)
Arguments
data |
A data frame or matrix with one row for each sampling unit in the population, one column specifying each unit's stratum, and one column with a unique identifier for each unit. |
strata |
a character string specifying the name of the column
in |
id |
a character string specifying the name of the column
in |
already_sampled |
a character string specifying the name of the
column in |
design_data |
a dataframe or matrix with one row for each stratum that subdivides the population, one column specifying the stratum name, and one column indicating the number of samples allocated to each stratum. |
design_strata |
a character string specifying the name of the
column in |
n_allocated |
a character string specifying the name of the
column in |
probs |
a character string specifying the name of the column
in |
wave |
A numeric value or character string indicating the
sampling wave. If specified, the input is appended to
"sample_indicator" in the name of the new sample indicator column
(as long as such a column name does not already exist in |
warn_prob_overwrite |
Logical indicator for whether a warning should
be printed if |
Value
Returns data as a dataframe with a new column named
"sample_indicator" containing a binary (1/0) indicator of
whether each unit should be sampled. If the wave argument is
specified, the given input is appended to the name "sample_indicator".
If the probs argument is specified, the dataframe also contains
a new column named "sampling_prob" holding the sampling probabilities for
each sampled element.
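To make the "sampling_prob" column more concrete, the following base-R sketch shows one way a one-sided formula such as ~n_to_sample/npop could be evaluated against a design dataframe and matched back to individual units. How sample_strata handles probs internally is not stated here, so the evaluation via eval() and the lookup via match() are assumptions made for illustration.

```r
# Illustrative only: evaluating a probability formula per stratum.
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  npop = c(50, 50, 50),
  n_to_sample = c(5, 5, 5)
)

# A one-sided formula has its right-hand side in position 2
probs_formula <- ~ n_to_sample / npop

# Evaluate the right-hand side using the columns of design
design$sampling_prob <- eval(probs_formula[[2]], envir = design)

# Look up the stratum-level probability for every unit in iris
unit_prob <- design$sampling_prob[
  match(as.character(iris$Species), design$strata)
]
```

Here every stratum has 50 units and 5 allocated samples, so each unit's probability is the ratio of those two columns for its stratum.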
Examples
library(optimall)

# Define a design dataframe
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  npop = c(50, 50, 50),
  n_to_sample = c(5, 5, 5)
)

# Make sure there is an id column
iris$id <- seq_len(nrow(iris))

# Run
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)

# To include probs as a formula
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample", probs = ~n_to_sample/npop
)

# If some units had already been sampled
iris$already_sampled <- rbinom(nrow(iris), 1, 0.25)
sample_strata(
  data = iris, strata = "Species", id = "id",
  already_sampled = "already_sampled",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)
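The last example involves units flagged as already sampled. A minimal base-R sketch of that situation follows, under the assumption that flagged units are removed from the candidate pool before the draw; this page does not confirm that behavior, and the data frame pop and the vector n_to_sample are hypothetical.

```r
# Illustrative only: assumes already-sampled units are excluded
# from the pool before the new sample is drawn.
set.seed(2)

pop <- data.frame(
  id = 1:10,
  strata = rep(c("a", "b"), each = 5),
  already_sampled = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
)

# Hypothetical allocation for this wave, named by stratum
n_to_sample <- c(a = 2, b = 3)

pop$sample_indicator <- 0
for (s in names(n_to_sample)) {
  # only units in this stratum that have not been sampled before
  eligible <- pop$id[pop$strata == s & pop$already_sampled == 0]
  chosen <- eligible[sample.int(length(eligible), n_to_sample[[s]])]
  pop$sample_indicator[pop$id %in% chosen] <- 1
}
```

Under this assumption, the allocation for a stratum cannot exceed the number of units in it that are still unsampled; sample.int() raises an error otherwise.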