DesignSurvey {capm}    R Documentation

Survey design


A wrapper for the svydesign function from the survey package that defines one of the following survey designs: two-stage cluster, simple (systematic), or stratified. For two-stage cluster designs, weights are calculated assuming sampling with probability proportional to size and with replacement in the first stage, and simple random sampling in the second stage. The finite population correction is specified as the population size at each level of sampling.
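The weighting scheme described above can be illustrated with a minimal base R sketch. It uses the textbook weights for a design with probability proportional to size in the first stage and simple random sampling in the second stage; it is not necessarily the exact computation DesignSurvey performs. The data frames psu_frame and smp and all their columns are hypothetical.

```r
# Hypothetical PSU frame: identifiers and sizes (number of SSUs in each PSU).
psu_frame <- data.frame(psu = c("a", "b", "c", "d"),
                        size = c(100, 200, 300, 400))

# Hypothetical sample: two selected PSUs with five SSUs observed in each.
smp <- data.frame(psu = rep(c("b", "d"), each = 5),
                  ssu = 1:10)

n_psu <- length(unique(smp$psu))                   # number of sampled PSUs
total_size <- sum(psu_frame$size)                  # SSUs in the population
psu_size <- psu_frame$size[match(smp$psu, psu_frame$psu)]
ssu_sampled <- ave(seq_len(nrow(smp)), smp$psu, FUN = length)

# First stage: expected number of selections under PPS with replacement.
p1 <- n_psu * psu_size / total_size
# Second stage: simple random sample of SSUs within the selected PSU.
p2 <- ssu_sampled / psu_size

# The weight is the inverse of the product of both stage probabilities.
smp$weight <- 1 / (p1 * p2)
```

With the same number of SSUs sampled in every PSU, all observations receive the same weight, and the weights add up to the total number of SSUs in the population.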


DesignSurvey(sample = NULL, psu.ssu = NULL, psu.col = NULL,
  ssu.col = NULL, cal.col = NULL, N = NULL, strata = NULL,
  cal.N = NULL, ...)



sample
data.frame with sample observations. For two-stage cluster designs, one column must contain unique identifiers for Primary Sampling Units (PSU) and another column must contain unique identifiers for Secondary Sampling Units (SSU).


psu.ssu
data.frame with all PSU. The first column contains unique PSU identifiers. The second column contains numeric PSU sizes. It is used only for two-stage cluster designs.


psu.col
the column of sample containing the PSU identifiers. It is used only for two-stage cluster designs.


ssu.col
the column of sample containing the SSU identifiers. It is used only for two-stage cluster designs.


cal.col
the column of sample with the variable used to calibrate estimates. It must be used together with cal.N.


N
for simple designs, a numeric value representing the total number of sampling units in the population. For stratified designs, a column of sample indicating, for each observation, the total number of sampling units in its stratum. N is ignored in two-stage cluster designs.


strata
for stratified designs, a column of sample indicating the stratum membership of each observation.


cal.N
population total for the variable used to calibrate estimates. It must be used together with cal.col.


...
further arguments passed to the svydesign function.
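To illustrate the role of N and strata, the following base R sketch computes the sampling weights implied by a simple design and by a stratified design when the population size is known. It is an illustration under assumed data, not the internal code of DesignSurvey; the objects pop_size, n_obs and strat_smp are hypothetical.

```r
# Simple design: n_obs observations drawn from pop_size sampling units.
pop_size <- 1000
n_obs <- 50
w_simple <- rep(pop_size / n_obs, n_obs)   # each observation represents 20 units

# Stratified design: every row carries its stratum and the stratum size.
strat_smp <- data.frame(strat = c(rep("urban", 8), rep("rural", 2)),
                        strat_size = c(rep(950, 8), rep(50, 2)))

# Number of observations sampled in each stratum, repeated per row.
n_h <- ave(seq_len(nrow(strat_smp)), strat_smp$strat, FUN = length)

# Weight = stratum population size / stratum sample size.
strat_smp$weight <- strat_smp$strat_size / n_h
```

In both cases the weights add up to the population size, which is the same quantity svydesign uses to derive the finite population correction.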


For two-stage cluster designs, a PSU that appears in both psu.ssu and sample must have the same identifier in both. Each SSU must have a unique identifier, which can appear more than once if there is more than one observation per SSU. The sample argument must contain only the variables to be estimated plus the variables required to define the design (two-stage cluster or stratified). cal.col and cal.N are needed only if estimates will be calibrated. Calibration is based on a population total.
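Calibration to a single population total can be sketched as a ratio adjustment of the sampling weights. This is one simple way to calibrate and is shown only to illustrate the idea; it is not necessarily the method DesignSurvey applies. The data frame cal_smp, its columns and the value of known_total are hypothetical.

```r
# Hypothetical weighted sample with a calibration variable (persons)
# and a variable to be estimated (dogs).
cal_smp <- data.frame(persons = c(2, 3, 1, 4),
                      dogs = c(1, 0, 2, 1),
                      weight = rep(100, 4))

known_total <- 1200   # assumed known population total of persons

# Total of the calibration variable estimated with the original weights.
est_total <- sum(cal_smp$weight * cal_smp$persons)

# Scale the weights so the estimated total matches the known total.
adj <- known_total / est_total
cal_smp$cal_weight <- cal_smp$weight * adj

# Calibrated estimate of the total number of dogs.
dogs_total <- sum(cal_smp$cal_weight * cal_smp$dogs)
```

After the adjustment, the weighted total of the calibration variable reproduces the known population total, and every other estimate is scaled by the same factor.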


An object of class


Lumley, T. (2011). Complex surveys: A guide to analysis using R (Vol. 565). Wiley.

Baquero, O. S., Marconcin, S., Rocha, A., & Garcia, R. D. C. M. (2018). Companion animal demography and population management in Pinhais, Brazil. Preventive Veterinary Medicine.



## Calibrated two-stage cluster design
design <- DesignSurvey(na.omit(cluster_sample),
                       psu.ssu = psu_ssu,
                       psu.col = "census_tract_id",
                       ssu.col = "interview_id",
                       cal.col = "number_of_persons",
                       cal.N = 129445)

## Simple design
# If data in cluster_sample were from a simple design:
# (cal.col is supplied because cal.N must be used together with it)
design <- DesignSurvey(na.omit(cluster_sample),
                       N = sum(psu_ssu$hh),
                       cal.col = "number_of_persons",
                       cal.N = 129445)

## Stratified design
# Simulate strata and assume that the data in cluster_sample came
# from a stratified design
# (size is set explicitly so that one stratum is drawn per observation)
cluster_sample$strat <- sample(c("urban", "rural"),
                               size = nrow(cluster_sample),
                               prob = c(.95, .05),
                               replace = TRUE)
cluster_sample$strat_size <- round(sum(psu_ssu$hh) * .95)
cluster_sample$strat_size[cluster_sample$strat == "rural"] <-
  round(sum(psu_ssu$hh) * .05)
# cal.col is supplied because cal.N must be used together with it
design <- DesignSurvey(cluster_sample,
                       N = "strat_size",
                       strata = "strat",
                       cal.col = "number_of_persons",
                       cal.N = 129445)

[Package capm version 0.14.0 Index]