change_analysis {spsurvey}

R Documentation

Change analysis

Description

This function organizes input and output for the estimation of change between two samples (for categorical and continuous variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.

Usage

change_analysis(
  dframe,
  vars_cat = NULL,
  vars_cont = NULL,
  test = "mean",
  subpops = NULL,
  surveyID = "surveyID",
  survey_names = NULL,
  siteID = "siteID",
  weight = "weight",
  revisitwgt = FALSE,
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)

Arguments

`dframe`	Data to be analyzed (analysis data). A data frame or `sf` object containing survey design variables, response variables, and subpopulation (domain) variables.
`vars_cat`	Vector composed of character values that identify the names of categorical response variables in `dframe`. The default is `NULL`.
`vars_cont`	Vector composed of character values that identify the names of continuous response variables in `dframe`. The default is `NULL`.
`test`	Character string or character vector providing the location measure(s) to use for change estimation for continuous variables. The choices are `"mean"`, `"total"`, `"median"`, or some combination of the three options (e.g., `c("mean", "total")`). The default is `"mean"`.
`subpops`	Vector composed of character values that identify the names of subpopulation (domain) variables in `dframe`. If a value is not provided, the value `"All_Sites"` is assigned to the subpops argument and a factor variable named `"All_Sites"` that takes the value `"All Sites"` is added to `dframe`. The default value is `NULL`.
`surveyID`	Character value providing name of the survey ID variable in `dframe`. The default value is `"surveyID"`.
`survey_names`	Character vector of length two that provides the survey names contained in the `surveyID` variable in the `dframe` data frame. The two values in the vector identify the first survey and second survey, respectively. If a value is not provided, unique values of the `surveyID` variable are assigned to the `survey_names` argument. The default is `NULL`.
`siteID`	Character value providing name of the site ID variable in `dframe`. For a two-stage sample, the site ID variable identifies stage two site IDs. The default value is `"siteID"`. If a unique site is visited in both surveys, the corresponding `siteID` should be the same for both entries.
`weight`	Character value providing name of the design weight variable in `dframe`. For a two-stage sample, the weight variable identifies stage two weights. The default value is `"weight"`.
`revisitwgt`	Logical value that indicates whether each repeat visit site has the same design weight in the two surveys, where `TRUE` = the weight for each repeat visit site is the same and `FALSE` = the weight for each repeat visit site is not the same. When this argument is `FALSE`, all of the repeat visit sites are assigned equal weights when calculating the covariance component of the change estimate standard error. The default is `FALSE`.
`xcoord`	Character value providing name of the x-coordinate variable in `dframe`. For a two-stage sample, the x-coordinate variable identifies stage two x-coordinates. Note that x-coordinates are required for calculation of the local mean variance estimator. If `dframe` is an `sf` object, this argument is not required (as the geometry column in `dframe` is used to find the x-coordinate). The default value is `NULL`.
`ycoord`	Character value providing name of the y-coordinate variable in `dframe`. For a two-stage sample, the y-coordinate variable identifies stage two y-coordinates. Note that y-coordinates are required for calculation of the local mean variance estimator. If `dframe` is an `sf` object, this argument is not required (as the geometry column in `dframe` is used to find the y-coordinate). The default value is `NULL`.
`stratumID`	Character value providing name of the stratum ID variable in `dframe`. The default value is `NULL`.
`clusterID`	Character value providing the name of the cluster (stage one) ID variable in `dframe`. Note that cluster IDs are required for a two-stage sample. The default value is `NULL`.
`weight1`	Character value providing name of the stage one weight variable in `dframe`. The default value is `NULL`.
`xcoord1`	Character value providing the name of the stage one x-coordinate variable in `dframe`. Note that x coordinates are required for calculation of the local mean variance estimator. The default value is `NULL`.
`ycoord1`	Character value providing the name of the stage one y-coordinate variable in `dframe`. Note that y-coordinates are required for calculation of the local mean variance estimator. The default value is `NULL`.
`sizeweight`	Logical value that indicates whether size weights should be used during estimation, where `TRUE` uses size weights and `FALSE` does not use size weights. To employ size weights for a single-stage sample, a value must be supplied for argument `weight`. To employ size weights for a two-stage sample, values must be supplied for arguments `weight` and `weight1`. The default value is `FALSE`.
`sweight`	Character value providing the name of the size weight variable in `dframe`. For a two-stage sample, the size weight variable identifies stage two size weights. The default value is `NULL`.
`sweight1`	Character value providing name of the stage one size weight variable in `dframe`. The default value is `NULL`.
`fpc`	Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more item than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator.
Example fpc for a single-stage unstratified survey design: `fpc <- 15000`
Example fpc for a single-stage stratified survey design: `fpc <- list(Stratum_1 = 9000, Stratum_2 = 6000)`
Example fpc for a two-stage unstratified survey design: `fpc <- c(Ncluster = 150, Cluster_1 = 150, Cluster_2 = 75, Cluster_3 = 75, Cluster_4 = 125, Cluster_5 = 75)`
Example fpc for a two-stage stratified survey design: `fpc <- list(Stratum_1 = c(Ncluster_1 = 100, Cluster_1 = 125, Cluster_2 = 100, Cluster_3 = 100, Cluster_4 = 125, Cluster_5 = 50), Stratum_2 = c(Ncluster_2 = 50, Cluster_1 = 75, Cluster_2 = 150, Cluster_3 = 75, Cluster_4 = 75, Cluster_5 = 125))`
`popsize`	Object that provides values for the population argument of the `calibrate` or `postStratify` functions in the survey package. If a value is provided for popsize, then either the `calibrate` or `postStratify` function is used to modify the survey design object that is required by functions in the survey package. Whether to use the `calibrate` or `postStratify` function is dictated by the format of popsize, which is discussed below. Post-stratification adjusts the sampling and replicate weights so that the joint distribution of a set of post-stratifying variables matches the known population joint distribution. Calibration, generalized raking, or GREG estimators generalize post-stratification and raking by calibrating a sample to the marginal totals of variables in a linear regression model. For the `calibrate` function, the object is a named list, where the names identify factor variables in `dframe`. Each element of the list is a named vector containing the population total for each level of the associated factor variable. For the `postStratify` function, the object is either a data frame, table, or xtabs object that provides the population total for all combinations of selected factor variables in the `dframe` data frame. If a data frame is used for `popsize`, the variable containing population totals must be the last variable in the data frame. If a table is used for `popsize`, the table must have named `dimnames` where the names identify factor variables in the `dframe` data frame. If the popsize argument is equal to `NULL`, then neither calibration nor post-stratification is performed. The default value is `NULL`. 
Example popsize for calibration: `popsize <- list(Ecoregion = c(East = 750, Central = 500, West = 250), Type = c(Streams = 1150, Rivers = 350))`
Example popsize for post-stratification using a data frame: `popsize <- data.frame(Ecoregion = rep(c("East", "Central", "West"), rep(2, 3)), Type = rep(c("Streams", "Rivers"), 3), Total = c(575, 175, 400, 100, 175, 75))`
Example popsize for post-stratification using a table: `popsize <- with(MySurveyFrame, table(Ecoregion, Type))`
Example popsize for post-stratification using an xtabs object: `popsize <- xtabs(~Ecoregion + Type, data = MySurveyFrame)`
`vartype`	Character value providing the choice of the variance estimator, where `"Local"` indicates the local mean estimator and `"SRS"` indicates the simple random sampling estimator. The default value is `"Local"`.
`jointprob`	Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators, where `"overton"` indicates the Overton approximation, `"hr"` indicates the Hartley-Rao approximation, and `"brewer"` indicates the Brewer approximation. The default value is `"overton"`.
`conf`	Numeric value providing the Gaussian-based confidence level. The default value is `95`.
`All_Sites`	A logical variable used when `subpops` is not `NULL`. If `All_Sites` is `TRUE`, then output for all sites (ignoring subpopulations) is returned alongside the subpopulation output for each variable in `vars_cat` and `vars_cont`. If `All_Sites` is `FALSE`, this all-sites output is not returned. The default is `FALSE`.
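
The `fpc` and `popsize` formats described above can be assembled from tabular frame information. The following is a minimal sketch in base R, not part of the package: the data frame `cluster_frame`, its columns, and `n_clusters_pop` are hypothetical, and the population totals reuse the numbers from the `popsize` examples above.

```r
# Hypothetical frame summary for a two-stage unstratified design:
# one row per sampled cluster with its number of stage two units.
cluster_frame <- data.frame(
  clusterID = c("Cluster_1", "Cluster_2", "Cluster_3"),
  n_stage2 = c(150, 75, 125)
)
n_clusters_pop <- 150 # assumed number of clusters in the population

# Named vector: population cluster count first, then one item per sampled
# cluster, with names taken from the cluster IDs.
fpc <- c(
  Ncluster = n_clusters_pop,
  setNames(cluster_frame$n_stage2, cluster_frame$clusterID)
)

# Post-stratification totals as a data frame (totals in the last column).
popsize_df <- data.frame(
  Ecoregion = rep(c("East", "Central", "West"), rep(2, 3)),
  Type = rep(c("Streams", "Rivers"), 3),
  Total = c(575, 175, 400, 100, 175, 75)
)

# Marginal totals in the named-list format described for calibration.
margin_totals <- function(factor_name) {
  vapply(split(popsize_df$Total, popsize_df[[factor_name]]), sum, numeric(1))
}
popsize_cal <- list(
  Ecoregion = margin_totals("Ecoregion"),
  Type = margin_totals("Type")
)
```

Deriving the calibration margins from the post-stratification table keeps the two formats consistent when both are needed for the same frame.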

Value

List of change estimates composed of four items: (1) catsum contains change estimates for categorical variables, (2) contsum_mean contains estimates for continuous variables using the mean, (3) contsum_total contains estimates for continuous variables using the total, and (4) contsum_median contains estimates for continuous variables using the median. The items in the list will contain NULL for estimates that were not calculated. Each data frame includes estimates for all combinations of population Types, subpopulations within types, response variables, and categories within each response variable (for categorical variables and continuous variables using the median). Change estimates are provided plus standard error estimates and confidence interval estimates.
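
Because items for estimates that were not calculated are NULL, it can help to keep only the populated items before further processing. A minimal sketch, using a hand-built stand-in for the returned list (the object `result` and its contents are hypothetical, not actual function output):

```r
# Stand-in with the four documented item names; only catsum is populated,
# as would be expected when only vars_cat is supplied.
result <- list(
  catsum = data.frame(Indicator = "CatVar", Category = c("North", "South")),
  contsum_mean = NULL,
  contsum_total = NULL,
  contsum_median = NULL
)

# Keep the items that hold a data frame and drop the NULL ones.
calculated <- Filter(Negate(is.null), result)
names(calculated)
```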

The catsum data frame contains the following variables:

Survey_1: first survey name
Survey_2: second survey name
Type: subpopulation (domain) name
Subpopulation: subpopulation name within a domain
Indicator: response variable
Category: category of response variable
DiffEst.P: proportion difference estimate (in %; second survey - first survey)
StdError.P: standard error of proportion difference estimate
MarginofError.P: margin of error of proportion difference estimate
LCBxxPct.P: xx% (default 95%) lower confidence bound of proportion difference estimate
UCBxxPct.P: xx% (default 95%) upper confidence bound of proportion difference estimate
DiffEst.U: total difference estimate (second survey - first survey)
StdError.U: standard error of total difference estimate
MarginofError.U: margin of error of total difference estimate
LCBxxPct.U: xx% (default 95%) lower confidence bound of total difference estimate
UCBxxPct.U: xx% (default 95%) upper confidence bound of total difference estimate
nResp_1: sample size in the first survey
Estimate.P_1: proportion estimate (in %) from the first survey
StdError.P_1: standard error of proportion estimate from the first survey
MarginofError.P_1: margin of error of proportion estimate from the first survey
LCBxxPct.P_1: xx% (default 95%) lower confidence bound of proportion estimate from the first survey
UCBxxPct.P_1: xx% (default 95%) upper confidence bound of proportion estimate from the first survey
nResp_2: sample size in the second survey
Estimate.U_1: total estimate from the first survey
StdError.U_1: standard error of total estimate from the first survey
MarginofError.U_1: margin of error of total estimate from the first survey
LCBxxPct.U_1: xx% (default 95%) lower confidence bound of total estimate from the first survey
UCBxxPct.U_1: xx% (default 95%) upper confidence bound of total estimate from the first survey
Estimate.P_2: proportion estimate (in %) from the second survey
StdError.P_2: standard error of proportion estimate from the second survey
MarginofError.P_2: margin of error of proportion estimate from the second survey
LCBxxPct.P_2: xx% (default 95%) lower confidence bound of proportion estimate from the second survey
UCBxxPct.P_2: xx% (default 95%) upper confidence bound of proportion estimate from the second survey
Estimate.U_2: total estimate from the second survey
StdError.U_2: standard error of total estimate from the second survey
MarginofError.U_2: margin of error of total estimate from the second survey
LCBxxPct.U_2: xx% (default 95%) lower confidence bound of total estimate from the second survey
UCBxxPct.U_2: xx% (default 95%) upper confidence bound of total estimate from the second survey
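
The `xx` in the bound names stands for the value of `conf`. As a sketch, assuming the names are formed by placing the confidence level between the `LCB` or `UCB` prefix and `Pct` as the listing above suggests, the bound columns for a given `conf` can be constructed like this (the helper `bound_names` is hypothetical and not part of the package):

```r
# Build the lower and upper bound column names for a confidence level.
# suffix selects the estimate type and survey, e.g. ".P", ".U_1", ".P_2".
bound_names <- function(conf = 95, suffix = ".P") {
  paste0(c("LCB", "UCB"), conf, "Pct", suffix)
}

bound_names(90)          # bounds of the proportion difference at conf = 90
bound_names(95, ".U_1")  # bounds of the first-survey total at conf = 95
```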

The contsum_mean data frame contains the following variables:

Survey_1: first survey name
Survey_2: second survey name
Type: subpopulation (domain) name
Subpopulation: subpopulation name within a domain
Indicator: response variable
Statistic: value of percentile
nResp: sample size at or below Value
DiffEst: mean difference estimate
StdError: standard error of mean difference estimate
MarginofError: margin of error of mean difference estimate
LCBxxPct: xx% (default 95%) lower confidence bound of mean difference estimate
UCBxxPct: xx% (default 95%) upper confidence bound of mean difference estimate
nResp_1: sample size in the first survey
Estimate_1: mean estimate from the first survey
StdError_1: standard error of mean estimate from the first survey
MarginofError_1: margin of error of mean estimate from the first survey
LCBxxPct_1: xx% (default 95%) lower confidence bound of mean estimate from the first survey
UCBxxPct_1: xx% (default 95%) upper confidence bound of mean estimate from the first survey
nResp_2: sample size in the second survey
Estimate_2: mean estimate from the second survey
StdError_2: standard error of mean estimate from the second survey
MarginofError_2: margin of error of mean estimate from the second survey
LCBxxPct_2: xx% (default 95%) lower confidence bound of mean estimate from the second survey
UCBxxPct_2: xx% (default 95%) upper confidence bound of mean estimate from the second survey
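
The relationship between these columns can be illustrated with a small calculation. The sketch below assumes the usual Gaussian construction implied by the `conf` argument (margin of error equal to the normal quantile times the standard error, bounds equal to the estimate plus or minus the margin) and that the difference is the second survey minus the first, as stated for catsum. The numbers are made up for illustration, and this is not the package's internal code. The standard error of the difference is taken as given because it cannot be recovered from the two survey standard errors alone when repeat visit sites contribute a covariance term.

```r
conf <- 95
est_1 <- 10.2   # made-up mean estimate, first survey
est_2 <- 11.0   # made-up mean estimate, second survey
se_diff <- 0.35 # made-up standard error of the difference

# Two-sided normal quantile for the requested confidence level.
z <- qnorm(0.5 + (conf / 100) / 2)

diff_est <- est_2 - est_1
moe <- z * se_diff

c(
  DiffEst = diff_est,
  MarginofError = moe,
  LCB95Pct = diff_est - moe,
  UCB95Pct = diff_est + moe
)
```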

The contsum_total data frame contains the following variables:

Survey_1: first survey name
Survey_2: second survey name
Type: subpopulation (domain) name
Subpopulation: subpopulation name within a domain
Indicator: response variable
Statistic: value of percentile
nResp: sample size at or below Value
DiffEst: total difference estimate
StdError: standard error of total difference estimate
MarginofError: margin of error of total difference estimate
LCBxxPct: xx% (default 95%) lower confidence bound of total difference estimate
UCBxxPct: xx% (default 95%) upper confidence bound of total difference estimate
nResp_1: sample size in the first survey
Estimate_1: total estimate from the first survey
StdError_1: standard error of total estimate from the first survey
MarginofError_1: margin of error of total estimate from the first survey
LCBxxPct_1: xx% (default 95%) lower confidence bound of total estimate from the first survey
UCBxxPct_1: xx% (default 95%) upper confidence bound of total estimate from the first survey
nResp_2: sample size in the second survey
Estimate_2: total estimate from the second survey
StdError_2: standard error of total estimate from the second survey
MarginofError_2: margin of error of total estimate from the second survey
LCBxxPct_2: xx% (default 95%) lower confidence bound of total estimate from the second survey
UCBxxPct_2: xx% (default 95%) upper confidence bound of total estimate from the second survey

The contsum_median data frame contains the following variables:

Survey_1: first survey name
Survey_2: second survey name
Type: subpopulation (domain) name
Subpopulation: subpopulation name within a domain
Indicator: response variable
Category: category of response variable
DiffEst.P: proportion above or below median difference estimate (in %; second survey - first survey)
StdError.P: standard error of proportion above or below median difference estimate
MarginofError.P: margin of error of proportion above or below median difference estimate
LCBxxPct.P: xx% (default 95%) lower confidence bound of proportion above or below median difference estimate
UCBxxPct.P: xx% (default 95%) upper confidence bound of proportion above or below median difference estimate
DiffEst.U: total above or below median difference estimate (second survey - first survey)
StdError.U: standard error of total above or below median difference estimate
MarginofError.U: margin of error of total above or below median difference estimate
LCBxxPct.U: xx% (default 95%) lower confidence bound of total above or below median difference estimate
UCBxxPct.U: xx% (default 95%) upper confidence bound of total above or below median difference estimate
nResp_1: sample size in the first survey
Estimate.P_1: proportion above or below median estimate (in %) from the first survey
StdError.P_1: standard error of proportion above or below median estimate from the first survey
MarginofError.P_1: margin of error of proportion above or below median estimate from the first survey
LCBxxPct.P_1: xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey
UCBxxPct.P_1: xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey
nResp_2: sample size in the second survey
Estimate.U_1: total above or below median estimate from the first survey
StdError.U_1: standard error of total above or below median estimate from the first survey
MarginofError.U_1: margin of error of total above or below median estimate from the first survey
LCBxxPct.U_1: xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey
UCBxxPct.U_1: xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey
Estimate.P_2: proportion above or below median estimate (in %) from the second survey
StdError.P_2: standard error of proportion above or below median estimate from the second survey
MarginofError.P_2: margin of error of proportion above or below median estimate from the second survey
LCBxxPct.P_2: xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey
UCBxxPct.P_2: xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey
Estimate.U_2: total above or below median estimate from the second survey
StdError.U_2: standard error of total above or below median estimate from the second survey
MarginofError.U_2: margin of error of total above or below median estimate from the second survey
LCBxxPct.U_2: xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey
UCBxxPct.U_2: xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey

Author(s)

Tom Kincaid Kincaid.Tom@epa.gov

Examples

# Categorical variable example for three resource classes
dframe <- data.frame(
  surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
  siteID = paste0("Site", 1:200),
  wgt = runif(200, 10, 100),
  xcoord = runif(200),
  ycoord = runif(200),
  stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
  CatVar = rep(c("North", "South"), 100),
  All_Sites = rep("All Sites", 200),
  Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
  vars_cat = myvars, subpops = mysubpops,
  surveyID = "surveyID", siteID = "siteID", weight = "wgt",
  xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
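
The example above uses 200 distinct sites, so no site is visited in both surveys. When some sites are revisited, the `siteID` description requires that a revisited site carry the same ID in both surveys. The following is a minimal base R sketch of preparing such data; the site ranges, the object names (`survey1`, `survey2`, `dframe_rv`), and the simulated values are all hypothetical.

```r
set.seed(1)

# Hypothetical site lists: sites 1-60 are visited in survey 1 and sites
# 41-100 in survey 2, so sites 41-60 are repeat visits with the same siteID.
survey1 <- data.frame(surveyID = "Survey 1", siteID = paste0("Site", 1:60))
survey2 <- data.frame(surveyID = "Survey 2", siteID = paste0("Site", 41:100))
dframe_rv <- rbind(survey1, survey2)

# One fixed location per site, looked up by site name so that a repeat
# visit shares its coordinates across surveys.
site_ids <- paste0("Site", 1:100)
site_x <- runif(100)
site_y <- runif(100)
idx <- match(dframe_rv$siteID, site_ids)
dframe_rv$xcoord <- site_x[idx]
dframe_rv$ycoord <- site_y[idx]

# Simulated design weights and a continuous response.
dframe_rv$wgt <- runif(nrow(dframe_rv), 10, 100)
dframe_rv$ContVar <- rnorm(nrow(dframe_rv), 10, 1)

# Sites present in both surveys.
repeat_sites <- intersect(survey1$siteID, survey2$siteID)
length(repeat_sites)
```

A data frame built this way could then be passed to `change_analysis` with `vars_cont = "ContVar"`, `weight = "wgt"`, and the coordinate arguments, in the same manner as the call above.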

[Package spsurvey version 5.5.1]