cat_analysis {spsurvey} | R Documentation |
Categorical variable analysis
Description
This function organizes input and output for the analysis of categorical variables. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
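The handling of sf input described above can be pictured with a few calls from the sf package. This is only a sketch of equivalent preprocessing, not the code spsurvey runs internally; the data frame sites and its columns x and y are hypothetical, and the block is skipped when sf is not installed.

```r
# Sketch only: mimics the described handling of an sf input.
# The object names (sites, sites_sf, flat) are hypothetical.
if (requireNamespace("sf", quietly = TRUE)) {
  sites <- data.frame(
    siteID = c("Site1", "Site2", "Site3"),
    x = c(0.2, 0.8, 0.5),
    y = c(0.5, 0.1, 0.9)
  )
  # Build a point sf object; columns x and y become the geometry column.
  sites_sf <- sf::st_as_sf(sites, coords = c("x", "y"))

  # Extract the coordinates from the geometry column.
  xy <- sf::st_coordinates(sites_sf)

  # Drop the geometry and keep the coordinates as "xcoord" and "ycoord".
  flat <- sf::st_drop_geometry(sites_sf)
  flat$xcoord <- xy[, "X"]
  flat$ycoord <- xy[, "Y"]
  print(flat)
}
```

After these steps flat is an ordinary data frame whose coordinate columns carry the names that xcoord and ycoord are said to receive.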
Usage
cat_analysis(
  dframe,
  vars,
  subpops = NULL,
  siteID = NULL,
  weight = "weight",
  xcoord = NULL,
  ycoord = NULL,
  stratumID = NULL,
  clusterID = NULL,
  weight1 = NULL,
  xcoord1 = NULL,
  ycoord1 = NULL,
  sizeweight = FALSE,
  sweight = NULL,
  sweight1 = NULL,
  fpc = NULL,
  popsize = NULL,
  vartype = "Local",
  jointprob = "overton",
  conf = 95,
  All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or sf object.
vars |
Vector composed of character values that identify the
names of response variables in dframe.
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in dframe.
siteID |
Character value providing the name of the site ID variable in
the dframe data frame.
weight |
Character value providing the name of the design weight
variable in dframe.
xcoord |
Character value providing the name of the x-coordinate variable in
the dframe data frame.
ycoord |
Character value providing the name of the y-coordinate variable in
the dframe data frame.
stratumID |
Character value providing the name of the stratum ID variable in
the dframe data frame.
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in dframe.
weight1 |
Character value providing the name of the stage one weight
variable in dframe.
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in dframe.
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in dframe.
sizeweight |
Logical value that indicates whether size weights should be
used during estimation. The default value is FALSE.
sweight |
Character value providing the name of the size weight variable
in dframe.
sweight1 |
Character value providing the name of the stage one size weight
variable in dframe.
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage.
For an unstratified design, the object is a vector. For a single-stage design, the vector is composed of a single numeric value. For a two-stage unstratified design, the object is a named vector with one more element than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs.
For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs.
Note that the finite population correction factor is not used with the local mean variance estimator.
Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
popsize |
Object that provides values for the population argument of the
calibrate or postStratify functions in the survey package.
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
vartype |
Character value providing the choice of the variance
estimator. The default value is "Local".
jointprob |
Character value providing the choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators. The default value is "overton".
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is 95.
All_Sites |
A logical variable used when subpops is not NULL. The default value is FALSE.
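The four fpc layouts described under the fpc argument can be sketched in base R as follows. Every number, stratum name, and cluster name here is hypothetical and chosen only to show the shape of each object; the single value in the single-stage case is assumed to be a population size.

```r
# Hypothetical values throughout; only the structure follows the fpc description.

# Single-stage, unstratified: a vector holding a single numeric value
# (assumed here to be the population size).
fpc_single <- 15000

# Single-stage, stratified: a named list whose names match the stratum IDs.
fpc_single_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: a named vector. The first item is the number of
# clusters in the population (its name is arbitrary); each later item gives
# the number of stage two units for a sampled cluster, named by cluster ID.
fpc_two <- c(Ncluster = 150, Cluster1 = 150, Cluster2 = 75, Cluster3 = 125)

# Two-stage, stratified: a named list with one such vector per stratum.
fpc_two_strat <- list(
  Stratum1 = c(Ncluster = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster = 50, Cluster3 = 75, Cluster4 = 150)
)

# Each two-stage vector has one more element than the number of sampled clusters.
n_sampled_clusters <- 3
length(fpc_two) == n_sampled_clusters + 1
```

Because the correction is not used with the local mean variance estimator, an object like these only affects results when vartype is set to something other than "Local".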
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and total of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Category
category of response variable
- nResp
sample size
- Estimate.P
proportion estimate (in %)
- StdError.P
standard error of proportion estimate
- MarginofError.P
margin of error of proportion estimate
- LCBxxPct.P
xx% (default 95%) lower confidence bound of proportion estimate
- UCBxxPct.P
xx% (default 95%) upper confidence bound of proportion estimate
- Estimate.U
total estimate
- StdError.U
standard error of total estimate
- MarginofError.U
margin of error of total estimate
- LCBxxPct.U
xx% (default 95%) lower confidence bound of total estimate
- UCBxxPct.U
xx% (default 95%) upper confidence bound of total estimate
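The xx in the confidence bound column names stands for the value of conf, and the bounds are described as Gaussian-based. A minimal base R sketch of both ideas follows; the estimate and standard error are hypothetical, and treating the margin of error as the Gaussian multiplier times the standard error is an assumption about how the columns relate, not a statement of the package's internals.

```r
conf <- 95  # default confidence level shown in Usage

# Column names for the proportion bounds: "xx" is replaced by conf.
ci_names <- paste0(c("LCB", "UCB"), conf, "Pct.P")
print(ci_names)

# Two-sided Gaussian multiplier for the chosen confidence level.
mult <- qnorm(0.5 + (conf / 100) / 2)

# Hypothetical proportion estimate and standard error, both in percent.
est <- 25
se <- 4

# ASSUMPTION: margin of error = multiplier * standard error,
# and the bounds are estimate -/+ margin of error.
moe <- mult * se
lcb <- est - moe
ucb <- est + moe
```

With conf = 90 the same paste0 call would produce names beginning LCB90Pct and UCB90Pct, so code that selects these columns by name should build the names from conf rather than hard-coding 95.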
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cont_analysis for continuous variable analysis
Examples
# Simulated analysis data: 100 sites in two strata
dframe <- data.frame(
  siteID = paste0("Site", 1:100),
  wgt = runif(100, 10, 100),
  xcoord = runif(100),
  ycoord = runif(100),
  stratum = rep(c("Stratum1", "Stratum2"), 50),
  CatVar = rep(c("north", "south", "east", "west"), 25),
  All_Sites = rep("All Sites", 100),
  Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
# Response variable and subpopulation (domain) variables
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
# Population totals for each level of Resource_Class
mypopsize <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)
cat_analysis(dframe,
  vars = myvars, subpops = mysubpops, siteID = "siteID",
  weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
  stratumID = "stratum", popsize = mypopsize
)
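
The call above supplies popsize as a data frame. The popsize argument also mentions a table, an xtabs object, and a form used for calibration. The base R sketch below shows how such objects could be built for the same Resource_Class totals. The data frame frame_df is a hypothetical stand-in for a sampling frame, and the named-list layout shown for calibration is an assumption that should be checked against the package documentation before use.

```r
# Hypothetical sampling frame: one row per unit in the target population.
frame_df <- data.frame(
  Resource_Class = rep(c("Good", "Poor"), c(4000, 1500))
)

# Post-stratification, data frame form (same layout as mypopsize above).
popsize_df <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)

# Post-stratification, table form: counts per level, with a named dimension.
popsize_table <- with(frame_df, table(Resource_Class))

# Post-stratification, xtabs form: the same counts built from a formula.
popsize_xtabs <- xtabs(~ Resource_Class, data = frame_df)

# Calibration form. ASSUMED layout: a named list with one named vector of
# population totals per factor variable.
popsize_calib <- list(Resource_Class = c(Good = 4000, Poor = 1500))

print(popsize_table)
```

All four objects carry the same totals, so the choice among them is mostly a matter of whether the totals are already tabulated or must be counted from a frame.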