geo_ratios {sae2} | R Documentation |
Compute rates or ratios for a set of geographic entities over a set of years
Description
The function computes rates or ratios by a geographic code and
the variable Year. If designvars is specified, the function
also returns a data frame with linear substitutes to compute Taylor
series variances.
Usage
geo_ratios(data, geocode, numerators, denominators, geonames,
new.names, designvars)
Arguments
data |
A data frame with the required variables, including a variable
named Year. |
geocode |
A character variable with the name of the geographic variable for which separate estimates are of interest. |
numerators |
A character vector listing the names in |
denominators |
A character vector listing the names in |
geonames |
An optional data frame containing |
new.names |
An optional character vector of the same length as
|
designvars |
Optional. If given, a character vector naming one or more
survey design variables in |
Details
For programming simplicity, the function enforces the requirement that names
should not be repeated in either numerators or new.names. Names
may be repeated in denominators.
Rather than a typical survey file, the function expects the data frame
data to contain weighted estimates for each analytic variable. As a
simple example, to find the variance of the weighted mean of y with
weights w, data should contain w and y * w. For convenience, the
weighted estimates can still be assigned their original names in data,
such as y. In this case, numerators="y", denominators="w"
would create the appropriate linear substitutes for the variance of the weighted mean.
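As a minimal sketch of this data layout, the base R code below builds a small hypothetical data frame (the names dat, d, y, and w are illustrative and not taken from the package), overwrites y with the weighted value y * w, and forms the ratio of totals by geographic code and Year. It only mimics the point estimate; it does not call geo_ratios or compute any variance.

```r
# Hypothetical unit-level data: two areas (d), two years, unequal weights.
dat <- data.frame(
  d    = rep(c("A", "B"), each = 4),
  Year = rep(rep(1:2, each = 2), times = 2),
  y    = c(1, 3, 2, 4, 5, 7, 6, 8),
  w    = c(1, 3, 2, 2, 1, 1, 4, 4)
)

# Store the weighted value under the original name, as described above.
dat$y <- dat$y * dat$w

# Totals of the weighted numerator and of the weights by area and year.
tot <- aggregate(cbind(y, w) ~ d + Year, data = dat, FUN = sum)

# Ratio of totals, i.e. the weighted mean of the original y in each cell.
tot$ratio <- tot$y / tot$w
tot <- tot[order(tot$d, tot$Year), ]
print(tot)
```

With sae2 loaded, the corresponding call would presumably follow the Usage section: geo_ratios(dat, geocode="d", numerators="y", denominators="w").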
This design of the function allows complex possibilities, such as estimating the variance of a rate where the numerator is based on one weight and the denominator is based on another. For example, estimation for the National Crime Victimization Survey requires this capability.
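For reference, the standard Taylor series (Woodruff) linearization of a ratio can be written as below. The symbols are introduced here for illustration only, and the exact scaling that geo_ratios applies to its linear substitutes is not stated on this page, so this should be read as the textbook form rather than a description of the implementation.

```latex
% Ratio of two weighted totals; the weights may differ between numerator and denominator.
\hat{R} = \frac{\hat{Y}}{\hat{X}}, \qquad
\hat{Y} = \sum_i w^{(y)}_i \, y_i, \qquad
\hat{X} = \sum_i w^{(x)}_i \, x_i

% Linear substitute for record i, and the resulting variance approximation.
z_i = \frac{1}{\hat{X}} \left( w^{(y)}_i \, y_i - \hat{R} \, w^{(x)}_i \, x_i \right),
\qquad
\widehat{\mathrm{Var}}(\hat{R}) \approx \widehat{\mathrm{Var}}\Bigl( \sum_i z_i \Bigr)
```

For the weighted mean example, the numerator column holds $w_i y_i$ and the denominator column holds $w_i$, so $z_i = w_i (y_i - \hat{R}) / \sum_j w_j$.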
Value
If designvars is not specified, a named list with one element,
a data frame containing the ratios sorted by geocode and Year.
If designvars is specified, a second element is added
to the list, a data frame giving the totals of the linear
substitutes by Year, geocode, and designvars. The elements of the
list are named estimates and linear.subs.
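To make the second list element more concrete, the following base R sketch computes textbook linear substitutes for a ratio and totals them by Year, geographic code, and design variables. The data frame and all column names are hypothetical, and the layout or scaling of the actual linear.subs data frame may differ.

```r
# Hypothetical data already holding weighted values: y is y * w, w is the weight.
dat <- data.frame(
  d      = rep(c("A", "B"), each = 4),
  Year   = 1,
  strata = rep(1:4, each = 2),
  ids    = 1:8,
  y      = c(1, 9, 4, 8, 5, 7, 24, 32),
  w      = c(1, 3, 2, 2, 1, 1, 4, 4)
)

# Numerator total, denominator total, and ratio for each area-by-year cell.
cell <- interaction(dat$d, dat$Year, drop = TRUE)
Y <- ave(dat$y, cell, FUN = sum)
X <- ave(dat$w, cell, FUN = sum)
R <- Y / X

# Textbook (Woodruff) linear substitute for the ratio, one value per record.
dat$z <- (dat$y - R * dat$w) / X

# Totals of the linear substitutes by Year, area, and design variables.
lin <- aggregate(z ~ Year + d + strata + ids, data = dat, FUN = sum)
print(lin)
```

By construction the substitutes sum to zero within each cell, which is a quick sanity check. In the package, a table of this kind is what vcovgen takes as its first argument in the Examples.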
Author(s)
Robert E. Fay
References
- Woodruff, R.S. (1971). A simple method for approximating the variance of a complex estimate. Journal of the American Statistical Association 66, 411-414.
See Also
vcovgen
Examples
library(sae2) # provides mvrnormSeries, eblupDyn, geo_ratios, vcovgen
require(survey)
require(MASS)
D <- 20 # number of domains
T <- 5 # number of years
samp <- 16 # number of sample cases per domain
set.seed(1)
# use conditional.mean=TRUE to generate true small area values
# without sampling error
Y.list <- mvrnormSeries(D=D, T=T, rho.dyn=.9, sigma.v.dyn=1,
                        sigma.u.dyn=.19, sigma.e=diag(T), conditional.mean=TRUE)
# generate sampling errors
e <- rnorm(samp * T * D, mean=0, sd=4)
# average the sampling errors within each domain-by-year cell
Y <- Y.list[[2]] + tapply(e, rep(1:(D*T), each=samp), mean)
data <- data.frame(Y=Y, X=rep(1:T, times=D))
# model fit with the true sampling variances
result.dyn <- eblupDyn(Y ~ X, D, T, vardir = diag(D*T), data=data)
# individual level observations consistent with Y
Y2 <- rep(Y.list[[2]], each=samp) + e
data2 <- data.frame(Y=Y2, X=rep(rep(1:T, each=samp), times=D),
                    Year=rep(rep(1:T, each=samp), times=D),
                    weight=rep(1, times=samp*T*D),
                    d=rep(1:D, each=samp*T),
                    strata=rep(1:(D*T), each=samp),
                    ids=1:(D*T*samp))
# geo_ratios with designvars specified
geo.results <- geo_ratios(data2, geocode="d", numerators="Y",
                          denominators="weight",
                          designvars=c("strata", "ids"))
# illustrative check: the estimated ratios should reproduce Y
max(abs(geo.results[[1]]$Y - Y))
vcov.list <- vcovgen(geo.results[[2]], year.list=1:T, geocode="d",
                     designvars=c("strata", "ids"))
vcov.list[[1]]
# model fitted with directly estimated variance-covariances
result2.dyn <- eblupDyn(Y ~ X, D, T, vardir=vcov.list, data=data)
cor(result.dyn$eblup, result2.dyn$eblup)