svyglm {survey}  R Documentation 
Fit a generalised linear model to data from a complex survey design, with inverseprobability weighting and designbased standard errors.
## S3 method for class 'survey.design' svyglm(formula, design, subset=NULL, family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE, influence=FALSE) ## S3 method for class 'svyrep.design' svyglm(formula, design, subset=NULL, family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL, return.replicates=FALSE, na.action,multicore=getOption("survey.multicore")) ## S3 method for class 'svyglm' summary(object, correlation = FALSE, df.resid=NULL, ...) ## S3 method for class 'svyglm' predict(object,newdata=NULL,total=NULL, type=c("link","response","terms"), se.fit=(type != "terms"),vcov=FALSE,...) ## S3 method for class 'svrepglm' predict(object,newdata=NULL,total=NULL, type=c("link","response","terms"), se.fit=(type != "terms"),vcov=FALSE, return.replicates=!is.null(object$replicates),...)
formula 
Model formula 
design 
Survey design from 
subset 
Expression to select a subpopulation 
family 

start 
Starting values for the coefficients (needed for some uncommon link/family combinations) 
rescale 
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use 
... 
Other arguments passed to 
rho 
For replicate BRR designs, to specify the parameter for
Fay's variance method, giving weights of 
return.replicates 
Return the replicates as the 
deff 
Estimate the design effects 
influence 
Return influence functions 
object 
A 
correlation 
Include the correlation matrix of parameters? 
na.action 
Handling of NAs 
multicore 
Use the 
df.resid 
Optional denominator degrees of freedom for Wald tests 
newdata 
new data frame for prediction 
total 
population size when predicting population total 
type 
linear predictor ( 
se.fit 
if 
vcov 
if 
For binomial and Poisson families use family=quasibinomial()
and family=quasipoisson()
to avoid a warning about noninteger
numbers of successes. The ‘quasi’ versions of the family objects give
the same point estimates and standard errors and do not give the
warning.
If df.resid
is not specified the df for the null model is
computed by degf
and the residual df computed by
subtraction. This is recommended by Korn and Graubard and is correct
for PSUlevel covariates but is potentially very conservative for
individuallevel covariates. To get tests based on a Normal distribution
use df.resid=Inf
, and to use number of PSUsnumber of strata,
specify df.resid=degf(design)
.
Parallel processing with multicore=TRUE
is helpful only for
fairly large data sets and on computers with sufficient memory. It may
be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.
predict
gives fitted values and sampling variability for specific new
values of covariates. When newdata
are the population mean it
gives the regression estimator of the mean, and when newdata
are
the population totals and total
is specified it gives the
regression estimator of the population total. Regression estimators of
mean and total can also be obtained with calibrate
.
svyglm
returns an object of class svyglm
. The
predict
method returns an object of class svystat
svyglm
always returns 'modelrobust' standard errors; the
HorvitzThompsontype standard errors used everywhere in the survey
package are a generalisation of the modelrobust 'sandwich' estimators.
In particular, a quasiPoisson svyglm
will return correct
standard errors for relative risk regression models.
This function does not return the same standard error estimates for the regression estimator of population mean and total as some textbooks, or SAS. However, it does give the same standard error estimator as estimating the mean or total with calibrated weights.
In particular, under simple random sampling with or without replacement
there is a simple rescaling of the mean squared residual to estimate the
mean squared error of the regression estimator. The standard error
estimate produced by predict.svyglm
has very similar
(asymptotically identical) expected
value to the textbook estimate, and has the advantage of being
applicable when the supplied newdata
are not the population mean
of the predictors. The difference is small when the sample size is large, but can be
appreciable for small samples.
You can obtain the other standard error estimator by calling
predict.svyglm
with the covariates set to their estimated (rather
than true) population mean values.
Thomas Lumley
Lumley T, Scott A (2017) "Fitting Regression Models to Survey Data" Statistical Science 32: 265278
glm
, which is used to do most of the work.
regTermTest
, for multiparameter tests
calibrate
, for an alternative way to specify regression
estimators of population totals or means
svyttest
for onesample and twosample ttests.
data(api) dstrat<svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) dclus2<svydesign(id=~dnum+snum, weights=~pw, data=apiclus2) rstrat<as.svrepdesign(dstrat) rclus2<as.svrepdesign(dclus2) summary(svyglm(api00~ell+meals+mobility, design=dstrat)) summary(svyglm(api00~ell+meals+mobility, design=dclus2)) summary(svyglm(api00~ell+meals+mobility, design=rstrat)) summary(svyglm(api00~ell+meals+mobility, design=rclus2)) ## use quasibinomial, quasipoisson to avoid warning messages summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat, family=quasibinomial())) ## Compare regression and ratio estimation of totals api.ratio < svyratio(~api.stu,~enroll, design=dstrat) pop<data.frame(enroll=sum(apipop$enroll, na.rm=TRUE)) npop < nrow(apipop) predict(api.ratio, pop$enroll) ## regression estimator is less efficient api.reg < svyglm(api.stu~enroll, design=dstrat) predict(api.reg, newdata=pop, total=npop) ## same as calibration estimator svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll))) ## svyglm can also reproduce the ratio estimator api.reg2 < svyglm(api.stu~enroll1, design=dstrat, family=quasi(link="identity",var="mu")) predict(api.reg2, newdata=pop, total=npop) ## higher efficiency by modelling variance better api.reg3 < svyglm(api.stu~enroll1, design=dstrat, family=quasi(link="identity",var="mu^3")) predict(api.reg3, newdata=pop, total=npop) ## true value sum(apipop$api.stu)