svydfbetas {svydiags}    R Documentation

dfbetas for models fitted with complex survey data

Description

Compute the dfbetas measure of the effect of extreme observations on parameter estimates for fixed-effects linear regression models fitted with data collected from one- and two-stage complex survey designs.

Usage

svydfbetas(mobj, stvar=NULL, clvar=NULL, z=3)

Arguments

mobj

model object produced by svyglm in the survey package

stvar

name of the stratification variable in the svydesign object used to fit the model

clvar

name of the cluster variable in the svydesign object used to fit the model

z

numerator of the cutoff for judging whether an observation has an extreme effect on a parameter estimate; default is 3 but can be adjusted to control how many observations are flagged for inspection

Details

svydfbetas computes the values of dfbeta and dfbetas for each observation and parameter estimate. The dfbeta value is the amount that a parameter estimate changes when the unit is deleted from the sample; dfbetas is its standardized version. The model object must be created by svyglm in the R survey package. The output is a list containing the dfbeta values, the standardized dfbetas values, and the cutoff. By default, svyglm uses only complete cases (i.e., ones for which the dependent variable and all independent variables are non-missing) to fit the model. The rows of the data frame used in fitting the model can be retrieved from the svyglm object via as.numeric(names(mobj$y)). The data for those rows are in mobj$data.
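The case-deletion idea described above can be illustrated with a minimal base R sketch. It assumes a toy data set (the names toy, x, w, and y are hypothetical) and an ordinary weighted least squares fit with lm rather than svyglm, so it shows only the unstandardized change in each coefficient when a unit is dropped. It does not reproduce the design-based standardization or the internal computations of svydfbetas.

```r
# Hypothetical toy data; not taken from the svydiags examples.
set.seed(1)
n <- 30
toy <- data.frame(x = rnorm(n), w = runif(n, 1, 5))
toy$y <- 2 + 0.5 * toy$x + rnorm(n)

# Weighted least squares fit on the full sample.
full <- lm(y ~ x, data = toy, weights = w)

# Refit with each unit removed and record the change in every coefficient.
# Row i holds (full-sample estimate) minus (estimate without unit i).
dfbeta_del <- t(sapply(seq_len(n), function(i) {
  coef(full) - coef(lm(y ~ x, data = toy[-i, ], weights = w))
}))
rownames(dfbeta_del) <- rownames(toy)

# Units with the largest absolute change in the slope.
head(sort(abs(dfbeta_del[, "x"]), decreasing = TRUE))
```

For a linear model the refits are not strictly needed, since stats::dfbeta(full) returns the same quantities in closed form; the explicit loop is used here only to make the deletion step visible.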

Value

List object with values:

Dfbeta

Numeric vector of unstandardized dfbeta values whose names are the rows of the data frame in the svydesign object that were used in fitting the model

Dfbetas

Numeric vector of standardized dfbetas values whose names are the rows of the data frame in the svydesign object that were used in fitting the model

cutoff

Value used for gauging whether a value of dfbetas is large. For a single-stage sample, cutoff=z/\sqrt{n}; for a 2-stage sample, cutoff=z/\sqrt{n[1+\rho (\bar{m}-1)]}
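As a rough numerical illustration of the two cutoff formulas, the sketch below evaluates them for made-up inputs. This page does not define the symbols, so the code assumes that n is the number of sample units, \rho is an intracluster correlation, and \bar{m} is the average number of sample units per cluster. The values chosen are hypothetical and are not how svydfbetas estimates these quantities.

```r
z    <- 3      # default numerator of the cutoff
n    <- 100    # hypothetical number of sample units
rho  <- 0.05   # assumed intracluster correlation (illustrative value)
mbar <- 5      # assumed average number of sample units per cluster

# Single-stage cutoff: z / sqrt(n)
cutoff_single <- z / sqrt(n)

# Two-stage cutoff: z / sqrt(n * [1 + rho * (mbar - 1)])
cutoff_two <- z / sqrt(n * (1 + rho * (mbar - 1)))

c(single = cutoff_single, two_stage = cutoff_two)
```

Under these assumptions, a positive \rho with \bar{m} > 1 makes the two-stage cutoff smaller than the single-stage one for the same n and z.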

Author(s)

Richard Valliant

References

Li, J., and Valliant, R. (2011). Linear regression diagnostics for unclustered survey data. Journal of Official Statistics, 27, 99-119.

Li, J., and Valliant, R. (2015). Linear regression diagnostics in cluster samples. Journal of Official Statistics, 31, 61-75.

Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.

Lumley, T. (2023). survey: analysis of complex survey samples. R package version 4.2.

See Also

svydffits, svyCooksD

Examples

library(survey)
data(api)
    # unstratified, single-stage design
d0 <- svydesign(id=~1, strata=NULL, weights=~pw, data=apistrat)
m0 <- svyglm(api00 ~ ell + meals + mobility, design=d0)
svydfbetas(mobj=m0)

    # stratified cluster design
library(NHANES)
data(NHANESraw)
dnhanes <- svydesign(id=~SDMVPSU, strata=~SDMVSTRA, weights=~WTINT2YR, nest=TRUE, data=NHANESraw)
m2 <- svyglm(BPDiaAve ~ as.factor(Race1) + BMI + AlcoholYear, design=dnhanes)
yy <- svydfbetas(mobj=m2, stvar="SDMVSTRA", clvar="SDMVPSU")
    # for each row of Dfbetas, count the values whose absolute value exceeds the cutoff
apply(abs(yy$Dfbetas) > yy$cutoff, 1, sum)

[Package svydiags version 0.6 Index]