R: dffits for models fitted with complex survey data

svydffits {svydiags}

R Documentation

dffits for models fitted with complex survey data

Description

Compute dffits, a measure of the effect that each observation has on its own predicted value, for fixed-effects linear regression models fitted with data collected from one- and two-stage complex survey designs.

Usage

svydffits(mobj, stvar=NULL, clvar=NULL, z=3)

Arguments

`mobj`	model object produced by `svyglm` in the `survey` package
`stvar`	name of the stratification variable in the `svydesign` object used to fit the model
`clvar`	name of the cluster variable in the `svydesign` object used to fit the model
`z`	numerator of cutoff for measuring whether an observation has an extreme effect on its own predicted value; default is 3 but can be adjusted to control how many observations are flagged for inspection

Details

svydffits computes the value of dffits for each observation, i.e., the amount by which a unit's predicted value changes when that unit is deleted from the sample. The model object must be created by svyglm in the R survey package. The output is a list containing the unstandardized dffit values, the standardized dffits values, and the cutoff used to judge whether a standardized value is large.

By default, svyglm uses only complete cases (i.e., ones for which the dependent variable and all independent variables are non-missing) to fit the model. The rows of the data frame used in fitting the model can be retrieved from the svyglm object via as.numeric(names(mobj$y)). The data for those rows are in mobj$data.
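The row lookup described above can be sketched as follows. This is a minimal illustration that assumes the survey package is installed and reuses the api data from the Examples section; the object names rows_used and fit_data are hypothetical. It indexes mobj$data by row name rather than by position, which is an assumption made here so that the lookup stays valid if row names are not consecutive integers.

```r
library(survey)
data(api)

# Single-stage design and model, as in the Examples section
d0 <- svydesign(id = ~1, strata = NULL, weights = ~pw, data = apistrat)
m0 <- svyglm(api00 ~ ell + meals + mobility, design = d0)

# Row labels of the complete cases that svyglm actually used
rows_used <- names(m0$y)

# Corresponding records from the data stored in the model object
fit_data <- m0$data[rows_used, ]

# Number of complete cases versus rows in the original data frame
length(rows_used)
nrow(apistrat)
```

If the two counts differ, the difference is the number of rows dropped because of missing values.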

Value

List object with values:

`Dffit`	Numeric vector of unstandardized dffit values whose names are the rows of the data frame in the `svydesign` object that were used in fitting the model
`Dffits`	Numeric vector of standardized dffits values whose names are the rows of the data frame in the `svydesign` object that were used in fitting the model
`cutoff`	Value used for gauging whether a value of dffits is large. For a single-stage sample, `cutoff`=`z\sqrt{p/n}`; for a 2-stage sample, `cutoff`=`z\sqrt{p/n\bar{m}[1+\rho (\bar{m}-1)]}`
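As a rough numeric illustration of the 2-stage cutoff, the sketch below evaluates the formula in base R. Several things here are assumptions, not statements from this page: p is taken to be the number of model parameters, n the number of sample clusters, m-bar the average number of sample units per cluster, and rho the intracluster correlation. The bracketed term is read as a design-effect factor multiplying p, i.e., z * sqrt(p * [1 + rho * (mbar - 1)] / (n * mbar)). All input values are hypothetical.

```r
# Hypothetical inputs chosen only for illustration
z    <- 3     # cutoff multiplier (the function's default)
p    <- 4     # number of model parameters (assumed meaning of p)
n    <- 50    # number of sample clusters (assumed meaning of n)
mbar <- 10    # average number of sample units per cluster (assumed)
rho  <- 0.05  # intracluster correlation (assumed)

# Design-effect style factor from the bracketed term
deff <- 1 + rho * (mbar - 1)

# 2-stage cutoff under the reading of the formula stated above
cutoff_2stage <- z * sqrt(p * deff / (n * mbar))
cutoff_2stage
```

Under this reading, a larger rho raises the cutoff, so fewer observations are flagged when units within a cluster are more alike.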

Author(s)

Richard Valliant

References

Li, J., and Valliant, R. (2011). Linear regression diagnostics for unclustered survey data. Journal of Official Statistics, 27, 99-119.

Li, J., and Valliant, R. (2015). Linear regression diagnostics in cluster samples. Journal of Official Statistics, 31, 61-75.

Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.

Lumley, T. (2023). survey: analysis of complex survey samples. R package version 4.2.

Examples

require(survey)
data(api)
    # unstratified, single-stage design
d0 <- svydesign(id=~1,strata=NULL, weights=~pw, data=apistrat)
m0 <- svyglm(api00 ~ ell + meals + mobility, design=d0)
yy <- svydffits(mobj=m0)
yy$cutoff
sum(abs(yy$Dffits) > yy$cutoff)

    # stratified, clustered design
require(NHANES)
data(NHANESraw)
dnhanes <- svydesign(id=~SDMVPSU, strata=~SDMVSTRA, weights=~WTINT2YR, nest=TRUE, data=NHANESraw)
m2 <- svyglm(BPDiaAve ~ as.factor(Race1) + BMI + AlcoholYear, design = dnhanes)
yy <- svydffits(mobj=m2, stvar= "SDMVSTRA", clvar="SDMVPSU", z=4)
sum(abs(yy$Dffits) > yy$cutoff)
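Beyond counting the flagged observations, it can help to see which ones they are. The following sketch repeats the single-stage api example and lists the largest standardized dffits values that exceed the cutoff. It assumes the survey and svydiags packages are installed; the names flag and flagged are hypothetical.

```r
library(survey)
library(svydiags)
data(api)

# Refit the unstratified, single-stage model
d0 <- svydesign(id = ~1, strata = NULL, weights = ~pw, data = apistrat)
m0 <- svyglm(api00 ~ ell + meals + mobility, design = d0)
yy <- svydffits(mobj = m0)

# Logical indicator of observations whose |dffits| exceeds the cutoff
flag <- abs(yy$Dffits) > yy$cutoff

# Flagged values, largest in absolute value first; names identify the rows
flagged <- yy$Dffits[flag]
flagged <- flagged[order(abs(flagged), decreasing = TRUE)]
head(flagged)
```

Raising z (for example, z=4) raises the cutoff and shortens this list.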

[Package svydiags version 0.6 Index]