svymle {survey} R Documentation

## Maximum pseudolikelihood estimation in complex surveys

### Description

Maximises a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.

### Usage

svymle(loglike, gradient = NULL, design, formulas, start = NULL, control
= list(), na.action="na.fail", method=NULL, lower=NULL,upper=NULL,influence=FALSE,...)
## S3 method for class 'svymle'
summary(object, stderr=c("robust", "model"),...)


### Arguments

 loglike vectorised loglikelihood function gradient Derivative of loglike. Required for variance computation and helpful for fitting design a survey.design object formulas A list of formulas specifying the variable and linear predictors: see Details below start Starting values for parameters control control options for the optimiser: see the help page for the optimiser you are using. lower,upper Parameter bounds for bobyqa influence Return the influence functions (primarily for svyby) na.action Handling of NAs method "nlm" to use nlm,"uobyqa" or "bobyqa" to use those optimisers from the minqa package; otherwise passed to optim ... Arguments to loglike and gradient that are not to be optimised over. object svymle object stderr Choice of standard error estimator. The default is a standard sandwich estimator. See Details below.

### Details

Optimization is done by nlm by default or if method=="nlm". Otherwise optim is used and method specifies the method and control specifies control parameters.

The design object contains all the data and design information from the survey, so all the formulas refer to variables in this object. The formulas argument needs to specify the response variable and a linear predictor for each freely varying argument of loglike.

Consider for example the dnorm function, with arguments x, mean, sd and log, and suppose we want to estimate the mean of y as a linear function of a variable z, and to estimate a constant standard deviation. The log argument must be fixed at FALSE to get the loglikelihood. A formulas argument would be list(~y, mean=~z, sd=~1). Note that the data variable y must be the first argument to dnorm and the first formula and that all the other formulas are labelled. It is also permitted to have the data variable as the left-hand side of one of the formulas: eg list( mean=y~z, sd=~1).

The two optimisers from the minqa package do not use any derivatives to be specified for optimisation, but they do assume that the function is smooth enough for a quadratic approximation, ie, that two derivatives exist.

The usual variance estimator for MLEs in a survey sample is a ‘sandwich’ variance that requires the score vector and the information matrix. It requires only sampling assumptions to be valid (though some model assumptions are required for it to be useful). This is the stderr="robust" option, which is available only when the gradient argument was specified.

If the model is correctly specified and the sampling is at random conditional on variables in the model then standard errors based on just the information matrix will be approximately valid. In particular, for independent sampling where weights and strata depend on variables in the model the stderr="model" should work fairly well.

### Value

An object of class svymle

### Author(s)

Thomas Lumley

svydesign, svyglm

### Examples


data(api)

dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)

## fit with glm
m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
## fit as mle (without gradient)
m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat,
formulas=list(mean=api00~api99+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE)
gr<- function(x,mean,sd,log){
dm<-2*(x - mean)/(2*sd^2)
ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
cbind(dm,ds)
}
m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")

summary(m0)
summary(m1,stderr="model")
summary(m2)

## Using offsets
m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+offset(ell)+ell, sd=~1),
start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")

## demonstrating multiple linear predictors

m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat,
formulas=list(mean=api00~api99+offset(ell)+ell, sd=~stype),
start=list(c(80,1,0),c(20,0,0)), log=TRUE, method="BFGS")

## More complicated censored lognormal data example
## showing that the response variable can be multivariate

data(pbc, package="survival")
pbc$randomized <- with(pbc, !is.na(trt) & trt>0) biasmodel<-glm(randomized~age*edema,data=pbc) pbc$randprob<-fitted(biasmodel)
dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema,
data=subset(pbc,randomized))

## censored logNormal likelihood
lcens<-function(x,mean,sd){
ifelse(x[,2]==1,
dnorm(log(x[,1]),mean,sd,log=TRUE),
pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
)
}

gcens<- function(x,mean,sd){

dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)

dm<-ifelse(x[,2]==1,
2*(log(x[,1]) - mean)/(2*sd^2),
dz*-1/sd)
ds<-ifelse(x[,2]==1,
(log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
ds<- dz*-(log(x[,1])-mean)/(sd*sd))
cbind(dm,ds)
}

m<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="newuoa",
formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin,
sd=~1),
start=list(c(10,0,0,0),c(1)))

summary(m)

## the same model, but now specifying the lower bound of zero on the
## log standard deviation

mbox<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="bobyqa",
formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1),
lower=list(c(-Inf,-Inf,-Inf,-Inf),0), upper=Inf,
start=list(c(10,0,0,0),c(1)))

## The censored lognormal model is now available in svysurvreg()

summary(svysurvreg(Surv(time,status>0)~bili+protime+albumin,
design=dpbc,dist="lognormal"))

## compare svymle scale value after log transformation
svycontrast(m, quote(log(sd.(Intercept))))



[Package survey version 4.1-1 Index]