R: Maximum pseudolikelihood estimation in complex surveys

svymle {survey}

R Documentation

Maximum pseudolikelihood estimation in complex surveys

Description

Maximises a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.

Usage

svymle(loglike, gradient = NULL, design, formulas, start = NULL, control
= list(), na.action="na.fail", method=NULL, lower=NULL,upper=NULL,influence=FALSE,...)
## S3 method for class 'svymle'
summary(object, stderr=c("robust", "model"),...)

Arguments

`loglike`	vectorised loglikelihood function
`gradient`	Derivative of `loglike`. Required for variance computation and helpful for fitting
`design`	a `survey.design` object
`formulas`	A list of formulas specifying the variable and linear predictors: see Details below
`start`	Starting values for parameters
`control`	control options for the optimiser: see the help page for the optimiser you are using.
`lower`, `upper`	Parameter bounds for `bobyqa`
`influence`	Return the influence functions (primarily for svyby)
`na.action`	Handling of `NA`s
`method`	`"nlm"` to use `nlm`,`"uobyqa"` or `"bobyqa"` to use those optimisers from the `minqa` package; otherwise passed to `optim`
`...`	Arguments to `loglike` and `gradient` that are not to be optimised over.
`object`	`svymle` object
`stderr`	Choice of standard error estimator. The default is a standard sandwich estimator. See Details below.

Details

Optimization is done by nlm by default or if method=="nlm". Otherwise optim is used and method specifies the method and control specifies control parameters.

The design object contains all the data and design information from the survey, so all the formulas refer to variables in this object. The formulas argument needs to specify the response variable and a linear predictor for each freely varying argument of loglike.

Consider for example the dnorm function, with arguments x, mean, sd and log, and suppose we want to estimate the mean of y as a linear function of a variable z, and to estimate a constant standard deviation. The log argument must be fixed at FALSE to get the loglikelihood. A formulas argument would be list(~y, mean=~z, sd=~1). Note that the data variable y must be the first argument to dnorm and the first formula and that all the other formulas are labelled. It is also permitted to have the data variable as the left-hand side of one of the formulas: eg list( mean=y~z, sd=~1).

The two optimisers from the minqa package do not use any derivatives to be specified for optimisation, but they do assume that the function is smooth enough for a quadratic approximation, ie, that two derivatives exist.

The usual variance estimator for MLEs in a survey sample is a ‘sandwich’ variance that requires the score vector and the information matrix. It requires only sampling assumptions to be valid (though some model assumptions are required for it to be useful). This is the stderr="robust" option, which is available only when the gradient argument was specified.

If the model is correctly specified and the sampling is at random conditional on variables in the model then standard errors based on just the information matrix will be approximately valid. In particular, for independent sampling where weights and strata depend on variables in the model the stderr="model" should work fairly well.

Value

An object of class svymle

Author(s)

Thomas Lumley

Examples


 data(api)

 dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat)

 ## fit with glm
 m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat)
 ## fit as mle (without gradient)
 m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat, 
    formulas=list(mean=api00~api99+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE)
 ## with gradient
 gr<- function(x,mean,sd,log){
	 dm<-2*(x - mean)/(2*sd^2)
	 ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi))
         cbind(dm,ds)
      }
 m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")

 summary(m0)
 summary(m1,stderr="model")
 summary(m2)

 ## Using offsets
 m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+offset(ell)+ell, sd=~1),
    start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS")


## demonstrating multiple linear predictors

 m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, 
    formulas=list(mean=api00~api99+offset(ell)+ell, sd=~stype),
    start=list(c(80,1,0),c(20,0,0)), log=TRUE, method="BFGS")



 ## More complicated censored lognormal data example
 ## showing that the response variable can be multivariate

 data(pbc, package="survival")
 pbc$randomized <- with(pbc, !is.na(trt) & trt>0)
 biasmodel<-glm(randomized~age*edema,data=pbc)
 pbc$randprob<-fitted(biasmodel)
 dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema,
    data=subset(pbc,randomized))


## censored logNormal likelihood
 lcens<-function(x,mean,sd){
    ifelse(x[,2]==1,
           dnorm(log(x[,1]),mean,sd,log=TRUE),
           pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE)
           )
 }

 gcens<- function(x,mean,sd){

        dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE)

        dm<-ifelse(x[,2]==1,
                   2*(log(x[,1]) - mean)/(2*sd^2),
                   dz*-1/sd)
        ds<-ifelse(x[,2]==1,
                   (log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)),
                   ds<- dz*-(log(x[,1])-mean)/(sd*sd))
        cbind(dm,ds)      
 }

m<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="newuoa",
      formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin,
                    sd=~1),
         start=list(c(10,0,0,0),c(1)))

summary(m)

## the same model, but now specifying the lower bound of zero on the
## log standard deviation

mbox<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="bobyqa",
      formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1),
      lower=list(c(-Inf,-Inf,-Inf,-Inf),0), upper=Inf,
      start=list(c(10,0,0,0),c(1)))


## The censored lognormal model is now available in svysurvreg()

summary(svysurvreg(Surv(time,status>0)~bili+protime+albumin,
        design=dpbc,dist="lognormal"))

## compare svymle scale value after log transformation
svycontrast(m, quote(log(`sd.(Intercept)`)))

[Package survey version 4.4-2 Index]