mleDb {dbd}R Documentation

Maximum likelihood estimates of db parameters.

Description

Calculates maximum likelihood estimates of the alpha and beta parameters of a db distribution. Calls upon optim() with the "BFGS" method.

Usage

mleDb(x, ntop, zeta=FALSE, par0=NULL, UB=10, maxit=1000,
      covmat=TRUE, useGinv=FALSE)

Arguments

x

A random sample from the db distribution whose parameters are being estimated. Missing values are allowed.

ntop

The ntop parameter of the db distribution whose parameters are being estimated. I.e. it is the maximum possible value of the distribution, whose values are integers between 1 and ntop, or between 0 and ntop if zeta (see below) is TRUE.

zeta

See ddb().

par0

Optional starting values for the iterative estimation procedure. A vector with entries alpha and beta. Ideally this vector should be named; if not it is assumed that the entries are in the order alpha, beta. If not supplied starting values are calculated using meDb().

UB

Positive numeric scalar, providing an upper bound on the starting values used by mleDb(). It appears that if these starting values are too large (it is not clear how large) then optim() will throw an error. This bound is ignored if par0 is supplied.

maxit

Integer scalar. The maximum number of iterations to be undertaken by optim(). What happens if this number is exceeded depends on the value of options()[["maxitErrorOrWarning"]]. This may be "error" (in which case an error is thrown if maxit is exceeded) or "warning" (in which case a warning is issued). The values is set equal to "error" at startup. It may be switched, from on possibility to the other, by means of the function set.eow().

covmat

Logical scalar. Should the covariance matrix of the parameter estimates be calculated? In simulation studies, in which the covariance matrix is not of interest, calculations might be speeded up a bit by setting covmat=FALSE.

useGinv

Logical scalar. Should the ginv() (generalised inverse) function from the MASS package be used to calculate a surrogate covariance matrix if the hessian is numerically singular? This is probably not advisable; the possibility of using the generalised inverse is provided for the sake of completeness. Caveat utilitor. This argument is ignored if covmat is FALSE.

Details

The ntop and zeta parameters must be supplied; they are not formally estimated (although the choice of ntop may be influenced by the data — see below). The parameter zeta has a default value, FALSE.

If the generating mechanism from which the observed data x arose has a (known) theoretical least upper bound, then ntop should probably be set equal to this upper bound. If the data are theoretically unbounded, then ntop should probably be set equal to 1 + max(x). In this case \Pr(X = \textrm{ntop}) should probably be interpreted as \Pr(X \geq \textrm{ntop}). Otherwise ntop should should probably be set equal to max(x). The choice depends on circumstances and is up to the user.

Missing values are removed from x before it is passed to optim(). (Note that ddb() doesn't mind missing values but returns missing values when evaluated at them. This in turn produces a missing value for the log likelihood.)

In previous versions of this package (0.1-17 and earlier) optim() was called with method "L-BFGS-B". The change was made possible by the fact that, with the new “direct” version of ddb(), it is no longer necessary to bound the parameters away from (above) zero.

Value

An object of class "mleDb". Such an object consists of a named vector with entries "alpha" and "beta", which are the estimates of the corresponding parameters. It has a number of attributes:

Author(s)

Rolf Turner r.turner@auckland.ac.nz

See Also

ddb meDb() optim() aHess() vcov.mleDb()

Examples

set.seed(42)
x <- rdb(500,3,5,2)
ests <- mleDb(x,2)  # Bad!  Mind you, 2 is a "bad" value for ntop!
                    # Hessian is singular; covMat is NA.
# Get much better results using true parameter values
# as starting values; pity we can't do this in real life!
ests <- mleDb(x,2,par0=c(alpha=3,beta=5))
x <- rdb(500,3,5,20)
ests <- mleDb(x,20) # Pretty good.
print(vcov(ests))

# Binomial, n = 10, p = 0.3.
set.seed(42)
x   <- rbinom(1000,10,0.3)
fit <- mleDb(x,10,zeta=TRUE)
print(vcov(fit))
p1  <- dbinom(0:10,10,0.3)
p2  <- dbinom(0:10,10,mean(x)/10)
p3  <- table(factor(x,levels=0:10))/1000
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
     ddb(0:10,fit[1],fit[2],10,zeta=TRUE))))
lines(0.2+(0:10),p1,col="orange",type="h",ylim=c(0,max(p1,p2)))
lines(0.3+(0:10),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
       legend=c("db","observed","true binomial","fitted binomial"),bty="n")
print(attr(fit,"log.like")) # -1778.36
print(sum(dbinom(x,10,mean(x)/10,log=TRUE))) # -1777.36
# Slightly better fit with only one estimated parameter,
# but then binomial is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dbinom(x,10,0.3,log=TRUE))) # -1778.37

# Poisson mean = 5
set.seed(42)
x    <- rpois(1000,5)
fit  <- mleDb(x,14,zeta=TRUE) # max(x) = 13, take ntop = 1+13
print(vcov(fit))
p1   <- c(dpois(0:13,5),1-ppois(13,5))
lhat <- mean(x)
p2   <- c(dpois(0:13,lhat),1-ppois(13,lhat))
plot(fit,obsd=x,legPos=NULL,ylim=c(0,max(p1,p2,p3,
     ddb(0:14,fit[1],fit[2],14,zeta=TRUE))))
lines(0.2+0:14,p1,col="orange",type="h")
lines(0.3+(0:14),p2,col="green",type="h")
legend("topright",lty=1,col=c("red","blue","orange","green"),
       legend=c("db","observed","true Poisson","fitted Poisson"),bty="n")
print(attr(fit,"log.like")) # -2198.594
print(sum(dpois(x,lhat,log=TRUE))) # -2197.345
# Slightly better fit with only one estimated parameter,
# but then Poisson is the true distribution, so you'd
# kind of expect a better fit.
print(sum(dpois(x,5,log=TRUE))) # -2198.089

[Package dbd version 0.0-22 Index]