mleCOP {copBasic}   R Documentation
Maximum Pseudo-Log-Likelihood Estimation for Copula Parameter Estimation
Description
Perform maximum pseudo-log-likelihood estimation (pMLE) for copula parameters by maximizing the function

\mathcal{L}(\Theta_p) = \sum_{i=1}^n \log\bigl[ c(F_x(x_i), F_y(y_i); \Theta_p)\bigr]\mbox{,}

where \mathcal{L}(\Theta_p) is the log-likelihood for parameter vector \Theta_p of dimension p, and c(u,v; \Theta_p) is the bivariate copula density. The u and v are estimated by the respective empirical cumulative distribution functions u = F_x(\cdots) and v = F_y(\cdots) for each of the joint realizations of a sample of size n. The c(u,v) is numerically estimated by the copula through the densityCOP function.
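For orientation, the following is a minimal sketch of the objective being maximized, assuming rank-based plotting positions for the empirical distribution functions; the helper name pseudoLL() is hypothetical, and the sum of the logs of the density comes from densityCOP() with sumlogs=TRUE as in the Examples:

pseudoLL <- function(para, x, y, cop) { # hypothetical helper, not in copBasic
  u <- rank(x)/(length(x)+1) # u = Fx(x) by a rank-based plotting position
  v <- rank(y)/(length(y)+1) # v = Fy(y)
  densityCOP(u, v, cop=cop, para=para, sumlogs=TRUE) # objective to maximize
}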
Usage
mleCOP(u, v=NULL, cop=NULL, parafn=function(k) return(k),
       interval=NULL, init.para=NULL, verbose=FALSE, control=list(),
       the.zero=.Machine$double.eps^0.25, s=0, ...)
Arguments
u: Nonexceedance probability u in the X direction;
v: Nonexceedance probability v in the Y direction; if NULL, then u is treated as a two-column table of u and v values (see Examples);
cop: A copula function;
parafn: A function responsible for generating the parameters. This is often just a simple return of a parameter vector as copBasic uses this style of parameterization, but this function can take over parameter remapping to handle boundary conditions to benefit the search or provide an interface into other copula packages in R (see Examples and the sketch after this list);
interval: The search interval for root finding by optimise() when the parameter dimension of the copula is p = 1;
init.para: The initial guesses for the parameters for the p \ge 2 dimensional search by optim();
verbose: A logical that internally is converted to integer to trigger 1 (sum of logs of the copula density reported at each iteration) or 2 (additional diagnostics of the iterations; see the last of the Examples) levels of diagnostic output;
control: This argument is the argument of the same name for optim();
the.zero: The value for “the zero” of the copula density function. This argument is the argument of the same name for densityCOP;
s: A vector of at least two presumably uniformly distributed or regular sequence of nonexceedance probabilities in [0,1] that supports diagnostics when verbose is used (see the last of the Examples);
...: Additional arguments to pass; see the source code for the internally used functions that can pick these additional arguments up.
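As a hedged illustration of the parameter remapping that parafn can perform (the helper pfn and its use here are hypothetical, patterned after GH2pfunc in the Note), the optimizer can search the whole real line while the copula always receives an admissible parameter:

pfn <- function(p) return(exp(p) + 1) # optim() searches the real line, but
                                      # the copula receives a parameter >= 1
# ML <- mleCOP(UV$U, UV$V, cop=GHcop, init.para=0, parafn=pfn)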
Value
The value(s) for the estimated parameters are returned within an R list in which the elements listed below are populated uniquely by this package. The other elements of the returned list are generated by either the optimise() (1D, p = 1) or optim() (pD, p \ge 2) functions of R.
para: The parameter(s) in a canonical element after the one-dimensional root finding (optimise()) or the multi-dimensional optimization (optim()) completes;
packagetext: A helpful message unique to the copBasic package;
loglik: The maximum of the log-likelihood, matching the name for the same quantity by the function fitCopula() in package copula;
AIC: Akaike information criterion (AIC) (see also aicCOP): 2p - 2\mathcal{L}(\Theta_p);
BIC: Bayesian information criterion (BIC) (see also bicCOP): p\log(n) - 2\mathcal{L}(\Theta_p);
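For reference, a minimal sketch of how the returned loglik maps to these criteria under their standard definitions (ML and UV are placeholder names for a fitted object and its input sample):

# p <- length(ML$para); n <- length(UV$U) # placeholder objects
# AIC <- 2*p      - 2*ML$loglik           # should match ML$AIC
# BIC <- log(n)*p - 2*ML$loglik           # should match ML$BIC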
Note
This section provides for a more thorough assessment of pMLE than shown in the Examples.
INTERFACE TO THE COPULA PACKAGE—A not uncommon question to the author is how copBasic can support copulas from other packages. A copBasic pMLE implementation for the Gaussian copula from the copula package is thus useful for instruction.
Two interface functions are required for the pMLE situation. First, interface the copula package in a generic form for the copBasic package:
"cB2copula" <- # pCoupla() from package copula is analogous to COP() function(u,v, para=NULL, ...) { if(length(u) == 1) u <- rep(u, length(v)) # see asCOP() for reasoning of if(length(v) == 1) v <- rep(v, length(u)) # this "vectorization" hack return(copula::pCopula(matrix(c(u,v), ncol=2), para)) }
where the para argument above must be built by the features of the copula package. The following function then provides for parameter setup specific to the Gaussian copula having parameter \rho:
copula2cBpara <- function(rho) return(copula::normalCopula(rho, dim = 2))
Now, let us perform a parameter estimate for a sample of size n=900:
set.seed(162); UV <- simCOP(n=900, cop=cB2copula, para=copula2cBpara(0.45))
mleCOP(UV, cop=cB2copula, parafn=copula2cBpara, interval=c(-1,1))$para
# rho.1 = 0.4248822
The search interval for the Gaussian copula is \rho \in [-1, +1], and the final result is \rho = 0.4248822.
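As a further check, and assuming the copula package is installed, its native pMLE can be run for comparison; fitCopula() and pobs() below are functions of that package, not of copBasic:

copula::fitCopula(copula::normalCopula(dim=2),
                  copula::pobs(as.matrix(UV)), method="mpl")
# the estimated rho.1 should closely agree with the mleCOP() result above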
MULTI-DIMENSIONAL EXAMPLE OF pMLE—Consider a 2-parameter Gumbel–Hougaard copula (\mathbf{GH}(\Theta_1, \Theta_2)), but now use the parafn argument to provide boundary-condition assistance, through function GH2pfunc, to the optim() function that performs the maximization.
set.seed(162); UV <- simCOP(n=890, cop=GHcop, para=c(2.4, .06))
GH2pfunc <- function(p) { return(c(exp(p[1])+1, exp(p[2]))) }
ML <- mleCOP(UV$U, UV$V, cop=GHcop, init.para=c(1,1), parafn=GH2pfunc)
print(ML$para) # [1] 2.2755018 0.1194788
and the result is \Theta_{1,2} = (2.2755018, 0.1194788). Next, consider now a 3-parameter \mathbf{GH}(\Theta, \pi_1, \pi_2) copula and again use the parafn argument through function GH3pfunc, but notice that the 2nd and 3rd parameters are now mapped into the 0 \le \pi_1, \pi_2 \le 1 domain using the pnorm() function.
set.seed(162); UV <- simCOP(n=500, cop=GHcop, para=c(5.5, .6, .9))
GH3pfunc <- function(p) { return(c(exp(p[1])+1, pnorm(p[2]), pnorm(p[3]))) }
ML <- mleCOP(UV$U, UV$V, cop=GHcop, init.para=c(1, .5, .5), parafn=GH3pfunc)
print(ML$para) # [1] 5.3742229 0.6141652 0.9382638
and the result is \Theta = 5.3742229 and \pi_{1,2} = (0.6141652, 0.9382638).
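If outside estimates of the parameters are available, the transforms in GH3pfunc can be inverted so that the search starts near them; a small sketch with hypothetical starting guesses follows:

guess <- c(5, 0.6, 0.9) # hypothetical prior estimates of Theta, pi1, pi2
init.para <- c(log(guess[1]-1), qnorm(guess[2]), qnorm(guess[3]))
# ML <- mleCOP(UV$U, UV$V, cop=GHcop, init.para=init.para, parafn=GH3pfunc)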
ANOTHER MULTI-DIMENSIONAL EXAMPLE OF pMLE—Finally, an experiment can be made by fitting a 3-parameter \mathbf{GH}(\Theta, \pi_1, \pi_2) to a simulation from a 2-parameter \mathbf{GH}(\beta_1, \beta_2), where the seed is just arbitrary and the Vuong Procedure (vuongCOP) is used to compare fits and make inference. The parameter functions GH2pfunc and GH3pfunc are as before.
set.seed(10); UV <- simCOP(n=500, cop=GHcop, para=c(1.7, 1.86))
GH2pfunc <- function(p) { return(c(exp(p[1])+1, exp(p[2]))) }
GH3pfunc <- function(p) { return(c(exp(p[1])+1, pnorm(p[2]), pnorm(p[3]))) }
para1 <- mleCOP(UV, cop=GHcop, init.para=c(1,1),     parafn=GH2pfunc)$para
para2 <- mleCOP(UV, cop=GHcop, init.para=c(1,.5,.5), parafn=GH3pfunc)$para
vuongCOP(UV, cop1=GHcop, para1=para1, cop2=GHcop, para2=para2)$message
#[1] "Copula 1 has better fit than Copula 2 at 100 x (1-alpha) level"
The results show that the 2-parameter \mathbf{GH} is a better fit to the simulated data than the 3-parameter \mathbf{GH}, which seems a bit self-evident. Plot some same-seeded simulations just to confirm.
set.seed(67) # First, the estimated parameters but with the correct model.
UV <- simCOP(n=200, GHcop, para=para1, snv=TRUE, pch=16, col=2)
set.seed(67) # Second, the estimated incorrect model.
UV <- simCOP(n=200, GHcop, para=para2, snv=TRUE, ploton=FALSE)
Yes, differences in form are manifest in the produced graphic. Now, let us try another set of parameters and again an arbitrarily chosen seed.
set.seed(10); UV <- simCOP(n=500, cop=GHcop, para=c(1.91, 0.16))
para1 <- mleCOP(UV, cop=GHcop, init.para=c(1,1),     parafn=GH2pfunc)$para
para2 <- mleCOP(UV, cop=GHcop, init.para=c(1,.5,.5), parafn=GH3pfunc)$para
vuongCOP(UV, cop1=GHcop, para1=para1, cop2=GHcop, para2=para2)$message
#[1] "Copulas 1 and 2 are not significantly different at 100 x (1-alpha)"
The results show equivalence; let us now check a graphic.
set.seed(67); z <- simCOP(n=200, GHcop, para=para1, snv=TRUE, pch=16, col=2)
set.seed(67); z <- simCOP(n=200, GHcop, para=para2, snv=TRUE, ploton=FALSE)
The differences are small, but they might be inflating into the lower-left corner. What sample size could conceivably begin to distinguish between the copulas?
kullCOP(cop1=GHcop, cop2=GHcop, para1=para1, para2=para2) # 625 on this run
nsim <- 20; set.seed(67)
Results <- sapply(1:nsim, function(i) {
  UV <- simCOP(n=625, cop=GHcop, para=c(1.91, .16), graphics=FALSE)
  p1 <- mleCOP(UV, cop=GHcop, init.para=c(1,1),     parafn=GH2pfunc)$para
  p2 <- mleCOP(UV, cop=GHcop, init.para=c(1,.5,.5), parafn=GH3pfunc)$para
  vuongCOP(UV, cop1=GHcop, para1=p1, cop2=GHcop, para2=p2)$result
})
sum(Results)
The summation yields 6 of 20 for which copula 1 has the better fit, but with n=1{,}000 instead of n=625, the sum of the Results is 13 of 20 (so better than half the time). This seems to be in conflict with what the n_{fg} sample size from kullCOP should be telling. The author thinks it should be 18 to 19 of 20 (95th percentile) based on what kullCOP is reported to do (NEED TO LOOK INTO THIS).
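To reproduce the n=1{,}000 comparison just described, only the sample size inside the loop changes; a sketch follows (counts will vary by seed):

set.seed(67)
Results <- sapply(1:nsim, function(i) {
  UV <- simCOP(n=1000, cop=GHcop, para=c(1.91, .16), graphics=FALSE)
  p1 <- mleCOP(UV, cop=GHcop, init.para=c(1,1),     parafn=GH2pfunc)$para
  p2 <- mleCOP(UV, cop=GHcop, init.para=c(1,.5,.5), parafn=GH3pfunc)$para
  vuongCOP(UV, cop1=GHcop, para1=p1, cop2=GHcop, para2=p2)$result
})
sum(Results) # 13 of 20 as reported in the text above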
Author(s)
W.H. Asquith
References
Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.
See Also
densityCOP, simCOP, aicCOP, bicCOP, kullCOP, vuongCOP
Examples
# See also extended code listings and discussion in the Note section
## Not run:
# Here, we study the trajectory of the objective function in a simple
# 1-dimensional optimization. See how we must provide the interval.
set.seed(162); UV <- simCOP(n=188, cop=PLcop, para=5.6)
ML <- mleCOP(UV$U, UV$V, cop=PLcop, interval=c(0.1, 40)) # 5.225459 estimated
Thetas <- 10^(seq(log10(0.001), log10(100), by=0.005))
MLs <- sapply(Thetas, function(k)
densityCOP(UV$U, UV$V, cop=PLcop, para=k, sumlogs=TRUE))
plot(Thetas, MLs, log="x", type="l", # draw the pMLE solution process
xlab="Plackett Theta", ylab="sum of log densities")
lines(rep(ML$para, 2), c(ML$objective, par()$usr[3]), col="red")
points(ML$para, ML$objective, pch=16, col="red") #
## End(Not run)
## Not run:
# Here, we study again 1-dimensional optimization but use the
# multidimensional version with an alert issued.
set.seed(149); UV <- simCOP(1000, cop=CLcop, para=pi)
# Warning messages about using optim() for 1D solution
mleCOP(UV, cop=CLcop, init.para=2)$para # 3.082031
# No warning message, optimise() called instead.
mleCOP(UV, cop=CLcop, interval=c(0,1E2))$para # 3.081699
## End(Not run)
## Not run:
# Here, we evaluate a 2-dimensional problem using a Plackett again but with
# the addition of asymmetry towards high V outliers from the Plackett cloud.
# This example also adds the internal verbose and graphic diagnostics for
# the iterations of the optimizer. Here, we learn that we at times need to
# have some idea where the solution might lie so that we can provide a
# suitable set of initial parameters for the algorithm.
para <- list(beta=-0.1, cop=PLcop, para1=1000)
UV <- simCOP(2000, cop=breveCOP, para=para); abline(0, 1, col="red", lwd=3)
PL2pfunc <- function(p) { # see here example of parameter transform
list(beta=2*pnorm(p[1])-1, para=exp(p[2]), cop=PLcop) # [-1,+1], >=0
}
init.para <- c(0.2535, log(0.02)) # These will not find a solution with this
# guess of negative association, but the next attempt works by using an
# external estimate of the Plackett parameter, and here we test with a
# positive skewness (beta for breveCOP > 0) although we know the parent had
# negative.
init.para <- c(0.2535, log(PLACKETTpar(UV$U, UV$V, byrho=TRUE))) # beta=0.200
rt <- mleCOP(u=UV$U, v=UV$V, init.para=init.para, cop=breveCOP,
parafn=PL2pfunc, verbose=2, s=seq(0,1, by=0.005)) #
## End(Not run)