R: Fitting Copulas to Data - Copula Parameter Estimation

fitCopula {copula}

R Documentation

Fitting Copulas to Data – Copula Parameter Estimation

Description

Parameter estimation of copulas, i.e., fitting of a copula model to multivariate (possibly “pseudo”) observations.

Usage

loglikCopula(param = getTheta(copula), u, copula,
             error = c("-Inf", "warn-Inf", "let-it-be"))

loglikCopulaMany(pList, u, copula)

## Generic [and "rotCopula" method] : %- ../R/fitCopula.R
fitCopula(copula, data, ...)
## S4 method for signature 'parCopula'
fitCopula(copula, data,
          method = c("mpl", "ml", "itau", "irho", "itau.mpl"),
          posDef = is(copula, "ellipCopula"),
          start = NULL, lower = NULL, upper = NULL,
          optim.method = optimMeth(copula, method, dim = d),
          optim.control = list(maxit=1000),
          estimate.variance = NA, hideWarnings = FALSE, ...)

optimMeth(copula, method, dim)

Arguments

`param`	vector of free (see `isFree()` and `getTheta()`) parameter values.
`pList`	a `list` of free parameter vectors (as `param` above). In the 1D case, `length(param) == 1`, may also be a numeric vector.
`u`	`n\times d`-matrix of (pseudo-)observations in `[0,1]^d` for computing the copula log-likelihood, where `n` denotes the sample size and `d` the dimension. Consider applying the function `pobs()` first in order to obtain such data.
`data`	as `u`, an `n\times d`-matrix of data. For `method` being `"mpl"`, `"ml"` or `"itau.mpl"`, this has to be data in `[0,1]^d`. For `method` being `"itau"` or `"irho"`, it can either be data in `[0,1]^d` or in the whole `d`-dimensional space.
`copula`	a `"copula"` object.
`error`	(for `loglikCopula()`:) a `character` string specifying how errors in the underlying `dCopula()` calls should be handled: `"-Inf"`: the value of the log likelihood should silently be set to `-Inf`. `"warn-Inf"`: signal a `warning` about the error and set the value to `-Inf`. `"let-it-be"`: the error is signalled and hence the likelihood computation fails.
`method`	a `character` string specifying the copula parameter estimator used. This can be one of: "mpl" Maximum pseudo-likelihood estimator (based on “pseudo-observations” in `[0,1]^d`, typical obtained via `pobs()`). "ml" As `"mpl"` just with a different variance estimator. For this to be correct (thus giving the true MLE), `data` are assumed to be observations from the true underlying copula whose parameter is to be estimated. "itau" Inversion of Kendall's tau estimator. `data` can be either in `[0,1]^d` (true or pseudo-observations of the underlying copula to be estimated) or in the `d`-dimensional space. "irho" As `"itau"` just with Spearman's rho instead of Kendall's tau. "itau.mpl" This is the estimator of `t` copula parameters suggested by Mashal and Zeevi (2002) based on the idea of inverting Kendall's tau for estimating the correlation matrix as introduced in a RiskLab report in 2001 later published as Embrechts et al. (2003); see also Demarta and McNeil (2005). The given `data` has to be in `[0,1]^d` (either true or pseudo-observations of the underlying copula to be estimated). Note that this method requires `dispstr = "un"`.
`posDef`	a `logical` indicating whether a proper correlation matrix is computed.
`start`	a `vector` of starting values for the parameter optimization via `optim()`.
`lower`, `upper`	Lower or upper parameter bounds for the optimization methods `"Brent"` or `"L-BFGS-B"`.
`optim.control`	a `list` of control parameters passed to `optim(*, control=optim.control)`.
`optim.method`	a character string specify the optimization method or a `function` which when called with arguments `(copula, method, dim)` will return such a character string, see `optim()`'s `method`; only used when `method = "mpl"` or `"ml"`. The default has been changed (for copula 0.999-16, in Aug. 2016) from `"BFGS"` to the result of `optimMeth(copula, method, dim)` which is often `"L-BFGS-B"`.
`dim`	integer, the data and copula dimension, `d \ge 2`.
`estimate.variance`	a `logical` indicating whether the estimator's asymptotic variance is computed (if available for the given `copula`; the default `NA` computes it for the `method`s `"itau"` and `"irho"`, cannot (yet) compute it for `"itau.mpl"` and only computes it for `"mpl"` or `"ml"` if the optimization converged).
`hideWarnings`	a `logical`, which, if `TRUE`, suppresses warnings from the involved likelihood maximization (typically when the likelihood is evaluated at invalid parameter values).
`...`	additional arguments passed to `method` specific auxiliary functions, e.g., `traceOpt = TRUE` (or `traceOpt = 10`) for tracing `optimize` (every 10-th function evaluation) for method `"itau.mpl"`, and for “manual” tracing with method `"ml"` or `"mpl"` also showing parameter values (notably for `optim.method="Brent"`), see the extra arguments of namespace-hidden function `fitCopula.ml()`.

Details

The only difference between "mpl" and "ml" is in the variance-covariance estimate, not in the parameter (\theta) estimates.

If method "mpl" in fitCopula() is used and if start is not assigned a value, estimates obtained from method "itau" are used as initial values in the optimization. Standard errors are computed as explained in Genest, Ghoudi and Rivest (1995); see also Kojadinovic and Yan (2010, Section 3). Their estimation requires the computation of certain partial derivatives of the (log) density. These have been implemented for six copula families thus far: the Clayton, Gumbel-Hougaard, Frank, Plackett, normal and t copula families. For other families, numerical differentiation based on grad() from package numDeriv is used (and a warning message is displayed).

In the multiparameter elliptical case and when the estimation is based on Kendall's tau or Spearman's rho, the estimated correlation matrix may not always be positive-definite. In that case, nearPD(*, corr=TRUE) (from Matrix) is applied to get a proper correlation matrix.

For normal and t copulas, fitCopula(, method = "mpl") and fitCopula(, method = "ml") maximize the log-likelihood based on mvtnorm's dmvnorm() and dmvt(), respectively. The latter two functions set the respective densities to zero if the correlation matrices of the corresponding distributions are not positive definite. As such, the estimated correlation matrices will be positive definite.

If methods "itau" or "irho" are used in fitCopula(), an estimate of the asymptotic variance (if available for the copula under consideration) will be correctly computed only if the argument data consists of pseudo-observations (see pobs()).

Consider the t copula with df.fixed=FALSE (see ellipCopula()). In this case, the methods "itau" and "irho" cannot be used in fitCopula() as they cannot estimate the degrees of freedom parameter df. For the methods "mpl" and "itau.mpl" the asymptotic variance cannot be (fully) estimated (yet). For the methods "ml" and "mpl", when start is not specified, the starting value for df is set to copula@df, typically 4.

To implement the Inference Functions for Margins (IFM) method (see, e.g., Joe 2005), set method="ml" and note that data need to be parametric pseudo-observations obtained from fitted parametric marginal distribution functions. The returned large-sample variance will then underestimate the true variance (as the procedure cannot take into account the (unknown) estimation error for the margins).

The fitting procedures based on optim() generate warnings because invalid parameter values are tried during the optimization process. When the number of parameters is one and the parameter space is bounded, using optim.method="Brent" is likely to give less warnings. Furthermore, from experience, optim.method="Nelder-Mead" is sometimes a more robust alternative to optim.method="BFGS" or "L-BFGS-B".

There are methods for vcov(), coef(), logLik(), and nobs().

Value

loglikCopula() returns the copula log-likelihood evaluated at the parameter (vector) param given the data u.

loglikCopulaMany() returns a numeric vector of such log-likelihoods; it assumes consistent parameter values, corresponding to loglikCopula()'s error = "let-it-be", for speed.

The return value of fitCopula() is an object of class "fitCopula" (inheriting from hidden class "fittedMV"), containing (among others!) the slots

estimate: The parameter estimates.
var.est: The large-sample (i.e., asymptotic) variance estimate of the parameter estimator unless estimate.variance=FALSE where it is matrix(numeric(), 0,0) (to be distinguishable from cases when the covariance estimates failed partially).
copula: The fitted copula object.

The summary() method for "fitCopula" objects returns an S3 “class” "summary.fitCopula", which is simply a list with components method, loglik and convergence, all three from the corresponding slots of the "fitCopula" objects, and coefficients (a matrix of estimated coefficients, standard errors, t values and p-values).

References

Genest, C. (1987). Frank's family of bivariate distributions. Biometrika 74, 549–555.

Genest, C. and Rivest, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034–1043.

Rousseeuw, P. and Molenberghs, G. (1993). Transformation of nonpositive semidefinite correlation matrices. Communications in Statistics: Theory and Methods 22, 965–984.

Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82, 543–552.

Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis 94, 401–419.

Mashal, R. and Zeevi, A. (2002). Beyond Correlation: Extreme Co-movements Between Financial Assets. https://www0.gsb.columbia.edu/faculty/azeevi/PAPERS/BeyondCorrelation.pdf (2016-04-05)

Demarta, S. and McNeil, A. J. (2005). The t copula and related copulas. International Statistical Review 73, 111–129.

Genest, C. and Favre, A.-C. (2007). Everything you always wanted to know about copula modeling but were afraid to ask. Journal of Hydrologic Engineering 12, 347–368.

Kojadinovic, I. and Yan, J. (2010). Comparison of three semiparametric methods for estimating dependence parameters in copula models. Insurance: Mathematics and Economics 47, 52–63.

Examples

(Xtras <- copula:::doExtras()) # determine whether examples will be extra (long)
n <- if(Xtras) 200 else 64 # sample size

## A Gumbel copula
set.seed(7) # for reproducibility
gumbel.cop <- gumbelCopula(3, dim=2)
x <- rCopula(n, gumbel.cop) # "true" observations (simulated)
u <- pobs(x)                # pseudo-observations
## Inverting Kendall's tau
fit.tau <- fitCopula(gumbelCopula(), u, method="itau")
fit.tau
confint(fit.tau) # work fine !
confint(fit.tau, level = 0.98)
summary(fit.tau) # a bit more, notably "Std. Error"s
coef(fit.tau)# named vector
coef(fit.tau, SE = TRUE)# matrix

## Inverting Spearman's rho
fit.rho <- fitCopula(gumbelCopula(), u, method="irho")
summary(fit.rho)
## Maximum pseudo-likelihood
fit.mpl <- fitCopula(gumbelCopula(), u, method="mpl")
fit.mpl
## Maximum likelihood -- use 'x', not 'u' ! --
fit.ml <- fitCopula(gumbelCopula(), x, method="ml")
summary(fit.ml) # now prints a bit more than simple 'fit.ml'
## ... and what's the log likelihood (in two different ways):
(ll. <- logLik(fit.ml))
stopifnot(all.equal(as.numeric(ll.),
            loglikCopula(coef(fit.ml), u=x, copula=gumbel.cop)))

## A Gauss/normal copula

## With multiple/*un*constrained parameters
set.seed(6) # for reproducibility
normal.cop <- normalCopula(c(0.6, 0.36, 0.6), dim=3, dispstr="un")
x <- rCopula(n, normal.cop) # "true" observations (simulated)
u <- pobs(x)                # pseudo-observations
## Inverting Kendall's tau
fit.tau <- fitCopula(normalCopula(dim=3, dispstr="un"), u, method="itau")
fit.tau
## Inverting Spearman's rho
fit.rho <- fitCopula(normalCopula(dim=3, dispstr="un"), u, method="irho")
fit.rho
## Maximum pseudo-likelihood
fit.mpl <- fitCopula(normalCopula(dim=3, dispstr="un"), u, method="mpl")
summary(fit.mpl)
coef(fit.mpl) # named vector
coef(fit.mpl, SE = TRUE) # the matrix, with SE
## Maximum likelihood (use 'x', not 'u' !)
fit.ml <- fitCopula(normalCopula(dim=3, dispstr="un"), x, method="ml", traceOpt=TRUE)
summary(fit.ml)
confint(fit.ml)
confint(fit.ml, level = 0.999) # clearly non-0

## Fix some of the parameters
param <- c(.6, .3, NA_real_)
attr(param, "fixed") <- c(TRUE, FALSE, FALSE)
ncp <- normalCopula(param = param, dim = 3, dispstr = "un")
fixedParam(ncp) <- c(TRUE, TRUE, FALSE)
## 'traceOpt = 5': showing every 5-th log likelihood evaluation:
summary(Fxf.mpl <- fitCopula(ncp, u, method = "mpl", traceOpt = 5))
Fxf.mpl@copula # reminding of the fixed param. values

## With dispstr = "toep" :
normal.cop.toep <- normalCopula(c(0, 0), dim=3, dispstr="toep")
## Inverting Kendall's tau
fit.tau <- fitCopula(normalCopula(dim=3, dispstr="toep"), u, method="itau")
fit.tau
## Inverting Spearman's rho
fit.rho <- fitCopula(normalCopula(dim=3, dispstr="toep"), u, method="irho")
summary(fit.rho)
## Maximum pseudo-likelihood
fit.mpl <- fitCopula(normalCopula(dim=3, dispstr="toep"), u, method="mpl")
fit.mpl
## Maximum likelihood (use 'x', not 'u' !)
fit.ml <- fitCopula(normalCopula(dim=3, dispstr="toep"), x, method="ml")
summary(fit.ml)

## With dispstr = "ar1"
normal.cop.ar1 <- normalCopula(c(0), dim=3, dispstr="ar1")
## Inverting Kendall's tau
summary(fit.tau <- fitCopula(normalCopula(dim=3, dispstr="ar1"), u, method="itau"))
## Inverting Spearman's rho
summary(fit.rho <- fitCopula(normalCopula(dim=3, dispstr="ar1"), u, method="irho"))
## Maximum pseudo-likelihood
summary(fit.mpl <- fitCopula(normalCopula(dim=3, dispstr="ar1"), u, method="mpl"))
## Maximum likelihood (use 'x', not 'u' !)
fit.ml <- fitCopula(normalCopula(dim=3, dispstr="ar1"), x, method="ml")
summary(fit.ml)

## A t copula with variable df (df.fixed=FALSE)
(tCop <- tCopula(c(0.2,0.4,0.6), dim=3, dispstr="un", df=5))
set.seed(101)
x <- rCopula(n, tCop) # "true" observations (simulated)
## Maximum likelihood (start = (rho[1:3], df))
summary(tc.ml <- fitCopula(tCopula(dim=3, dispstr="un"), x, method="ml",
                           start = c(0,0,0, 10)))
## Maximum pseudo-likelihood (the asymptotic variance cannot be estimated)
u <- pobs(x)          # pseudo-observations
tc.mpl <- fitCopula(tCopula(dim=3, dispstr="un"),
                     u, method="mpl", estimate.variance=FALSE,
                     start= c(0,0,0, 10))
summary(tc.mpl)

[Package copula version 1.1-3 Index]