tcor {timevarcorr}R Documentation

Compute time varying correlation coefficients

Description

The function tcor() implements (together with its helper function calc_rho()) the nonparametric estimation of the time varying correlation coefficient proposed by Choi & Shin (2021). The general idea is to compute a (Pearson) correlation coefficient (r(x,y) = \frac{\hat{xy} - \hat{x}\times\hat{y}}{ \sqrt{\hat{x^2}-\hat{x}^2} \times \sqrt{\hat{y^2}-\hat{y}^2}}), but instead of using the means required for such a computation, each component (i.e., x, y, x^2, y^2, x \times y) is smoothed and the smoothed terms are considered in place the original means. The intensity of the smoothing depends on a unique parameter: the bandwidth (h). If h = Inf, the method produces the original (i.e., time-invariant) correlation value. The smaller the parameter h, the more variation in time is being captured. The parameter h can be provided by the user; otherwise it is automatically estimated by the internal helper functions select_h() and calc_RMSE() (see Details).

Usage

tcor(
  x,
  y,
  t = seq_along(x),
  h = NULL,
  cor.method = c("pearson", "spearman"),
  kernel = c("epanechnikov", "box", "normal"),
  CI = FALSE,
  CI.level = 0.95,
  param_smoother = list(),
  keep.missing = FALSE,
  verbose = FALSE
)

calc_rho(
  x,
  y,
  t = seq_along(x),
  t.for.pred = t,
  h,
  cor.method = c("pearson", "spearman"),
  kernel = c("epanechnikov", "box", "normal"),
  param_smoother = list()
)

calc_RMSE(
  h,
  x,
  y,
  t = seq_along(x),
  cor.method = c("pearson", "spearman"),
  kernel = c("epanechnikov", "box", "normal"),
  param_smoother = list(),
  verbose = FALSE
)

select_h(
  x,
  y,
  t = seq_along(x),
  cor.method = c("pearson", "spearman"),
  kernel = c("epanechnikov", "box", "normal"),
  param_smoother = list(),
  verbose = FALSE
)

Arguments

x

a numeric vector.

y

a numeric vector of to be correlated with x.

t

a (numeric or Date) vector of time points. If missing, observations are considered to correspond to sequential time steps (i.e., 1, 2 ...).

h

a scalar indicating the bandwidth used by the smoothing function.

cor.method

a character string indicating which correlation coefficient is to be computed ("pearson", the default; or "spearman").

kernel

a character string indicating which kernel to use: "epanechnikov" (the default), "box", or "normal" (abbreviations also work).

CI

a logical specifying if a confidence interval should be computed or not (default = FALSE).

CI.level

a scalar defining the level for CI (default = 0.95 for 95% CI).

param_smoother

a list of additional parameters to provide to the internal smoothing function (see Details).

keep.missing

a logical specifying if time points associated with missing information should be kept in the output (default = FALSE to facilitate plotting).

verbose

a logical specifying if information should be displayed to monitor the progress of the cross validation (default = FALSE).

t.for.pred

a (numeric or Date) vector of time points at which to evaluate the smoothed fit. If missing, t is used.

Details

Value

—Output for tcor()

A 2 x t dataframe containing:

Or, if CI = TRUE, a 5 x t dataframe containing:

Some metadata are also attached to the dataframe (as attributes):

—Output for calc_rho()

A 14 x t dataframe with:

—Output for calc_RMSE()

A scalar of class numeric corresponding to the RMSE.

—Output for select_h()

A list with the following components:

Functions

References

Choi, JE., Shin, D.W. Nonparametric estimation of time varying correlation coefficient. J. Korean Stat. Soc. 50, 333–353 (2021). doi:10.1007/s42952-020-00073-6

See Also

test_equality, kern_smooth, CI

Examples


#####################################################
## Examples for the user-level function to be used ##
#####################################################

## Effect of the bandwidth

res_h50   <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 50))
res_h100  <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 100))
res_h200  <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 200))
plot(res_h50, type = "l", ylab = "Cor", xlab = "Time", las = 1, col = "grey")
points(res_h100, type = "l", col = "blue")
points(res_h200, type = "l", col = "red")
legend("bottom", horiz = TRUE, fill = c("grey", "blue", "red"),
       legend = c("50", "100", "200"), bty = "n", title = "Bandwidth (h)")


## Effect of the correlation method

res_pearson  <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 150))
res_spearman <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 150,
                                      cor.method = "spearman"))
plot(res_pearson, type = "l", ylab = "Cor", xlab = "Time", las = 1)
points(res_spearman, type = "l", col = "blue")
legend("bottom", horiz = TRUE, fill = c("black", "blue"),
       legend = c("pearson", "spearman"), bty = "n", title = "cor.method")


## Infinite bandwidth should match fixed correlation coefficients
## nb: `h = Inf` is not supported by default kernel (`kernel = 'epanechnikov'`)

res_pearson_hInf  <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = Inf,
                                           kernel = "normal"))
res_spearman_hInf <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = Inf,
                                           kernel = "normal", cor.method = "spearman"))
r <- cor(stockprice$SP500, stockprice$FTSE100, use = "pairwise.complete.obs")
rho <- cor(stockprice$SP500, stockprice$FTSE100, method = "spearman", use = "pairwise.complete.obs")
round(unique(res_pearson_hInf$r) - r, digits = 3) ## 0 indicates near equality
round(unique(res_spearman_hInf$r) - rho, digits = 3) ## 0 indicates near equality


## Computing and plotting the confidence interval

res_withCI <- with(stockprice, tcor(x = SP500, y = FTSE100, t = DateID, h = 200, CI = TRUE))
with(res_withCI, {
     plot(r ~ t, type = "l", ylab = "Cor", xlab = "Time", las = 1, ylim = c(0, 1))
     points(lwr ~ t, type = "l", lty = 2)
     points(upr ~ t, type = "l", lty = 2)})


## Same using tidyverse packages (dplyr and ggplot2 must be installed)
## see https://github.com/courtiol/timevarcorr for more examples of this kind

if (require("dplyr", warn.conflicts = FALSE)) {

  stockprice |>
    reframe(tcor(x = SP500, y = FTSE100, t = DateID,
                 h = 200, CI = TRUE)) -> res_tidy
  res_tidy
}

if (require("ggplot2")) {

  ggplot(res_tidy) +
     aes(x = t, y = r, ymin = lwr, ymax = upr) +
     geom_ribbon(fill = "grey") +
     geom_line() +
     labs(title = "SP500 vs FTSE100", x = "Time", y = "Correlation") +
     theme_classic()

}


## Automatic selection of the bandwidth using parallel processing and comparison
## of the 3 alternative kernels on the first 500 time points of the dataset
# nb: takes a few seconds to run, so not run by default

run <- in_pkgdown() || FALSE ## change to TRUE to run the example
if (run) {

options("mc.cores" = 2L) ## CPU cores to be used for parallel processing

res_hauto_epanech <- with(stockprice[1:500, ],
         tcor(x = SP500, y = FTSE100, t = DateID, kernel = "epanechnikov")
         )

res_hauto_box <- with(stockprice[1:500, ],
          tcor(x = SP500, y = FTSE100, t = DateID, kernel = "box")
          )

res_hauto_norm <- with(stockprice[1:500, ],
          tcor(x = SP500, y = FTSE100, t = DateID, kernel = "norm")
          )

plot(res_hauto_epanech, type = "l", col = "red",
     ylab = "Cor", xlab = "Time", las = 1, ylim = c(0, 1))
points(res_hauto_box, type = "l", col = "grey")
points(res_hauto_norm, type = "l", col = "orange")
legend("top", horiz = TRUE, fill = c("red", "grey", "orange"),
       legend = c("epanechnikov", "box", "normal"), bty = "n",
       title = "Kernel")

}


## Comparison of the 3 alternative kernels under same bandwidth
## nb: it requires to have run the previous example

if (run) {

res_epanech <- with(stockprice[1:500, ],
          tcor(x = SP500, y = FTSE100, t = DateID,
          kernel = "epanechnikov", h = attr(res_hauto_epanech, "h"))
          )

res_box <- with(stockprice[1:500, ],
           tcor(x = SP500, y = FTSE100, t = DateID,
           kernel = "box", h = attr(res_hauto_epanech, "h"))
           )

res_norm <- with(stockprice[1:500, ],
          tcor(x = SP500, y = FTSE100, t = DateID,
          kernel = "norm", h = attr(res_hauto_epanech, "h"))
          )

plot(res_epanech, type = "l", col = "red", ylab = "Cor", xlab = "Time",
     las = 1, ylim = c(0, 1))
points(res_box, type = "l", col = "grey")
points(res_norm, type = "l", col = "orange")
legend("top", horiz = TRUE, fill = c("red", "grey", "orange"),
       legend = c("epanechnikov", "box", "normal"), bty = "n",
       title = "Kernel")

}

## Automatic selection of the bandwidth using parallel processing with CI
# nb: takes a few seconds to run, so not run by default

run <- in_pkgdown() || FALSE ## change to TRUE to run the example
if (run) {

res_hauto_epanechCI <- with(stockprice[1:500, ],
          tcor(x = SP500, y = FTSE100, t = DateID, CI = TRUE)
          )

plot(res_hauto_epanechCI[, c("t", "r")], type = "l", col = "red",
     ylab = "Cor", xlab = "Time", las = 1, ylim = c(0, 1))
points(res_hauto_epanechCI[, c("t", "lwr")], type = "l", col = "red", lty = 2)
points(res_hauto_epanechCI[, c("t", "upr")], type = "l", col = "red", lty = 2)

}


## Not all kernels work well in all situations
## Here the default kernell estimation leads to issues for last time points
## nb1: EuStockMarkets is a time-series object provided with R
## nb2: takes a few minutes to run, so not run by default

run <- in_pkgdown() || FALSE ## change to TRUE to run the example
if (run) {

EuStock_epanech <- tcor(EuStockMarkets[1:500, "DAX"], EuStockMarkets[1:500, "SMI"])
EuStock_norm <- tcor(EuStockMarkets[1:500, "DAX"], EuStockMarkets[1:500, "SMI"], kernel = "normal")

plot(EuStock_epanech, type = "l", col = "red", las = 1, ylim = c(-1, 1))
points(EuStock_norm, type = "l", col = "orange", lty = 2)
legend("bottom", horiz = TRUE, fill = c("red", "orange"),
       legend = c("epanechnikov", "normal"), bty = "n",
       title = "Kernel")
}




##################################################################
## Examples for the internal function computing the correlation ##
##################################################################

## Computing the correlation and its component for the first six time points

with(head(stockprice), calc_rho(x = SP500, y = FTSE100, t = DateID, h = 20))


## Predicting the correlation and its component at a specific time point

with(head(stockprice), calc_rho(x = SP500, y = FTSE100, t = DateID, h = 20,
     t.for.pred = DateID[1]))


## The function can handle non consecutive time points

set.seed(1)
calc_rho(x = rnorm(10), y = rnorm(10), t = c(1:5, 26:30), h = 3, kernel = "box")


## The function can handle non-ordered time series

with(head(stockprice)[c(1, 3, 6, 2, 4, 5), ], calc_rho(x = SP500, y = FTSE100, t = DateID, h = 20))


## Note: the function does not handle missing data (by design)

# calc_rho(x = c(NA, rnorm(9)), y = rnorm(10), t = c(1:2, 23:30), h = 2) ## should err (if ran)



###########################################################
## Examples for the internal function computing the RMSE ##
###########################################################

## Compute the RMSE on the correlation estimate
# nb: takes a few seconds to run, so not run by default

run <- in_pkgdown() || FALSE ## change to TRUE to run the example
if (run) {

small_clean_dataset <- head(na.omit(stockprice), n = 200)
with(small_clean_dataset, calc_RMSE(x = SP500, y = FTSE100, t = DateID, h = 10))

}




################################################################
## Examples for the internal function selecting the bandwidth ##
################################################################

## Automatic selection of the bandwidth using parallel processing
# nb: takes a few seconds to run, so not run by default

run <- in_pkgdown() || FALSE ## change to TRUE to run the example
if (run) {

small_clean_dataset <- head(na.omit(stockprice), n = 200)
with(small_clean_dataset, select_h(x = SP500, y = FTSE100, t = DateID))

}


[Package timevarcorr version 0.1.1 Index]