R: Principal component analysis for time serise

PCA_TS {HDTSA}

R Documentation

Principal component analysis for time serise

Description

PCA_TS() seeks for a contemporaneous linear transformation for a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries:

{\bf y}_t={\bf Ax}_t,

where {\bf x}_t is an unobservable p \times 1 weakly stationary time series consisting of q\ (\geq 1) both contemporaneously and serially uncorrelated subseries. See Chang, Guo and Yao (2018).

Usage

PCA_TS(
  Y,
  lag.k = 5,
  thresh = FALSE,
  tuning.vec = NULL,
  K = 5,
  prewhiten = TRUE,
  permutation = c("max", "fdr"),
  m = NULL,
  beta,
  just4pre = FALSE,
  verbose = FALSE
)

Arguments

`Y`	`{\bf Y} = \{{\bf y}_1, \dots , {\bf y}_n \}'`, a data matrix with `n` rows and `p` columns, where `n` is the sample size and `p` is the dimension of `{\bf y}_t`. The procedure will first normalize `{\bf y}_t` as `\widehat{{\bf V}}^{-1/2}{\bf y}_t`, where `\widehat{{\bf V}}` is an estimator for covariance of `{\bf y}_t`. See details below for the selection of `\widehat{{\bf V}}^{-1}`.
`lag.k`	Time lag `k_0` used to calculate the nonnegative definte matrix `\widehat{{\bf W}}_y`: `\widehat{\mathbf{W}}_y\ =\ \sum_{k=0}^{k_0}\widehat{\mathbf{\Sigma}}_y(k)\widehat{\mathbf{\Sigma}}_y(k)'=\mathbf{I}_p+\sum_{k=1}^{k_0}\widehat{\mathbf{\Sigma}}_y(k)\widehat{\mathbf{\Sigma}}_y(k)',` where `\widehat{\bf \Sigma}_y(k)` is the sample autocovariance of `\widehat{{\bf V}}^{-1/2}{\bf y}_t` at lag `k`. See (2.5) in Chang, Guo and Yao (2018).
`thresh`	Logical. If `FALSE` (the default), no thresholding will be applied to estimate `\widehat{{\bf W}}_y`. If `TRUE`, a thresholding method will be applied first to estimate `\widehat{{\bf W}}_y`, see (3.5) in Chang, Guo and Yao (2018).
`tuning.vec`	The value of the tuning parameter `\lambda` in the thresholding level `u = \lambda \sqrt{n^{-1}\log p}`, where default value is 2. If `tuning.vec` is a vector, then a cross validation method proposed in Cai and Liu (2011) will be used to choose the best tuning parameter `\lambda`.
`K`	The number of folders used in the cross validation for the selection of `\lambda`, the default is 5. It is required when `thresh = TRUE`.
`prewhiten`	Logical. If `TRUE` (the default), we prewhiten each transformed component series of `\hat{\bf z}_t` [See Section 2.2.1 in Chang, Guo and Yao (2018)] by fitting a univariate AR model with the order between 0 and 5 determined by AIC. If `FALSE`, then prewhiten procedure will not be performed to `\hat{\bf z}_t`.
`permutation`	The method of permutation procedure to assign the components of `\hat{\bf z}_t` to different groups [See Section 2.2.1 in Chang, Guo and Yao (2018)]. Option is `'max'` (Maximum cross correlation method) or `'fdr'` (False discovery rate procedure based on multiple tests), default is `permutation = 'max'`. See Sections 2.2.2 and 2.2.3 in Chang, Guo and Yao (2018) for more information.
`m`	A positive constant used in the permutation procedure [See (2.10) in Chang, Guo and Yao (2018)]. If `m` is not specified, then default option is `m =` 10.
`beta`	The error rate used in the permutation procedure when `permutation = 'fdr'`.
`just4pre`	Logical. If `TRUE`, the procedure outputs `\hat{\bf z}_t`, otherwise outputs `\hat{\bf x}_t` (the permutated version of `\hat{\bf z}_t`).
`verbose`	Logical. If `TRUE`, the main results of the permutation procedure will be output on the console. Otherwise, the result will not be output.

Details

When p>n^{1/2}, the procedure use package clime to estimate the precision matrix \widehat{{\bf V}}^{-1}, otherwise uses function cov() to estimate \widehat{{\bf V}} and calculate its inverse. When p>n^{1/2}, we recommend to use the thresholding method to calculate \widehat{{\bf W}}_y, see more information in Chang, Guo and Yao (2018).

Value

The output of the segment procedure is a list containing the following components:

`B`	The `p\times p` transformation matrix such that `\hat{\bf z}_t = \widehat{\bf B}{\bf y}_t`, where `\widehat{\bf B}=\widehat{\bf \Gamma}_y\widehat{{\bf V}}^{-1/2}`.
`Z`	`\hat{\bf Z}=\{\hat{\bf z}_1,\dots,\hat{\bf z}_n\}'`, the transformed series with `n` rows and `p` columns.

The output of the permutation procedure is a list containing the following components:

`NoGroups`	number of groups with at least two components series.
`No_of_Members`	The cardinalities of different groups.
`Groups`	The indices of the components in `\hat{\bf z}_t` that belongs to a group.
`method`	a character string indicating what method was performed.

References

Chang, J., Guo, B. & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series, The Annals of Statistics, Vol. 46, pp. 2094–2124.

Cai, T. & Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation, Journal of the American Statistical Association, Vol. 106, pp. 672–684.

Cai, T., Liu, W., & Luo, X. (2011). A constrained l1 minimization approach for sparse precision matrix estimation, Journal of the American Statistical Association, Vol. 106, pp. 594–607.

Examples

## Example 1 (Example 5 of Chang Guo and Yao (2018)).
## p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components.

p <- 6;n <- 1500
# Generate x_t
X <- mat.or.vec(p,n)
x <- arima.sim(model=list(ar=c(0.5, 0.3), ma=c(-0.9, 0.3, 1.2,1.3)),
n=n+2,sd=1)
for(i in 1:3) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model=list(ar=c(0.8,-0.5),ma=c(1,0.8,1.8) ),n=n+1,sd=1)
for(i in 4:5) X[i,] <- x[(i-3):(n+i-4)]
x <- arima.sim(model=list(ar=c(-0.7, -0.5), ma=c(-1, -0.8)),n=n,sd=1)
X[6,] <- x
# Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol=p)
Y <- A%*%X
Y <- t(Y)
res <- PCA_TS(Y, lag.k=5,permutation = "max")
res1=PCA_TS(Y, lag.k=5,permutation = "fdr", beta=10^(-10))
# The transformed series z_t
Z <- res$Z
# Plot the cross correlogram of z_t and y_t
Y <- data.frame(Y);Z=data.frame(Z)
names(Y) <- c("Y1","Y2","Y3","Y4","Y5","Y6")
names(Z) <- c("Z1","Z2","Z3","Z4","Z5","Z6")
# The cross correlogram of y_t shows no block pattern
acfY <- acf(Y)
# The cross correlogram of z_t shows 3-2-1 block pattern
acfZ <- acf(Z)

## Example 2 (Example 6 of Chang Guo and Yao (2018)).
## p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components.
p <- 20;n <- 3000
# Generate x_t
X <- mat.or.vec(p,n)
x <- arima.sim(model=list(ar=c(0.5, 0.3), ma=c(-0.9, 0.3, 1.2,1.3)),n.start=500,
n=n+5,sd=1)
for(i in 1:6) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model=list(ar=c(-0.4,0.5),ma=c(1,0.8,1.5,1.8)),n.start=500,n=n+4,sd=1)
for(i in 7:11) X[i,] <- x[(i-6):(n+i-7)]
x <- arima.sim(model=list(ar=c(0.85,-0.3),ma=c(1,0.5,1.2)), n.start=500,n=n+3,sd=1)
for(i in 12:15) X[i,] <- x[(i-11):(n+i-12)]
x <- arima.sim(model=list(ar=c(0.8,-0.5),ma=c(1,0.8,1.8)),n.start=500,n=n+2,sd=1)
for(i in 16:18) X[i,] <- x[(i-15):(n+i-16)]
x <- arima.sim(model=list(ar=c(-0.7, -0.5), ma=c(-1, -0.8)),n.start=500,n=n+1,sd=1)
for(i in 19:20) X[i,] <- x[(i-18):(n+i-19)]
# Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol=p)
Y <- A%*%X
Y <- t(Y)
res <- PCA_TS(Y, lag.k=5,permutation = "max")
res1 <- PCA_TS(Y, lag.k=5,permutation = "fdr",beta=10^(-200))
# The transformed series z_t
Z <- res$Z
# Plot the cross correlogram of x_t and y_t
Y <- data.frame(Y);Z <- data.frame(Z)
namesY=NULL;namesZ=NULL
for(i in 1:p)
{
   namesY <- c(namesY,paste0("Y",i))
   namesZ <- c(namesZ,paste0("Z",i))
}
names(Y) <- namesY;names(Z) <- namesZ
# The cross correlogram of y_t shows no block pattern
acfY <- acf(Y, plot=FALSE)
plot(acfY, max.mfrow=6, xlab='', ylab='',  mar=c(1.8,1.3,1.6,0.5),
     oma=c(1,1.2,1.2,1), mgp=c(0.8,0.4,0),cex.main=1)
# The cross correlogram of z_t shows 6-5-4-3-2 block pattern
acfZ <- acf(Z, plot=FALSE)
plot(acfZ, max.mfrow=6, xlab='', ylab='',  mar=c(1.8,1.3,1.6,0.5),
     oma=c(1,1.2,1.2,1), mgp=c(0.8,0.4,0),cex.main=1)
# Identify the permutation mechanism
permutation <- res
permutation$Groups

[Package HDTSA version 1.0.3 Index]