PCA_TS {HDTSA}    R Documentation

Principal component analysis for time series

Description

PCA_TS() seeks a contemporaneous linear transformation of a multivariate time series such that the transformed series is segmented into several lower-dimensional subseries:

{\bf y}_t={\bf Ax}_t,

where {\bf x}_t is an unobservable p \times 1 weakly stationary time series consisting of q\ (\geq 1) subseries that are uncorrelated with each other both contemporaneously and serially. See Chang, Guo and Yao (2018).

Usage

PCA_TS(
  Y,
  lag.k = 5,
  thresh = FALSE,
  tuning.vec = NULL,
  K = 5,
  prewhiten = TRUE,
  permutation = c("max", "fdr"),
  m = NULL,
  beta,
  just4pre = FALSE,
  verbose = FALSE
)

Arguments

Y

{\bf Y} = \{{\bf y}_1, \dots , {\bf y}_n \}', a data matrix with n rows and p columns, where n is the sample size and p is the dimension of {\bf y}_t. The procedure first normalizes {\bf y}_t as \widehat{{\bf V}}^{-1/2}{\bf y}_t, where \widehat{{\bf V}} is an estimator of the covariance of {\bf y}_t. See Details below for the selection of \widehat{{\bf V}}^{-1}.
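
The normalization step can be illustrated with a minimal base-R sketch. It assumes the low-dimensional case in which \widehat{{\bf V}} is the sample covariance from cov(), and it uses a symmetric inverse square root obtained from an eigendecomposition; the toy data and all variable names are illustrative and are not taken from the package source.

```r
# Sketch (assumption): normalize y_t by the inverse square root of cov(Y).
set.seed(1)
n <- 500; p <- 4
# Toy data with correlated columns; rows are time points y_t'
Y <- matrix(rnorm(n * p), n, p) %*% (diag(p) + 0.5)

V_hat <- cov(Y)                              # sample covariance of y_t
eig   <- eigen(V_hat, symmetric = TRUE)
# Symmetric inverse square root: V^{-1/2} = U diag(d^{-1/2}) U'
V_inv_sqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

# Rows of Y are y_t', so normalizing every y_t is a right-multiplication
Y_norm <- Y %*% V_inv_sqrt

# Largest deviation of the normalized sample covariance from the identity
max(abs(cov(Y_norm) - diag(p)))
```

After this step the sample covariance of the normalized series is the identity up to rounding error, which is why the lag-0 term in \widehat{{\bf W}}_y reduces to {\bf I}_p.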

lag.k

Time lag k_0 used to calculate the nonnegative definite matrix \widehat{{\bf W}}_y:

\widehat{\mathbf{W}}_y\ =\ \sum_{k=0}^{k_0}\widehat{\mathbf{\Sigma}}_y(k)\widehat{\mathbf{\Sigma}}_y(k)'=\mathbf{I}_p+\sum_{k=1}^{k_0}\widehat{\mathbf{\Sigma}}_y(k)\widehat{\mathbf{\Sigma}}_y(k)',

where \widehat{\bf \Sigma}_y(k) is the sample autocovariance of \widehat{{\bf V}}^{-1/2}{\bf y}_t at lag k. See (2.5) in Chang, Guo and Yao (2018).
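
As a rough illustration of the formula above, the following base-R sketch accumulates \widehat{{\bf W}}_y from sample autocovariances at lags 1 to k_0, using the second form of the display (identity plus the lagged terms). The helper sample_autocov() and the 1/n divisor are assumptions made for this sketch, and the toy matrix merely stands in for the normalized series.

```r
# Sketch (assumption): W_hat = I_p + sum_{k=1}^{k0} Sigma(k) Sigma(k)'
set.seed(2)
n <- 400; p <- 3; k0 <- 5
# Toy data standing in for the normalized series (rows are time points)
Y_norm <- matrix(rnorm(n * p), n, p)

# Hypothetical helper: (1/n) * sum_t (y_{t+k} - ybar)(y_t - ybar)'
sample_autocov <- function(Y, k) {
  n  <- nrow(Y)
  Yc <- sweep(Y, 2, colMeans(Y))             # center each column
  crossprod(Yc[(1 + k):n, , drop = FALSE],
            Yc[1:(n - k), , drop = FALSE]) / n
}

W_hat <- diag(p)                             # lag-0 term for normalized data
for (k in seq_len(k0)) {
  S_k   <- sample_autocov(Y_norm, k)
  W_hat <- W_hat + tcrossprod(S_k)           # S_k %*% t(S_k)
}

# Every term is nonnegative definite, so all eigenvalues are at least 1
eigen(W_hat, symmetric = TRUE)$values
```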

thresh

Logical. If FALSE (the default), no thresholding is applied when estimating \widehat{{\bf W}}_y. If TRUE, a thresholding method is applied first to estimate \widehat{{\bf W}}_y; see (3.5) in Chang, Guo and Yao (2018).

tuning.vec

The value of the tuning parameter \lambda in the thresholding level u = \lambda \sqrt{n^{-1}\log p}. The default value of \lambda is 2. If tuning.vec is a vector, the cross validation method proposed in Cai and Liu (2011) is used to choose the best tuning parameter \lambda.
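
To make the thresholding level concrete, the sketch below evaluates u for n = 1500, p = 6 and \lambda = 2, then applies it to a small made-up matrix. The entrywise hard-thresholding rule and the helper hard_threshold() are assumptions for illustration; the package's internal implementation may differ.

```r
# Sketch (assumption): thresholding level and entrywise hard thresholding.
n <- 1500; p <- 6
lambda <- 2                                  # default tuning parameter
u <- lambda * sqrt(log(p) / n)               # u = lambda * sqrt(n^{-1} log p)

# Hypothetical helper: keep entries with |s_ij| >= u, set the rest to zero
hard_threshold <- function(S, u) S * (abs(S) >= u)

# Made-up matrix standing in for a sample autocovariance
S <- matrix(c( 0.50, 0.01, -0.20,
               0.03, 0.40,  0.02,
              -0.05, 0.00,  0.30), nrow = 3, byrow = TRUE)
S_thr <- hard_threshold(S, u)
S_thr
```

Larger values of \lambda raise u and therefore zero out more entries, which is the trade-off the cross validation over tuning.vec is meant to balance.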

K

The number of folds used in the cross validation for the selection of \lambda. The default is 5. It is required when thresh = TRUE.

prewhiten

Logical. If TRUE (the default), each transformed component series of \hat{\bf z}_t is prewhitened [see Section 2.2.1 in Chang, Guo and Yao (2018)] by fitting a univariate AR model with order between 0 and 5 determined by AIC. If FALSE, no prewhitening is applied to \hat{\bf z}_t.
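
The prewhitening idea can be sketched for a single component series with stats::ar(), which selects the AR order by AIC up to order.max. Whether PCA_TS() calls ar() internally is an assumption; the simulated series z below is illustrative only.

```r
# Sketch (assumption): prewhiten one series with an AIC-selected AR(0..5) fit.
set.seed(3)
z <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 1000)

# Yule-Walker AR fit with order chosen by AIC among 0, 1, ..., 5
fit <- ar(z, aic = TRUE, order.max = 5)
fit$order                                    # selected order

# Residuals act as the prewhitened series; the first fit$order entries are NA
z_white <- na.omit(as.numeric(fit$resid))
length(z_white)
```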

permutation

The permutation method used to assign the components of \hat{\bf z}_t to different groups [see Section 2.2.1 in Chang, Guo and Yao (2018)]. The options are 'max' (maximum cross correlation method) and 'fdr' (false discovery rate procedure based on multiple tests); the default is permutation = 'max'. See Sections 2.2.2 and 2.2.3 in Chang, Guo and Yao (2018) for more information.

m

A positive constant used in the permutation procedure [see (2.10) in Chang, Guo and Yao (2018)]. If m is not specified, the default is m = 10.

beta

The error rate used in the permutation procedure when permutation = 'fdr'.

just4pre

Logical. If TRUE, the procedure outputs \hat{\bf z}_t; otherwise it outputs \hat{\bf x}_t (the permuted version of \hat{\bf z}_t).

verbose

Logical. If TRUE, the main results of the permutation procedure are printed to the console; otherwise nothing is printed.

Details

When p>n^{1/2}, the procedure uses the package clime to estimate the precision matrix \widehat{{\bf V}}^{-1}; otherwise it uses the function cov() to estimate \widehat{{\bf V}} and computes its inverse. When p>n^{1/2}, we also recommend using the thresholding method to calculate \widehat{{\bf W}}_y; see Chang, Guo and Yao (2018) for more information.

Value

The output of the segmentation procedure is a list containing the following components:

B

The p\times p transformation matrix such that \hat{\bf z}_t = \widehat{\bf B}{\bf y}_t, where \widehat{\bf B}=\widehat{\bf \Gamma}_y\widehat{{\bf V}}^{-1/2}.

Z

\hat{\bf Z}=\{\hat{\bf z}_1,\dots,\hat{\bf z}_n\}', the transformed series with n rows and p columns.

The output of the permutation procedure is a list containing the following components:

NoGroups

The number of groups with at least two component series.

No_of_Members

The cardinalities of different groups.

Groups

The indices of the components of \hat{\bf z}_t that belong to each group.

method

A character string indicating which method was used.

References

Chang, J., Guo, B. & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series, The Annals of Statistics, Vol. 46, pp. 2094–2124.

Cai, T. & Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation, Journal of the American Statistical Association, Vol. 106, pp. 672–684.

Cai, T., Liu, W., & Luo, X. (2011). A constrained l1 minimization approach for sparse precision matrix estimation, Journal of the American Statistical Association, Vol. 106, pp. 594–607.

Examples

## Example 1 (Example 5 of Chang, Guo and Yao (2018)).
## p=6, x_t consists of 3 independent subseries with 3, 2 and 1 components.

p <- 6;n <- 1500
# Generate x_t
X <- mat.or.vec(p,n)
x <- arima.sim(model=list(ar=c(0.5, 0.3), ma=c(-0.9, 0.3, 1.2,1.3)),
n=n+2,sd=1)
for(i in 1:3) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model=list(ar=c(0.8,-0.5),ma=c(1,0.8,1.8) ),n=n+1,sd=1)
for(i in 4:5) X[i,] <- x[(i-3):(n+i-4)]
x <- arima.sim(model=list(ar=c(-0.7, -0.5), ma=c(-1, -0.8)),n=n,sd=1)
X[6,] <- x
# Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol=p)
Y <- A%*%X
Y <- t(Y)
res <- PCA_TS(Y, lag.k=5,permutation = "max")
res1 <- PCA_TS(Y, lag.k=5,permutation = "fdr", beta=10^(-10))
# The transformed series z_t
Z <- res$Z
# Plot the cross correlogram of z_t and y_t
Y <- data.frame(Y);Z <- data.frame(Z)
names(Y) <- c("Y1","Y2","Y3","Y4","Y5","Y6")
names(Z) <- c("Z1","Z2","Z3","Z4","Z5","Z6")
# The cross correlogram of y_t shows no block pattern
acfY <- acf(Y)
# The cross correlogram of z_t shows 3-2-1 block pattern
acfZ <- acf(Z)

## Example 2 (Example 6 of Chang, Guo and Yao (2018)).
## p=20, x_t consists of 5 independent subseries with 6, 5, 4, 3 and 2 components.
p <- 20;n <- 3000
# Generate x_t
X <- mat.or.vec(p,n)
x <- arima.sim(model=list(ar=c(0.5, 0.3), ma=c(-0.9, 0.3, 1.2,1.3)),n.start=500,
n=n+5,sd=1)
for(i in 1:6) X[i,] <- x[i:(n+i-1)]
x <- arima.sim(model=list(ar=c(-0.4,0.5),ma=c(1,0.8,1.5,1.8)),n.start=500,n=n+4,sd=1)
for(i in 7:11) X[i,] <- x[(i-6):(n+i-7)]
x <- arima.sim(model=list(ar=c(0.85,-0.3),ma=c(1,0.5,1.2)), n.start=500,n=n+3,sd=1)
for(i in 12:15) X[i,] <- x[(i-11):(n+i-12)]
x <- arima.sim(model=list(ar=c(0.8,-0.5),ma=c(1,0.8,1.8)),n.start=500,n=n+2,sd=1)
for(i in 16:18) X[i,] <- x[(i-15):(n+i-16)]
x <- arima.sim(model=list(ar=c(-0.7, -0.5), ma=c(-1, -0.8)),n.start=500,n=n+1,sd=1)
for(i in 19:20) X[i,] <- x[(i-18):(n+i-19)]
# Generate y_t
A <- matrix(runif(p*p, -3, 3), ncol=p)
Y <- A%*%X
Y <- t(Y)
res <- PCA_TS(Y, lag.k=5,permutation = "max")
res1 <- PCA_TS(Y, lag.k=5,permutation = "fdr",beta=10^(-200))
# The transformed series z_t
Z <- res$Z
# Plot the cross correlogram of z_t and y_t
Y <- data.frame(Y);Z <- data.frame(Z)
# Column names Y1, ..., Yp and Z1, ..., Zp
names(Y) <- paste0("Y", 1:p)
names(Z) <- paste0("Z", 1:p)
# The cross correlogram of y_t shows no block pattern
acfY <- acf(Y, plot=FALSE)
plot(acfY, max.mfrow=6, xlab='', ylab='',  mar=c(1.8,1.3,1.6,0.5),
     oma=c(1,1.2,1.2,1), mgp=c(0.8,0.4,0),cex.main=1)
# The cross correlogram of z_t shows 6-5-4-3-2 block pattern
acfZ <- acf(Z, plot=FALSE)
plot(acfZ, max.mfrow=6, xlab='', ylab='',  mar=c(1.8,1.3,1.6,0.5),
     oma=c(1,1.2,1.2,1), mgp=c(0.8,0.4,0),cex.main=1)
# Inspect the groups identified by the permutation procedure
res$Groups

[Package HDTSA version 1.0.3 Index]