MARSSresiduals.tT {MARSS} | R Documentation |
MARSS Smoothed Residuals
Description
Calculates the standardized (or auxiliary) smoothed residuals sensu Harvey, Koopman and Penzer (1998). The expected values and variance for missing (or left-out) data are also returned (Holmes 2014). Not exported. Access this function with MARSSresiduals(object, type="tT")
. At time (in the returned matrices), the model residuals are for time
, while the state residuals are for the transition from
to
following the convention in Harvey, Koopman and Penzer (1998).
Usage
MARSSresiduals.tT(object, Harvey = FALSE, normalize = FALSE,
silent = FALSE, fun.kf = c("MARSSkfas", "MARSSkfss"))
Arguments
object |
An object of class |
Harvey |
TRUE/FALSE. Use the Harvey et al. (1998) algorithm or use the Holmes (2014) algorithm. The values are the same except for missing values. |
normalize |
TRUE/FALSE See details. |
silent |
If TRUE, don't print inversion warnings. |
fun.kf |
Kalman filter function to use. Can be ignored. |
Details
This function returns the raw, the Cholesky standardized and the marginal standardized smoothed model and state residuals. 'smoothed' means conditioned on all the observed data and a set of parameters. These are the residuals presented in Harvey, Koopman and Penzer (1998) pages 112-113, with the addition of the values for unobserved data (Holmes 2014). If Harvey=TRUE, the function uses the algorithm on page 112 of Harvey, Koopman and Penzer (1998) to compute the conditional residuals and variance of the residuals. If Harvey=FALSE, the function uses the equations in the technical report (Holmes 2014). Unlike the innovations residuals, the smoothed residuals are autocorrelated (section 4.1 in Harvey and Koopman 1992) and thus an ACF test on these residuals would not reveal model inadequacy.
The residuals matrix has a value for each time step. The residuals in column rows 1 to n are the model residuals associated with the data at time
. The residuals in rows n+1 to n+m are the state residuals associated with the transition from
to
, not the transition from
to
. Because
does not exist at time
, the state residuals and associated variances at time
are NA.
Below the conditional residuals and their variance are discussed. The random variables are capitalized and the realizations from the random variables are lower case. The random variables are ,
,
and
. There are two types of
. The observed
that are used to estimate the states
. These are termed
. The unobserved
are termed
. These are not used to estimate the states
and we may or may not know the values of
. Typically we treat
as unknown but it may be known but we did not include it in our model fitting. Note that the model parameters
are treated as fixed or known. The 'fitting' does not involve estimating
; it involves estimating
. All MARSS parameters can be time varying but the
subscripts are left off parameters to reduce clutter.
Model residuals
is the difference between the data and the predicted data at time
given
:
is unknown (hidden) and our data are one realization of
. The observed model residuals
are the difference between the observed data and the predicted data at time
using the fitted model.
MARSSresiduals.tT
fits the model using all the data, thus
where is the expected value of
conditioned on the data from 1 to
(all the data), i.e. the Kalman smoother estimate of the states at time
.
are your data and missing values will appear as NA in the observed model residuals. These are returned as
model.residuals
and rows 1 to of
residuals
.
res1
and res2
in the code below will be the same.
dat = t(harborSeal)[2:3,] fit = MARSS(dat) Z = coef(fit, type="matrix")$Z A = coef(fit, type="matrix")$A res1 = dat - Z %*% fit$states - A %*% matrix(1,1,ncol(dat)) res2 = MARSSresiduals(fit, type="tT")$model.residuals
State residuals
are the difference between the state at time
and the expected value of the state at time
given the state at time
:
The estimated state residuals are the difference between estimate of
minus the estimate using
.
where is the Kalman smoother estimate of the states at time
and
is the Kalman smoother estimate of the states at time
.
The estimated state residuals
are returned in
state.residuals
and rows to
of
residuals
. state.residuals[,t]
is (notice time subscript difference). There are no NAs in the estimated state residuals as an estimate of the state exists whether or not there are associated data.
res1
and res2
in the code below will be the same.
dat <- t(harborSeal)[2:3,] TT <- ncol(dat) fit <- MARSS(dat) B <- coef(fit, type="matrix")$B U <- coef(fit, type="matrix")$U statestp1 <- MARSSkf(fit)$xtT[,2:TT] statest <- MARSSkf(fit)$xtT[,1:(TT-1)] res1 <- statestp1 - B %*% statest - U %*% matrix(1,1,TT-1) res2 <- MARSSresiduals(fit, type="tT")$state.residuals[,1:(TT-1)]
Note that the state residual at the last time step (not shown) will be NA because it is the residual associated with to
and
is beyond the data. Similarly, the variance matrix at the last time step will have NAs for the same reason.
Variance of the residuals
In a state-space model, and
are stochastic, and the model and state residuals are random variables
and
. To evaluate the residuals we observed (with
), we use the joint distribution of
across all the different possible data sets that our MARSS equations with parameters
might generate. Denote the matrix of
, as
. That distribution has an expected value (mean) and variance:
Our observed residuals (returned in residuals
) are one sample from this distribution.
To standardize the observed residuals, we will use .
is returned in
var.residuals
. Rows/columns 1 to are the conditional variances of the model residuals and rows/columns
to
are the conditional variances of the state residuals. The off-diagonal blocks are the covariances between the two types of residuals.
Standardized residuals
MARSSresiduals
will return the Cholesky standardized residuals sensu Harvey et al. (1998) in std.residuals
for outlier and shock detection. These are the model and state residuals multiplied by the inverse of the lower triangle of the Cholesky decomposition of var.residuals
(note chol()
in R returns the upper triangle thus a transpose is needed). The standardized model residuals are set to NA when there are missing data. The standardized state residuals however always exist since the expected value of the states exist without data. The calculation of the standardized residuals for both the observations and states requires the full residuals variance matrix. Since the state residuals variance is NA at the last time step, the standardized residual in the last time step will be all NA (for both model and state residuals).
The interpretation of the Cholesky standardized residuals is not straight-forward when the and
variance-covariance matrices are non-diagonal. The residuals which were generated by a non-diagonal variance-covariance matrices are transformed into orthogonal residuals in
space. For example, if v is 2x2 correlated errors with variance-covariance matrix R. The transformed residuals (from this function) for the i-th row of
is a combination of the row 1 effect and the row 1 effect plus the row 2 effect. So in this case, row 2 of the transformed residuals would not be regarded as solely the row 2 residual but rather how different row 2 is from row 1, relative to expected. If the errors are highly correlated, then the transformed residuals can look rather non-intuitive.
The marginal standardized residuals are returned in mar.residuals
. These are the model and state residuals multiplied by the inverse of the diagonal matrix formed by the square root of the diagonal of var.residuals
. These residuals will be correlated (across the residuals at time ) but are easier to interpret when
and
are non-diagonal.
The Block Cholesky standardized residuals are like the Cholesky standardized residuals except that the full variance-covariance matrix is not used, only the variance-covariance matrix for the model or state residuals (respectively) is used for standardization. For the model residuals, the Block Cholesky standardized residuals will be the same as the Cholesky standardized residuals because the upper triangle of the lower triangle of the Cholesky decomposition (which is what we standardize by) is all zero. For the state residuals, the Block Cholesky standardization will be different because Block Cholesky standardization treats the model and state residuals as independent (which they are not in the smoothations case).
Normalized residuals
If normalize=FALSE
, the unconditional variance of and
are
and
and the model is assumed to be written as
If normalize=TRUE, the model is assumed to be written
with the variance of and
equal to
(identity).
MARSSresiduals.tT
returns the residuals defined as in the first equations. To get the residuals defined as Harvey et al. (1998) define them (second equations), then use normalize=TRUE
. In that case the unconditional variance of residuals will be instead of
and
.
Missing or left-out data
and
are for the distribution across all possible
and
. We can also compute the expected value and variance conditioned on a specific value of
, the one we observed
(Holmes 2014). If there are no missing values, this is not very interesting as
and
. If we have data that are missing because we left them out, however,
and
are the values we need to evaluate whether the left-out data are unusual relative to what you expect given the data you did collect.
E.obs.residuals
is the conditional expected value (notice small
). It is
It is similar to . The difference is the
term.
is
for the non-missing values. For the missing values, the value depends on
. If
is diagonal,
is
and the expected residual value is 0. If
is non-diagonal however, it will be non-zero.
var.obs.residuals
is the conditional variance (eqn 24 in Holmes (2014)). For the non-missing values, this variance is 0 since
is a fixed value. For the missing values,
is not fixed because
is a random variable. For these values, the variance of
is determined by the variance of
conditioned on
. This variance matrix is returned in
var.obs.residuals
. The variance of is 0 and thus is not included.
The variance (uppercase
) returned in the 1 to
rows/columns of
var.residuals
may also be of interest depending on what you are investigating with regards to missing values. For example, it may be of interest in a simulation study or cases where you have multiple replicated data sets.
var.residuals
would allow you to determine if the left-out residuals are unusual with regards to what you would expect for left-out data in that location of the matrix but not specifically relative to the data you did collect. If
is non-diagonal and the
and
are highly correlated, the variance of
and variance of
for the left-out data would be quite different. In the latter, the variance is low because
has strong information about
. In the former, we integrate over
and the variance could be high (depending on the parameters).
Note, if Harvey=TRUE
then the rows and columns of var.residuals
corresponding to missing values will be NA. This is because the Harvey et al. algorithm does not compute the residual variance for missing values.
Value
A list with the following components
model.residuals |
The the observed smoothed model residuals: data minus the model predictions conditioned on all observed data. This is different than the Kalman filter innovations which use on the data up to time |
state.residuals |
The smoothed state residuals |
residuals |
The residuals conditioned on the observed data. Returned as a (n+m) x T matrix with |
var.residuals |
The joint variance of the model and state residuals conditioned on observed data. Returned as a (n+m) x (n+m) x T matrix. For Harvey=FALSE, this is Holmes (2014) equation 57. For Harvey=TRUE, this is the residual variance in eqn. 24, page 113, in Harvey et al. (1998). They are identical except for missing values, for those Harvey=TRUE returns 0s. For the state residual variance, the last time step will be all NA because the last step would be for T to T+1 (past the end of the data). |
std.residuals |
The Cholesky standardized residuals as a (n+m) x T matrix. This is |
mar.residuals |
The marginal standardized residuals as a (n+m) x T matrix. This is |
bchol.residuals |
The Block Cholesky standardized residuals as a (n+m) x T matrix. This is |
E.obs.residuals |
The expected value of the model residuals conditioned on the observed data. Returned as a n x T matrix. For observed data, this will be the observed residuals (values in |
var.obs.residuals |
The variance of the model residuals conditioned on the observed data. Returned as a n x n x T matrix. For observed data, this will be 0. See details. |
msg |
Any warning messages. This will be printed unless Object$control$trace = -1 (suppress all error messages). |
Author(s)
Eli Holmes, NOAA, Seattle, USA.
References
Harvey, A., S. J. Koopman, and J. Penzer. 1998. Messy time series: a unified approach. Advances in Econometrics 13: 103-144 (see page 112-113). Equation 21 is the Kalman eqns. Eqn 23 and 24 is the backward recursion to compute the smoothations. This function uses the MARSSkf output for eqn 21 and then implements the backwards recursion in equation 23 and equation 24. Pages 120-134 discuss the use of standardized residuals for outlier and structural break detection.
de Jong, P. and J. Penzer. 1998. Diagnosing shocks in time series. Journal of the American Statistical Association 93: 796-806. This one shows the same equations; see eqn 6. This paper mentions the scaling based on the inverse of the sqrt (Cholesky decomposition) of the variance-covariance matrix for the residuals (model and state together). This is in the right column, half-way down on page 800.
Koopman, S. J., N. Shephard, and J. A. Doornik. 1999. Statistical algorithms for models in state space using SsfPack 2.2. Econometrics Journal 2: 113-166. (see pages 147-148).
Harvey, A. and S. J. Koopman. 1992. Diagnostic checking of unobserved-components time series models. Journal of Business & Economic Statistics 4: 377-389.
Holmes, E. E. 2014. Computation of standardized residuals for (MARSS) models. Technical Report. arXiv:1411.0045.
See Also
MARSSresiduals()
, MARSSresiduals.tt1()
, fitted.marssMLE()
, plot.marssMLE()
Examples
dat <- t(harborSeal)
dat <- dat[c(2,11),]
fit <- MARSS(dat)
#state residuals
state.resids1 <- MARSSresiduals(fit, type="tT")$state.residuals
#this is the same as hatx_t-(hatx_{t-1}+u)
states <- fit$states
state.resids2 <- states[,2:30]-states[,1:29]-matrix(coef(fit,type="matrix")$U,2,29)
#compare the two
cbind(t(state.resids1[,-30]), t(state.resids2))
#normalize the state residuals to a variance of 1
Q <- coef(fit,type="matrix")$Q
state.resids1 <- MARSSresiduals(fit, type="tT", normalize=TRUE)$state.residuals
state.resids2 <- (solve(t(chol(Q))) %*% state.resids2)
cbind(t(state.resids1[,-30]), t(state.resids2))
#Cholesky standardized (by joint variance) model & state residuals
MARSSresiduals(fit, type="tT")$std.residuals
# Returns residuals in a data frame in long form
residuals(fit, type="tT")