
FTS-5 Cointegration, State Space Models

Cointegration

Definition

Let $\{X_t\}$ be a non-stationary multivariate time series in $\mathbb{R}^d$. If there exists $\beta \in \mathbb{R}^d$, $\beta \neq 0$, such that $\{\beta^\top X_t\}$ is stationary, then $\{X_t\}$ is called cointegrated; $\beta$ is called the cointegrating vector.
If $\beta_1, \dots, \beta_r$ are linearly independent vectors in $\mathbb{R}^d$ such that $\{\beta_j^\top X_t\}$ is stationary for each $j = 1, \dots, r$, then $r$ (the largest possible such number) is called the cointegrating rank.
The linear space spanned by $\beta_1, \dots, \beta_r$ is called the cointegrating space.
A typical normalization is to set one of the coefficients to be 1, i.e., take $\beta = (1, -\beta_2, \dots, -\beta_d)^\top$ such that
$$X_{t,1} = \beta_2 X_{t,2} + \dots + \beta_d X_{t,d} + Z_t,$$
where $\{Z_t\}$ is a stationary process. The error term $Z_t$ is often referred to as the disequilibrium error or the cointegrating residual.
Note that cointegration and correlation do not have much to do with each other.

Presence of Cointegration

Engle-Granger Method
Represent one time series as a linear combination of the others:
$$X_{t,1} = \beta_2 X_{t,2} + \dots + \beta_d X_{t,d} + Z_t.$$
Fit $\beta_2, \dots, \beta_d$ using least squares and test whether the residuals $\hat{Z}_t$ are stationary using the Dickey-Fuller test. If the residuals are stationary, then the cointegrating vector is $(1, -\hat\beta_2, \dots, -\hat\beta_d)^\top$. There can be several cointegrating vectors.
The method is available in statsmodels.tsa.stattools.coint.
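A minimal sketch of the Engle-Granger test on simulated data (the series and parameter values below are illustrative assumptions, not from the notes):

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.normal(size=n))      # random walk, I(1)
y = 2.0 * x + rng.normal(size=n)       # cointegrated with x: y - 2x is stationary

# Engle-Granger two-step test: regress y on x, then ADF-test the residuals
t_stat, p_value, crit_values = coint(y, x)
print(t_stat, p_value, crit_values)    # small p-value -> reject "no cointegration"
```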
Johansen Method
Recall that a vector process $\{X_t\}$ is cointegrated if $\{X_t\}$ is non-stationary, but $\{B^\top X_t\}$ is stationary for some $d \times r$ matrix $B$ with rank $r \geq 1$.
Johansen's test starts by modeling $\{X_t\}$ by a VAR($p$) process with a unit root:
$$X_t = \Phi_1 X_{t-1} + \dots + \Phi_p X_{t-p} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \Sigma),$$
where $\det\bigl(I - \Phi_1 z - \dots - \Phi_p z^p\bigr) = 0$ for some $z$ with $|z| = 1$.
If there is a $z$ with $|z| < 1$ such that $\det\bigl(I - \Phi_1 z - \dots - \Phi_p z^p\bigr) = 0$, then the process is explosive, which is not practical. Similarly to how we obtained the Dickey-Fuller test, we get
$$\nabla X_t = \Pi X_{t-1} + \Gamma_1 \nabla X_{t-1} + \dots + \Gamma_{p-1} \nabla X_{t-p+1} + Z_t,$$
where
$$\Pi = \Phi_1 + \dots + \Phi_p - I \qquad \text{and} \qquad \Gamma_j = -(\Phi_{j+1} + \dots + \Phi_p).$$
This is called a vector error-correction model (VECM).
If $\{\nabla X_t\}$ is stationary, then $\{\Pi X_{t-1}\}$ is also a stationary process and the rows of $\Pi$ are cointegrating vectors.
If the process has unit roots, so that $\det\bigl(I - \Phi_1 - \dots - \Phi_p\bigr) = 0$ (taking $z = 1$), then $\Pi$ is a singular matrix. If $\Pi$ is singular, then its rank is some $r < d$.
If $r = 0$, then $\Pi = 0$ and $\{X_t\}$ is not cointegrated, because $\{\nabla X_t\}$ is already a stationary process.
If $0 < r < d$, then $\{X_t\}$ has $r$ linearly independent cointegrating vectors.
To test cointegration, i.e., to test the rank of $\Pi$, Johansen's method creates a nested sequence of hypotheses
$$H(0) \subset H(1) \subset \dots \subset H(d),$$
where $H(r_0)$ refers to the null hypothesis of $\operatorname{rank}(\Pi) \leq r_0$.
Using these nested hypotheses, Johansen's method sequentially tests $H(0), H(1), \dots, H(d-1)$.
Johansen’s method can be based on two different test statistics:
  • trace/likelihood statistic
  • maximum eigenvalue statistic
The method is available in statsmodels.tsa.vector_ar.vecm.coint_johansen.
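A minimal sketch of Johansen's test on simulated data (the series and the det_order / k_ar_diff choices below are illustrative assumptions):

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(1)
n = 500
common = np.cumsum(rng.normal(size=n))             # shared I(1) trend
X = np.column_stack([
    common + rng.normal(size=n),                   # cointegrated with the next column
    0.5 * common + rng.normal(size=n),
    np.cumsum(rng.normal(size=n)),                 # independent random walk
])

res = coint_johansen(X, det_order=0, k_ar_diff=1)  # constant term, one lagged difference
print(res.lr1, res.cvt)   # trace statistics vs. 90/95/99% critical values
print(res.lr2, res.cvm)   # maximum eigenvalue statistics vs. critical values
```

Here res.lr1 and res.lr2 hold the trace and maximum eigenvalue statistics discussed next.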
Trace Statistic
With $\hat\lambda_1 \geq \hat\lambda_2 \geq \dots \geq \hat\lambda_d$ the estimated eigenvalues of $\hat\Pi$, the trace or likelihood test statistic for $H(r_0)$ is
$$\mathrm{LR}_{\mathrm{trace}}(r_0) = -n \sum_{j = r_0 + 1}^{d} \log\bigl(1 - \hat\lambda_j\bigr).$$
If $H(r_0)$ is true, then $\hat\lambda_{r_0+1}, \dots, \hat\lambda_d$ should all be close to zero, which implies $\mathrm{LR}_{\mathrm{trace}}(r_0)$ should be small.
The asymptotic null distribution is not chi-square but a multivariate version of the Dickey-Fuller distribution. Johansen proposes a sequential testing procedure that consistently determines the number of cointegrating vectors: first test $H(0)$ vs. the alternative $\operatorname{rank}(\Pi) > 0$. If this null is not rejected, then conclude that there are no cointegrating vectors. If rejected, then test $H(1)$ vs. $\operatorname{rank}(\Pi) > 1$, and so on.
Maximum Eigenvalue Statistic
Instead of testing $H(r_0): \operatorname{rank}(\Pi) \leq r_0$ vs. the alternative $\operatorname{rank}(\Pi) > r_0$, there is a stricter alternative hypothesis $\operatorname{rank}(\Pi) = r_0 + 1$.
The likelihood test statistic for this stricter alternative hypothesis is the maximum eigenvalue statistic given by
$$\mathrm{LR}_{\max}(r_0) = -n \log\bigl(1 - \hat\lambda_{r_0 + 1}\bigr).$$
As with the trace statistic, the asymptotic null distribution is not chi-square but instead is a complicated function of Brownian motion.

State Space Models

Definition

These models arose in spacecraft tracking, where the state equation defines the motion equations for the position or state of a spacecraft with location $X_t$, and the data $Y_t$ reflect information that can be observed from the tracking device.
A simple example is
$$X_t = X_{t-1} + W_t, \qquad Y_t = X_t + V_t,$$
for independent stationary noise processes $\{W_t\}$ and $\{V_t\}$. Here $X_t$ is the true location and/or trajectory of the spacecraft and $Y_t$ is the observed location blurred with the noise $V_t$.
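As an illustration, the tracking example above can be simulated directly (the noise scales below are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
w = rng.normal(scale=0.5, size=n)   # state noise W_t (assumed scale)
v = rng.normal(scale=1.0, size=n)   # observation noise V_t (assumed scale)

x = np.cumsum(w)                    # state:       X_t = X_{t-1} + W_t
y = x + v                           # observation: Y_t = X_t + V_t
```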
An extremely rich class of models for time series, including and going well beyond linear ARIMA models, can be represented as state-space models. Furthermore, instead of modeling trend and seasonality as non-stochastic functions, state-space models allow them to evolve stochastically. The state-space representation of classical models also lets us use Kalman recursions for iterative estimation and prediction of those models.
A state-space model for a (possibly multivariate) time series $\{Y_t\}$ consists of two equations. The first, known as the observation equation, represents the observation vector $Y_t$ as a linear function of the state vector $X_t$ up to some noise:
$$Y_t = G_t X_t + V_t, \qquad \{V_t\} \sim \mathrm{WN}(0, R).$$
Here $Y_t$ and $X_t$ can be of different dimensions.
The second part of the state-space model is the state equation, which determines how the state sequence evolves:
$$X_t = F_t X_{t-1} + W_t, \qquad \{W_t\} \sim \mathrm{WN}(0, Q).$$
We further assume that $\{W_t\}$ is uncorrelated with $\{V_t\}$, i.e.
$$\mathbb{E}\bigl[W_t V_s^\top\bigr] = 0 \quad \text{for all } s, t.$$
$X_0$, the initial state, is also assumed to be uncorrelated with these noise processes. State-space models are also referred to as dynamic linear models. In many important cases, the matrices $F_t, G_t$ are independent of $t$.
By recursively writing out the equations, we can represent $X_t$ as a linear function of $X_0, W_1, \dots, W_t$, and $Y_t$ as a linear function of $X_0, W_1, \dots, W_t, V_t$.
A time series $\{Y_t\}$ is said to have a state-space representation if there exists a state-space model for $\{Y_t\}$ as specified by an observation equation and a state equation. For state-space models, it is clear that neither $\{X_t\}$ nor $\{Y_t\}$ is necessarily stationary.
Note that if $X_0, W_1, W_2, \dots$ are independent, then $\{X_t\}$ satisfies the Markov property: the distribution of $X_{t+1}$ given $X_t, X_{t-1}, \dots, X_0$ is the same as the distribution of $X_{t+1}$ given $X_t$.

Classical models as state-space models

State-space representation: AR(p)
Consider the AR($p$) process
$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + Z_t,$$
where $\{Z_t\}$ is $\mathrm{WN}(0, \sigma^2)$.
Write
$$X_t = (Y_t, Y_{t-1}, \dots, Y_{t-p+1})^\top.$$
Then $Y_t = G X_t$ with $G = (1, 0, \dots, 0)$, and
$$X_t = F X_{t-1} + W_t,$$
where
$$F = \begin{pmatrix}\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{pmatrix} \qquad \text{and} \qquad W_t = (Z_t, 0, \dots, 0)^\top.$$
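As a sketch, the matrices $F$ and $G$ above can be assembled directly; the AR(3) coefficients below are assumed for illustration:

```python
import numpy as np

phi = np.array([0.5, -0.3, 0.2])    # assumed AR(3) coefficients phi_1, ..., phi_p
p = len(phi)

F = np.zeros((p, p))
F[0, :] = phi                       # first row carries phi_1, ..., phi_p
F[1:, :-1] = np.eye(p - 1)          # sub-diagonal shifts (Y_{t-1}, ..., Y_{t-p+1})
G = np.zeros(p)
G[0] = 1.0                          # observation picks out Y_t = G X_t
```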
State-space representation: MA(q)
Let $\{Y_t\}$ denote the MA($q$) process, i.e. $Y_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$. Write the observation equation as
$$Y_t = (1, \theta_1, \dots, \theta_q)\, X_t, \qquad X_t = (Z_t, Z_{t-1}, \dots, Z_{t-q})^\top.$$
The state equation is
$$X_t = \begin{pmatrix}0 & 0 & \cdots & 0 & 0\\ 1 & 0 & \cdots & 0 & 0\\ 0 & 1 & \cdots & 0 & 0\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & 0\end{pmatrix} X_{t-1} + \begin{pmatrix}Z_t\\ 0\\ \vdots\\ 0\end{pmatrix}.$$
An ARMA process can also be represented in the same way. For example, the ARMA(1,1) process $Y_t = \phi Y_{t-1} + Z_t + \theta Z_{t-1}$ is
$$Y_t = (1, 0)\, X_t, \qquad X_t = \begin{pmatrix}\phi & 1\\ 0 & 0\end{pmatrix} X_{t-1} + \begin{pmatrix}1\\ \theta\end{pmatrix} Z_t,$$
where
$$X_t = \begin{pmatrix}Y_t\\ \theta Z_t\end{pmatrix}.$$
State-space representation: non-uniqueness
State-space representations are not unique in general. If
then for any orthogonal matrix
Hence, and also satisfy the state-space model, with the same covariance structures. This means that we cannot identiy the state sequence uniquely from the observed data. The choice is often made based on the interpretation of the state variables and the relationship to the underlying parameters.
State-space representation: ARMA
There are several options for the ARMA model, one of which is called Hamilton's representation. This representation relies on the fact that a lagged sum of an AR process is an ARMA process. For example, if $\{U_t\}$ is AR(1), then $U_t + \theta U_{t-1}$ is an ARMA(1,1) process. In terms of the backshift operator, suppose $\{Y_t\}$ is an ARMA($p,q$) with $\phi(B) Y_t = \theta(B) Z_t$. Define $\{U_t\}$ as an AR($p$) process via $\phi(B) U_t = Z_t$. Then $Y_t = \theta(B) U_t$, because
$$\phi(B)\,\theta(B) U_t = \theta(B)\,\phi(B) U_t = \theta(B) Z_t = \phi(B) Y_t.$$
Hence, with $r = \max(p, q+1)$ and $X_t = (U_t, U_{t-1}, \dots, U_{t-r+1})^\top$, we can write
$$Y_t = (1, \theta_1, \dots, \theta_{r-1})\, X_t,$$
where $\theta_j = 0$ for $j > q$, and
Note that $\{U_t\}$ is an AR($p$) process which can be written as an AR($r$) process with $\phi_j = 0$ for $j > p$, so $X_t = F X_{t-1} + W_t$, where
$$F = \begin{pmatrix}\phi_1 & \cdots & \phi_{r-1} & \phi_r\\ 1 & \cdots & 0 & 0\\ \vdots & \ddots & & \vdots\\ 0 & \cdots & 1 & 0\end{pmatrix}, \qquad W_t = (Z_t, 0, \dots, 0)^\top.$$
Summarizing, we get a state-space representation of the ARMA($p,q$) process as
$$Y_t = (1, \theta_1, \dots, \theta_{r-1})\, X_t, \qquad X_t = F X_{t-1} + W_t.$$
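A sketch of this construction for an assumed ARMA(2,1): pad the coefficient vectors to length $r = \max(p, q+1)$, build $F$ and $G$, and simulate $Y_t = G X_t$ through the state equation:

```python
import numpy as np

phi = np.array([0.5, -0.2])     # assumed AR coefficients phi_1, phi_2
theta = np.array([0.4])         # assumed MA coefficient theta_1
r = max(len(phi), len(theta) + 1)

phi_pad = np.r_[phi, np.zeros(r - len(phi))]          # phi_j = 0 for j > p
G = np.r_[1.0, theta, np.zeros(r - 1 - len(theta))]   # (1, theta_1, ..., theta_{r-1})

F = np.zeros((r, r))
F[0, :] = phi_pad                   # companion matrix of the AR part for U_t
F[1:, :-1] = np.eye(r - 1)

rng = np.random.default_rng(0)
x = np.zeros(r)                     # state X_t = (U_t, ..., U_{t-r+1})
ys = []
for _ in range(1000):
    w = np.r_[rng.normal(), np.zeros(r - 1)]   # W_t = (Z_t, 0, ..., 0)
    x = F @ x + w                              # state equation
    ys.append(G @ x)                           # observation Y_t = G X_t
print(np.mean(ys), np.var(ys))                 # sample moments of the simulated ARMA(2,1)
```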

Kalman Filter

Prediction / forecasting: Estimate $X_{t+h}$ or $Y_{t+h}$ ($h \geq 1$) using $Y_1, \dots, Y_t$. This is denoted by $X_{t+h|t}$ (resp. $Y_{t+h|t}$).
Filtering: Estimate $X_t$ using $Y_1, \dots, Y_t$. This is denoted by $X_{t|t}$.
Smoothing: Estimate $X_t$ for $t < n$ using $Y_1, \dots, Y_n$. This is denoted by $X_{t|n}$.
All three are solved by Kalman recursions, and these recursions yield minimum mean squared error estimates. Moreover, they update estimates/predictions sequentially as new observations arrive, without reprocessing the entire series.
Additional assumption: the noise series $\{W_t\}$ and $\{V_t\}$ are independent normal random vectors, say $W_t \sim N(0, Q)$ and $V_t \sim N(0, R)$. This is much stronger than the white noise assumption. Two facts we need for Kalman recursions:
  • If $X \sim N(\mu, \Sigma)$, then $AX + b \sim N(A\mu + b,\, A\Sigma A^\top)$.
  • Suppose
$$\begin{pmatrix}X\\ Y\end{pmatrix} \sim N\!\left(\begin{pmatrix}\mu_X\\ \mu_Y\end{pmatrix},\; \begin{pmatrix}\Sigma_{XX} & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_{YY}\end{pmatrix}\right),$$
    • then
$$X \mid Y = y \;\sim\; N\!\bigl(\mu_X + \Sigma_{XY}\Sigma_{YY}^{-1}(y - \mu_Y),\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\bigr).$$
Recall that we only observe $\{Y_t\}$, and as we take in observations recursively, let $\mathcal{F}_t$ denote the information available up to time $t$:
$$\mathcal{F}_t = \{Y_1, \dots, Y_t\}.$$
Suppose we have a prediction $X_{t|t-1}$ based on $\mathcal{F}_{t-1}$ such that
$$X_t \mid \mathcal{F}_{t-1} \sim N\bigl(X_{t|t-1},\, P_{t|t-1}\bigr);$$
then $P_{t|t-1}$ serves as an estimate of the variability in $X_t$ given the information $\mathcal{F}_{t-1}$. Note that we have $X_1 = F X_0 + W_1$ with $X_0 \sim N(\mu_0, \Sigma_0)$, so we can take $X_{1|0} = F\mu_0$ and $P_{1|0} = F\Sigma_0 F^\top + Q$.
Because $Y_t = G X_t + V_t$ and $V_t$ is distributed independently of $X_t$ and $\mathcal{F}_{t-1}$, we get
$$Y_t \mid \mathcal{F}_{t-1} \sim N\bigl(G X_{t|t-1},\; G P_{t|t-1} G^\top + R\bigr).$$
This implies
$$Y_{t|t-1} = \mathbb{E}[Y_t \mid \mathcal{F}_{t-1}] = G X_{t|t-1}.$$
Equivalently, $Y_t \mid \mathcal{F}_{t-1} \sim N(Y_{t|t-1}, \Delta_t)$, where
$$\Delta_t = G P_{t|t-1} G^\top + R.$$
Because $Y_t = G X_t + V_t$, we can write
$$\operatorname{Cov}(X_t, Y_t \mid \mathcal{F}_{t-1}) = \operatorname{Cov}(X_t, G X_t + V_t \mid \mathcal{F}_{t-1}),$$
and because $V_t$ is independent of $X_t$, we have
$$\operatorname{Cov}(X_t, Y_t \mid \mathcal{F}_{t-1}) = P_{t|t-1} G^\top.$$
Therefore, with $\Delta_t = G P_{t|t-1} G^\top + R$, we have
$$\begin{pmatrix}X_t\\ Y_t\end{pmatrix}\,\Big|\,\mathcal{F}_{t-1} \;\sim\; N\!\left(\begin{pmatrix}X_{t|t-1}\\ G X_{t|t-1}\end{pmatrix},\; \begin{pmatrix}P_{t|t-1} & P_{t|t-1} G^\top\\ G P_{t|t-1} & \Delta_t\end{pmatrix}\right).$$
Now using the conditional distribution formula, we get
$$X_t \mid \mathcal{F}_t \;=\; X_t \mid (\mathcal{F}_{t-1}, Y_t) \;\sim\; N\bigl(X_{t|t},\, P_{t|t}\bigr),$$
where
$$X_{t|t} = X_{t|t-1} + P_{t|t-1} G^\top \Delta_t^{-1}\bigl(Y_t - G X_{t|t-1}\bigr), \qquad P_{t|t} = P_{t|t-1} - P_{t|t-1} G^\top \Delta_t^{-1} G P_{t|t-1}.$$
Note that the gain $K_t = P_{t|t-1} G^\top \Delta_t^{-1}$ involves the conditional covariance of $X_t$ and $Y_t$. If this conditional covariance is "higher", then mistakes in the prediction of $Y_t$ have a bigger impact on the update of $X_{t|t}$.
Equivalently, $X_{t|t} = (I - K_t G) X_{t|t-1} + K_t Y_t$ and $P_{t|t} = (I - K_t G) P_{t|t-1}$.
Set $X_{0|0} = \mu_0$ and $P_{0|0} = \Sigma_0$. Let $\mathcal{F}_t = \{Y_1, \dots, Y_t\}$, and let
  • $X_{t|t-1}$ = best guess of $X_t$ given $\mathcal{F}_{t-1}$ (with error covariance $P_{t|t-1}$)
  • $X_{t|t}$ = best guess of $X_t$ given $\mathcal{F}_t$ (with error covariance $P_{t|t}$)
  • $Y_{t|t-1}$ = best guess of $Y_t$ given $\mathcal{F}_{t-1}$
  • $X_{t+h|t}$ = best guess of $X_{t+h}$ given $\mathcal{F}_t$, for $h \geq 1$
Recall $X_t = F X_{t-1} + W_t$ and $Y_t = G X_t + V_t$. This implies the prediction step
$$X_{t|t-1} = F X_{t-1|t-1}, \qquad P_{t|t-1} = F P_{t-1|t-1} F^\top + Q, \qquad Y_{t|t-1} = G X_{t|t-1}.$$
The main recursion is: when we observe a new observation $Y_t$, then the update step is
$$X_{t|t} = X_{t|t-1} + K_t\bigl(Y_t - G X_{t|t-1}\bigr), \qquad P_{t|t} = (I - K_t G) P_{t|t-1}, \qquad K_t = P_{t|t-1} G^\top \bigl(G P_{t|t-1} G^\top + R\bigr)^{-1}.$$
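A minimal univariate implementation of the prediction and update steps above, where the model matrices reduce to scalars $f$, $g$, $q$, $r$ (all parameter values below are assumed for illustration):

```python
import numpy as np

def kalman_filter(y, f, g, q, r, mu0=0.0, sigma0=1.0):
    """Scalar Kalman recursion for Y_t = g*X_t + V_t, X_t = f*X_{t-1} + W_t."""
    x_pred, p_pred = f * mu0, f**2 * sigma0 + q       # X_{1|0} and P_{1|0}
    x_filt, p_filt = np.empty(len(y)), np.empty(len(y))
    for t, yt in enumerate(y):
        delta = g**2 * p_pred + r                     # Var(Y_t | F_{t-1})
        k = p_pred * g / delta                        # Kalman gain K_t
        x_filt[t] = x_pred + k * (yt - g * x_pred)    # update: X_{t|t}
        p_filt[t] = (1.0 - k * g) * p_pred            # update: P_{t|t}
        x_pred = f * x_filt[t]                        # predict: X_{t+1|t}
        p_pred = f**2 * p_filt[t] + q                 # predict: P_{t+1|t}
    return x_filt, p_filt

# Example: the local-level tracking model (f = g = 1) with assumed noise variances.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(scale=0.5, size=200))        # latent state
y = x + rng.normal(scale=1.0, size=200)               # noisy observations
x_filt, p_filt = kalman_filter(y, f=1.0, g=1.0, q=0.25, r=1.0)
```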

Application: Linear Regression

Consider a simple linear regression model (through the origin)
$$Y_t = \beta z_t + V_t, \qquad t = 1, 2, \dots,$$
where the covariates $z_1, z_2, \dots$ are known and fixed and $\{V_t\}$ are i.i.d. $N(0, \sigma^2)$.
This can be expressed as a state-space model:
$$X_t = X_{t-1}, \qquad Y_t = G_t X_t + V_t,$$
where $X_t \equiv \beta$ (the unknown coefficient is the state) and $G_t = z_t$, with no state noise ($W_t = 0$, so $Q = 0$ and $F = 1$).
We have $X_{0|0} = 0$ and $P_{0|0} = \sigma^2$, a prior guess for $\beta$ and its variability.
Recall the state recursion (prediction) equation is
$$X_{t|t-1} = F X_{t-1|t-1}, \qquad P_{t|t-1} = F P_{t-1|t-1} F^\top + Q.$$
Because $F = 1$ and $Q = 0$, we get $X_{t|t-1} = X_{t-1|t-1}$ and $P_{t|t-1} = P_{t-1|t-1}$, and hence the update becomes
$$X_{t|t} = X_{t-1|t-1} + \frac{P_{t-1|t-1}\, z_t}{z_t^2 P_{t-1|t-1} + \sigma^2}\bigl(Y_t - z_t X_{t-1|t-1}\bigr), \qquad P_{t|t} = \frac{\sigma^2 P_{t-1|t-1}}{z_t^2 P_{t-1|t-1} + \sigma^2}.$$
This gives a simple online implementation of linear regression. The recursion can be simplified by noting that
$$P_{t|t}^{-1} = P_{t-1|t-1}^{-1} + \frac{z_t^2}{\sigma^2}.$$
This implies
$$P_{t|t} = \frac{\sigma^2}{1 + \sum_{s=1}^{t} z_s^2}.$$
Hence, the recursion becomes
$$X_{t|t} = X_{t-1|t-1} + \frac{z_t\bigl(Y_t - z_t X_{t-1|t-1}\bigr)}{1 + \sum_{s=1}^{t} z_s^2}.$$
In simple linear regression, we know the optimal estimator is OLS:
$$\hat\beta_t = \frac{\sum_{s=1}^{t} z_s Y_s}{\sum_{s=1}^{t} z_s^2}.$$
This can be written recursively as
$$\hat\beta_t = \hat\beta_{t-1} + \frac{z_t\bigl(Y_t - z_t \hat\beta_{t-1}\bigr)}{\sum_{s=1}^{t} z_s^2}.$$
Note that the only difference is the additional "1 +" in the denominator, which is negligible asymptotically.
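A short sketch comparing the online recursion with the closed-form OLS estimator on simulated data (the covariates $z_t$, the true $\beta$, and the noise are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                     # known, fixed covariates z_t
beta_true = 2.0
y = beta_true * z + rng.normal(size=n)     # Y_t = beta * z_t + V_t

beta_kalman, s = 0.0, 0.0                  # X_{0|0} = 0, running sum of z_s^2
for zt, yt in zip(z, y):
    s += zt**2
    beta_kalman += zt * (yt - zt * beta_kalman) / (1.0 + s)   # note the "1 +"

beta_ols = np.sum(z * y) / np.sum(z**2)    # closed-form OLS through the origin
print(beta_kalman, beta_ols)               # nearly identical for large n
```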