Hannan-Rissanen-Kavalieris Estimation - Stage 3
Bernd Funovits
2023-12-17
c_hrk_stage3.Rmd
Differences to HRK in HD
The Hannan-Rissanen-Kavalieris algorithm implemented in this package differs in some ways from the one described in Hannan and Deistler (2012), The Statistical Theory of Linear Systems (henceforth HD), Chapter 6.7, page 294ff.:
- HRK in HD uses Toeplitz and recursive calculations, i.e. at every stage, models for all orders (up to an upper bound) \(p,q\) are calculated. We use OLS for estimating one particular model with AR and MA orders \(p\) and \(q\).
- HRK in HD works in Kronecker echelon form and only treats the generic neighborhood (Kronecker indices such that \(n_1 = \cdots = n_k = n_{k+1} + 1 = \cdots = n_m + 1\), where \(m\) is the output dimension). We allow for affine restrictions, which are more general. However, it is still required that \(a_0 = b_0\), that \(diag(a_0)=I_m\), and that \(\Sigma_L\) is lower-triangular.
Stage 3
Inputs and Outputs
Given are
- the data \(y\), containing \(m\) variables and \(T\) observations, stored in a matrix of dimension \((T \times m)\)
- the residuals \(\varepsilon_t^{II}\) obtained from stage 2, i.e. obtained from \(b_{II,f}(z)^{-1} a_{II}(z) y_t\), where all the zeros of \(b_{II,f}(z)\) have been flipped to outside the unit circle. Note that \(a(z)\) might still have zeros inside the unit circle!
- The estimates \(a_{II}(z)\) and \(b_{II,f}(z)\) from above.
The outputs of the algorithm are
- \(a_{III}(z)\), \(b_{III,f}(z)\) (the latter obtained from \(b_{III}(z)\) by potentially flipping zeros from inside to outside the unit circle).
- \(a_{III}(z)\) and \(b_{III}(z)\) are obtained from an OLS regression. As a by-product, we also obtain the “ARX” residuals \(u^{III}_t\).
- Residuals \(\varepsilon_t^{III}\). These are obtained from \(b_{III,f}(z)^{-1} a_{III}(z) y_t\) (see the filtering sketch below). Again, it is not guaranteed that \(a_{III}(z)\) has no zeros inside the unit circle!
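To make the filtering relations above concrete, the following is a minimal sketch (not the package implementation) of how residuals of the form \(b_f(z)^{-1} a(z) y_t\) can be computed from given coefficient arrays. The names `a_hat`, `b_hat`, `apply_ar`, and `filter_b_inv` are illustrative only; \(a_0 = b_0 = I_m\) and zero initial values are assumed.

```r
# Sketch only: compute e_t = b(z)^{-1} a(z) y_t for coefficient arrays
# a_hat (m x m x p) and b_hat (m x m x q), with a_0 = b_0 = I_m.

# apply a(z): v_t = y_t + a_1 y_{t-1} + ... + a_p y_{t-p} (zero initial values)
apply_ar <- function(y, a_hat) {
  p <- dim(a_hat)[3]
  v <- y
  for (t in seq_len(nrow(y))) {
    for (i in seq_len(min(p, t - 1))) {
      v[t, ] <- v[t, ] + as.vector(a_hat[, , i] %*% y[t - i, ])
    }
  }
  v
}

# apply b(z)^{-1}: solve b(z) e_t = v_t recursively (zero initial values)
filter_b_inv <- function(v, b_hat) {
  q <- dim(b_hat)[3]
  e <- matrix(0, nrow(v), ncol(v))
  for (t in seq_len(nrow(v))) {
    acc <- v[t, ]
    for (i in seq_len(min(q, t - 1))) {
      acc <- acc - as.vector(b_hat[, , i] %*% e[t - i, ])
    }
    e[t, ] <- acc
  }
  e
}

# e.g. stage 2 residuals: eps2 <- filter_b_inv(apply_ar(y, a2_hat), b2f_hat)
```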
Optimization Criterion
The goal is to minimize
\[ \int_{-\pi}^{\pi} trace\left[ \left( k(e^{i\omega}) \Sigma k'(e^{-i\omega}) \right)^{-1} \left( w(\omega) w^*(\omega)\right)\right] d\omega \] over the respective parameter space, where \[ w(\omega) = \frac{1}{\sqrt{T}} \sum_{t=1}^T y_t e^{it\omega}. \] As in the univariate case, we first-order approximate the inverse transfer function appearing in \[ \left( k(e^{i\omega}) \Sigma k'(e^{-i\omega}) \right)^{-1} \] around the initial value obtained from stage 2, i.e. we replace it by \[ \left[ k(e^{i\omega})^{-*} \Sigma^{-\frac{1}{2}} \right] \Sigma^{-\frac{1}{2}} \left( \hat{k}(e^{i\omega})^{-1} + \frac{\partial \hat{k}(e^{i\omega})^{-1}}{\partial \tau'} \left( \tau - \hat{\tau}\right) \right). \] In the following, we derive the regression obtained from the terms in the parentheses, using the parametrisation \(\tau = vec\left(a_1, \ldots, a_p, b_1, \ldots, b_q \right)\). The term \(\frac{\partial \hat{k}(e^{i\omega})^{-1}}{\partial \tau'}\) denotes the partial derivative with respect to \(\tau\), evaluated at \(\hat{\tau}\). (We restrict ourselves to the VARMA(p,q) model but, of course, the derivation also applies under the affine restrictions described above.)
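Heuristically, replacing also the leading factor by its stage 2 counterpart and applying Parseval's theorem (up to a constant factor and end effects from the finite sample), minimizing this first-order approximation of the criterion over \(\tau\) amounts to the time-domain least-squares problem

\[ \min_{\tau} \; \sum_{t=1}^{T} \left\| \Sigma^{-\frac{1}{2}} \left( \hat{k}(z)^{-1} y_t + \left[ \frac{\partial \hat{k}(z)^{-1} y_t}{\partial \tau'} \right] \left( \tau - \hat{\tau} \right) \right) \right\|^2 , \]

which is exactly the regression derived below; how the weighting by \(\Sigma^{-\frac{1}{2}}\) is treated (e.g. replaced by a stage 2 estimate or dropped) is an implementation detail not spelled out here.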
We need to regress \[ \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t \] on \[ -\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t \]
To this end, we construct a regression of the \(T\cdot m\)-dimensional stacked vector of \(\left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t\) on the corresponding \(\left( T\cdot m \times m^2(p+q) \right)\)-dimensional matrix of stacked \(-\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t\).
Toeplitz Regression: RHS
Derivative with respect to AR parameters
Considering \[ k(z)^{-1} y_t = u_t = y_t + \left(a_1, \ldots, a_p \right) \begin{pmatrix} y_{t-1}\\ \vdots\\ y_{t-p} \end{pmatrix} - \left( b_1, \ldots, b_q \right) \begin{pmatrix} u_{t-1}\\ \vdots\\ u_{t-q} \end{pmatrix}, \] vectorization using \(vec(ABC) = (C' \otimes A) vec(B)\) gives \[ u_t = y_t + (x_{t-1}' \otimes I_m) \tau_1 - (w_{t-1}' \otimes I_m) \tau_2 \] where \(x_{t-1}' = (y_{t-1}',\ldots, y_{t-p}')\), \(\tau_1 = vec\left(a_1, \ldots, a_p \right)\), \(w_{t-1}' = (u_{t-1}',\ldots, u_{t-q}')\), \(\tau_2 = vec\left(b_1, \ldots, b_q \right)\).
The partial derivative is \[ \frac{\partial k(z)^{-1} y_t }{\partial \tau_1'} = (x_{t-1}' \otimes I_m) - \left( b_1, \ldots, b_q \right) \begin{pmatrix} \frac{\partial u_{t-1}}{\partial \tau_1'}\\ \vdots\\ \frac{\partial u_{t-q}}{\partial \tau_1'} \end{pmatrix} \] which is equivalent to \[ b(z) \frac{\partial u_{t}}{\partial \tau_1'} = (x_{t-1}' \otimes I_m). \]
Derivative with respect to MA parameters
Similarly, we obtain \[ \frac{\partial k(z)^{-1} y_t }{\partial \tau_2'} = - (w_{t-1}' \otimes I_m) - \left( b_1, \ldots, b_q \right) \begin{pmatrix} \frac{\partial u_{t-1}}{\partial \tau_2'}\\ \vdots\\ \frac{\partial u_{t-q}}{\partial \tau_2'} \end{pmatrix} \] which is equivalent to \[ b(z) \frac{\partial u_{t}}{\partial \tau_2'} = - (w_{t-1}' \otimes I_m). \]
Summary
We thus have \[ \frac{\partial k(z)^{-1} y_t }{\partial (\tau_1', \tau_2') } = b(z)^{-1} \left( (x_{t-1}', -w_{t-1}') \otimes I_m \right), \] which is of dimension \((m \times m^2(p+q))\). These matrices are eventually stacked such that the RHS of the regression is of dimension \((Tm \times m^2(p+q))\).
The first few elements (pertaining to the AR parameters) of the matrix above are \[ b(z)^{-1} \left( I_m y_{1,t-1}, \ldots, I_m y_{m,t-1} | \cdots | I_m y_{1,t-p}, \ldots, I_m y_{m,t-p} \right) . \]
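As an illustration of how this matrix can be set up in practice, here is a minimal sketch (not the package implementation) that builds the stacked \((Tm \times m^2(p+q))\) matrix column by column, reusing `filter_b_inv()` from the sketch above. The names `lag_mat` and `build_rhs` are illustrative, `u` denotes the residuals \(u_t = \hat{k}(z)^{-1} y_t\) from the previous stage, and \(p, q \geq 1\) is assumed.

```r
# lagged series, padded with zeros for the initial values
lag_mat <- function(x, lag) {
  rbind(matrix(0, lag, ncol(x)), x[seq_len(nrow(x) - lag), , drop = FALSE])
}

# stacked derivative matrix b(z)^{-1} ( (x_{t-1}', -w_{t-1}') %x% I_m )
build_rhs <- function(y, u, b_hat, p, q) {
  n_obs <- nrow(y); m <- ncol(y)
  # scalar regressors (x_{t-1}', -w_{t-1}') as a (T x m(p+q)) matrix
  base <- cbind(
    do.call(cbind, lapply(seq_len(p), function(i) lag_mat(y, i))),
    -do.call(cbind, lapply(seq_len(q), function(i) lag_mat(u, i)))
  )
  rhs <- matrix(0, n_obs * m, m^2 * (p + q))
  for (j in seq_len(m * (p + q))) {  # j-th entry of (x_{t-1}', -w_{t-1}')
    for (k in seq_len(m)) {          # k-th column of I_m that it multiplies
      col_tm <- matrix(0, n_obs, m)
      col_tm[, k] <- base[, j]
      # filter the m-dimensional series e_k * base[t, j] through b(z)^{-1}
      rhs[, (j - 1) * m + k] <- as.vector(t(filter_b_inv(col_tm, b_hat)))
    }
  }
  rhs
}
```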
Toeplitz Regression: LHS
Calculating \(\frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau}\), where \(\tau' = (\tau_1', \tau_2')\), is slightly more involved because we first need to vectorize and then devectorize. In the univariate case, \(\frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} = \hat{b}(z)^{-1} \hat{k}(z)^{-1} - \hat{b}(z)^{-1}\). We will see that, despite the non-commutativity of matrix multiplication, this expression is the same in the multivariate case.
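For reference, the univariate claim follows from a direct calculation: with scalar polynomials, \(k(z)^{-1} = a(z)/b(z)\), so

\[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} = \sum_{i=1}^{p} \hat{a}_i \frac{z^i}{\hat{b}(z)} - \sum_{i=1}^{q} \hat{b}_i \frac{\hat{a}(z) z^i}{\hat{b}(z)^2} = \frac{\hat{a}(z) - 1}{\hat{b}(z)} - \frac{\hat{a}(z)\left(\hat{b}(z) - 1\right)}{\hat{b}(z)^2} = \hat{b}(z)^{-1} \hat{k}(z)^{-1} - \hat{b}(z)^{-1}. \]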
AR Parameters
\[ \begin{aligned} \frac{\partial vec(k(z)^{-1} )}{\partial \tau_1' } & = \frac{\partial}{\partial \tau_1' } \left[ vec \left( b(z)^{-1} \left( I_m + \left(a_1, \ldots, a_p \right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right) \right) \right] \\ & = \left( I_m \otimes b(z)^{-1}\right) \frac{\partial}{\partial \tau_1' } \left[ vec \left( I_m + \left(a_1, \ldots, a_p \right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right) \right] \\ & = \left( I_m \otimes b(z)^{-1}\right) \left( \left( z I_m, \ldots, z^p I_m \right) \otimes I_m \right) \\ & = \left( \left( z I_m, \ldots, z^p I_m \right) \otimes b(z)^{-1}\right) \end{aligned} \]
Thus, \[ \begin{aligned} \frac{\partial vec(k(z)^{-1})}{\partial \tau_1' } \tau_1 & = \left( \left( z I_m, \ldots, z^p I_m \right) \otimes b(z)^{-1}\right) vec\left(a_1, \ldots, a_p\right) \end{aligned} \] and devectorized \[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial \tau_1' } \tau_1 & = vec^{-1} \left( vec \left[ b(z)^{-1} \left(a_1, \ldots, a_p\right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right] \right) \\ & = b(z)^{-1} \left( a_1 z + \cdots + a_p z^p + I_m - I_m \right) \\ & = b(z)^{-1} a(z) - b(z)^{-1} \end{aligned} \]
MA Parameters
Similarly,
\[ \begin{aligned} \frac{\partial vec(k(z)^{-1} )}{\partial \tau_2' } & = \frac{\partial}{\partial \tau_2' } \left[ vec \left( \left( I_m + \left(b_1, \ldots, b_q \right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} \right)^{-1} a(z)\right) \right] \\ & = -\left( a'(z) \otimes I_m \right) \left( b'(z)^{-1} \otimes b(z)^{-1} \right) \frac{\partial}{\partial \tau_2' } \left[ vec \left( I_m + \left(b_1, \ldots, b_q \right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} \right) \right] \\ & = -\left( a'(z) b'(z)^{-1} \otimes b(z)^{-1} \right) \left( \left( z I_m, \ldots, z^q I_m \right) \otimes I_m \right) \\ & = -\left( a'(z) b'(z)^{-1} \left( z I_m, \ldots, z^q I_m \right) \otimes b(z)^{-1} \right) \end{aligned} \]
Thus, \[ \begin{aligned} \frac{\partial vec(k(z)^{-1})}{\partial \tau_2' } \tau_2 & = -\left( a'(z) b'(z)^{-1} \left( z I_m, \ldots, z^q I_m \right) \otimes b(z)^{-1} \right) vec\left(b_1, \ldots, b_q \right) \end{aligned} \] and devectorized \[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial \tau_2' } \tau_2 & = -vec^{-1} \left( vec \left[ b(z)^{-1} \left(b_1, \ldots, b_q\right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} b(z)^{-1} a(z) \right] \right) \\ & = -b(z)^{-1} \left( b_1 z + \cdots + b_q z^q + I_m - I_m \right) k(z)^{-1} \\ & = b(z)^{-1} k(z)^{-1} - k(z)^{-1} \end{aligned} \]
Summary
We finally obtain that
\[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial (\tau_1', \tau_2') } \tau &= b(z)^{-1} a(z) - b(z)^{-1} + b(z)^{-1} k(z)^{-1} - k(z)^{-1}\\ &= - b(z)^{-1} + b(z)^{-1} k(z)^{-1} \end{aligned} \]
and thus for the LHS
\[ \begin{aligned} \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t &= \left[ \hat{k}(z)^{-1} - \hat{b}(z)^{-1} \hat{k}(z)^{-1} + \hat{b}(z)^{-1} \right] y_t\\ &= \varepsilon^{II}_t - \hat{b}(z)^{-1} \varepsilon^{II}_t + \hat{b}(z)^{-1} y_t \end{aligned} \]
Summary
The regression of \[ \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t \] on \[ -\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t \]
is thus \[ \varepsilon^{II}_t - \hat{b}(z)^{-1} \varepsilon^{II}_t + \hat{b}(z)^{-1} y_t = \hat{b}(z)^{-1} \left( (-x_{t-1}', w_{t-1}') \otimes I_m \right) \cdot vec\left(a_1, \ldots, a_p, b_1, \ldots, b_q \right) + error, \] where the elements of \(w_{t-1}\) are evaluated at the stage 2 residuals, i.e. \(w_{t-1}' = \left( (\varepsilon^{II}_{t-1})', \ldots, (\varepsilon^{II}_{t-q})' \right)\).
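Putting the pieces together, the following minimal sketch (again not the package implementation, reusing `apply_ar`, `filter_b_inv` and `build_rhs` from the sketches above, and ignoring any weighting by \(\Sigma\)) runs the stage 3 regression for a plain VARMA(p,q): it computes the stage 2 residuals, builds the stacked LHS vector and the derivative matrix, and regresses the LHS on minus the derivative matrix, as stated above.

```r
stage3_ols <- function(y, a2_hat, b2_hat) {
  n_obs <- nrow(y); m <- ncol(y)
  p <- dim(a2_hat)[3]; q <- dim(b2_hat)[3]
  # stage 2 residuals eps^{II}_t = b2(z)^{-1} a2(z) y_t
  eps2 <- filter_b_inv(apply_ar(y, a2_hat), b2_hat)
  # LHS: eps^{II}_t - b2(z)^{-1} eps^{II}_t + b2(z)^{-1} y_t, stacked to length T*m
  lhs_tm <- eps2 - filter_b_inv(eps2, b2_hat) + filter_b_inv(y, b2_hat)
  lhs <- as.vector(t(lhs_tm))
  # derivative matrix b2(z)^{-1} ((x_{t-1}', -w_{t-1}') %x% I_m), stacked
  rhs <- build_rhs(y, eps2, b2_hat, p, q)
  # regress the LHS on minus the derivative matrix; the coefficient vector is
  # tau = vec(a_1, ..., a_p, b_1, ..., b_q)
  tau <- qr.solve(-rhs, lhs)
  coef <- matrix(tau, nrow = m)
  list(a3 = array(coef[, seq_len(m * p), drop = FALSE], c(m, m, p)),
       b3 = array(coef[, m * p + seq_len(m * q), drop = FALSE], c(m, m, q)))
}

# usage (assuming stage 2 coefficient arrays a2_hat, b2_hat and data matrix y):
# est3 <- stage3_ols(y, a2_hat, b2_hat)
```

Zeros of the resulting \(b_{III}(z)\) inside the unit circle would still have to be flipped to obtain \(b_{III,f}(z)\), as described in the remarks below.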
Remarks on Root Flipping
Before applying Blaschke matrices, it is necessary to “shift” a square root \(\Sigma_L\) of the covariance matrix \(\Sigma\) (in our case the lower-triangular Cholesky factor) in the spectral factorization
\[ f(z) = k(z) \Sigma k'\left(\frac{1}{z}\right) \] into the transfer function.
Right-multiplying a Blaschke factor onto \(k(z) \Sigma_L\) does not change the spectral density, while this cannot be guaranteed when right-multiplying (as done in our implementation) onto \(k(z)\).
Note that right-multiplying a Blaschke factor \(B(z)\) which flips a root from inside to outside the unit circle increases the “innovation” covariance, i.e. \(\left(k(0) \Sigma_L B(0)\right)\left(k(0) \Sigma_L B(0)\right)' \geq \left(k(0) \Sigma_L\right)\left(k(0) \Sigma_L\right)'\) in the Loewner partial order for positive semi-definite matrices (https://en.wikipedia.org/wiki/Loewner_order).
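As a univariate illustration of this remark (a sketch only, not the multivariate Blaschke machinery of the implementation): flipping the zero of an MA(1) polynomial \(1 + b z\) with \(|b| > 1\) to outside the unit circle yields \(1 + z/b\) with innovation variance \(b^2 \sigma^2\), which leaves the spectral density unchanged while increasing the innovation variance.

```r
# univariate sanity check of the root-flipping remark
b      <- 2        # MA(1) coefficient, zero -1/b inside the unit circle
sigma2 <- 1.5      # innovation variance of the original representation
omega  <- seq(-pi, pi, length.out = 201)

spec_orig    <- sigma2 * Mod(1 + b * exp(1i * omega))^2
spec_flipped <- (b^2 * sigma2) * Mod(1 + (1 / b) * exp(1i * omega))^2

all.equal(spec_orig, spec_flipped)  # TRUE: same spectral density
b^2 * sigma2 >= sigma2              # TRUE: larger innovation variance
```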