Hannan-Rissanen-Kavalieris Estimation - Stage 3
Bernd Funovits
2023-12-17
c_hrk_stage3.Rmd
Differences to HRK in HD
The Hannan-Rissanen-Kavalieris algorithm implemented in this package differs in some ways from the one described in Hannan and Deistler (2012), The Statistical Theory of Linear Systems (henceforth HD), Chapter 6.7, page 294ff.:
- HRK in HD uses Toeplitz and recursive calculations, i.e. at every stage, models for all orders (up to an upper bound) \(p,q\) are calculated. We use OLS for estimating one particular model with AR and MA orders \(p\) and \(q\).
- HRK in HD works in Kronecker echelon form and only treats the generic neighborhood (Kronecker indices such that \(n_1 = \cdots = n_k = n_{k+1} + 1 = \cdots = n_m + 1\), where \(m\) is the output dimension). We allow for affine restrictions, which are more general. However, it is still required that \(a_0 = b_0\), that \(diag(a_0)=I_m\), and that \(\Sigma_L\) is lower-triangular.
Stage 3
Inputs and Outputs
Given are
- the data \(y\), containing \(m\) variables and \(T\) observations, stored in a matrix of dimension \((T \times m)\)
- the residuals \(\varepsilon_t^{II}\) obtained from stage 2, i.e. obtained from \(b_{II,f}(z)^{-1} a_{II}(z) y_t\), where all the zeros of \(b_{II,f}(z)\) have been flipped to outside the unit circle. Note that \(a(z)\) might still have zeros inside the unit circle!
- The estimates \(a_{II}(z)\) and \(b_{II,f}(z)\) from above.
The outputs of the algorithm are
- \(a_{III}(z)\), \(b_{III,f}(z)\) (the latter obtained from \(b_{III}(z)\) by potentially flipping zeros from inside to outside the unit circle).
- \(a_{III}(z)\) and \(b_{III}(z)\) are obtained from an OLS regression. As a by-product, we also obtain the “ARX” residuals \(u^{III}_t\).
- Residuals \(\varepsilon_t^{III}\). These are obtained from \(b_{III,f}(z)^{-1} a_{III}(z) y_t\) (see the filtering sketch below). Again, it is not guaranteed that \(a_{III}(z)\) has no zeros inside the unit circle!
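To make the filtering relations above concrete, the following is a minimal sketch (not the package implementation) of how residuals of the form \(b_f(z)^{-1} a(z) y_t\) can be computed from given coefficient arrays. The names `a_hat`, `b_hat`, `apply_ar`, and `filter_b_inv` are illustrative only; \(a_0 = b_0 = I_m\) and zero initial values are assumed.

```r
# Sketch only: compute e_t = b(z)^{-1} a(z) y_t for coefficient arrays
# a_hat (m x m x p) and b_hat (m x m x q), with a_0 = b_0 = I_m.

# apply a(z): v_t = y_t + a_1 y_{t-1} + ... + a_p y_{t-p} (zero initial values)
apply_ar <- function(y, a_hat) {
  p <- dim(a_hat)[3]
  v <- y
  for (t in seq_len(nrow(y))) {
    for (i in seq_len(min(p, t - 1))) {
      v[t, ] <- v[t, ] + as.vector(a_hat[, , i] %*% y[t - i, ])
    }
  }
  v
}

# apply b(z)^{-1}: solve b(z) e_t = v_t recursively (zero initial values)
filter_b_inv <- function(v, b_hat) {
  q <- dim(b_hat)[3]
  e <- matrix(0, nrow(v), ncol(v))
  for (t in seq_len(nrow(v))) {
    acc <- v[t, ]
    for (i in seq_len(min(q, t - 1))) {
      acc <- acc - as.vector(b_hat[, , i] %*% e[t - i, ])
    }
    e[t, ] <- acc
  }
  e
}

# e.g. stage 2 residuals: eps2 <- filter_b_inv(apply_ar(y, a2_hat), b2f_hat)
```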
Optimization Criterion
The goal is to minimize
\[ \int_{-\pi}^{\pi} trace\left[ \left( k(e^{i\omega}) \Sigma k'(e^{-i\omega}) \right)^{-1} \left( w(\omega) w^*(\omega)\right)\right] d\omega \] over the respective parameter space, where \[ w(\omega) = \frac{1}{\sqrt{T}} \sum_{t=1}^T y_t e^{it\omega}. \] As in the univariate case, we first-order approximate the inverse transfer function appearing in \[ \left( k(e^{i\omega}) \Sigma k'(e^{-i\omega}) \right)^{-1} \] around the initial value obtained from stage 2, i.e. we replace it by \[ \left[ k(e^{i\omega})^{-*} \Sigma^{-\frac{1}{2}} \right] \Sigma^{-\frac{1}{2}} \left( \hat{k}(e^{i\omega})^{-1} + \frac{\partial \hat{k}(e^{i\omega})^{-1}}{\partial \tau'} \left( \tau - \hat{\tau}\right) \right). \] In the following, we derive the regression obtained from the terms in the parentheses, using the parametrisation \(\tau = vec\left(a_1, \ldots, a_p, b_1, \ldots, b_q \right)\). The term \(\frac{\partial \hat{k}(e^{i\omega})^{-1}}{\partial \tau'}\) denotes the partial derivative with respect to \(\tau\), evaluated at \(\hat{\tau}\). (We restrict ourselves to the VARMA(p,q) model but, of course, the derivation also applies under the affine restrictions described above.)
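Heuristically, replacing also the leading factor by its stage 2 counterpart and applying Parseval's theorem (up to a constant factor and end effects from the finite sample), minimizing this first-order approximation of the criterion over \(\tau\) amounts to the time-domain least-squares problem

\[ \min_{\tau} \; \sum_{t=1}^{T} \left\| \Sigma^{-\frac{1}{2}} \left( \hat{k}(z)^{-1} y_t + \left[ \frac{\partial \hat{k}(z)^{-1} y_t}{\partial \tau'} \right] \left( \tau - \hat{\tau} \right) \right) \right\|^2 , \]

which is exactly the regression derived below; how the weighting by \(\Sigma^{-\frac{1}{2}}\) is treated (e.g. replaced by a stage 2 estimate or dropped) is an implementation detail not spelled out here.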
We need to regress \[ \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t \] on \[ -\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t \]
To this end, we construct a regression of the \(T\cdot m\)-dimensional stacked vector of \(\left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t\) on the corresponding \(\left( T\cdot m \times m^2(p+q) \right)\)-dimensional matrix of stacked \(-\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t\).
Toeplitz Regression: RHS
Derivative with respect to AR parameters
Considering \[ k(z)^{-1} y_t = u_t = y_t + \left(a_1, \ldots, a_p \right) \begin{pmatrix} y_{t-1}\\ \vdots\\ y_{t-p} \end{pmatrix} - \left( b_1, \ldots, b_q \right) \begin{pmatrix} u_{t-1}\\ \vdots\\ u_{t-q} \end{pmatrix}, \] vectorization using \(vec(ABC) = (C' \otimes A) vec(B)\) gives \[ u_t = y_t + (x_{t-1}' \otimes I_m) \tau_1 - (w_{t-1}' \otimes I_m) \tau_2 \] where \(x_{t-1}' = (y_{t-1}',\ldots, y_{t-p}')\), \(\tau_1 = vec\left(a_1, \ldots, a_p \right)\), \(w_{t-1}' = (u_{t-1}',\ldots, u_{t-q}')\), \(\tau_2 = vec\left(b_1, \ldots, b_q \right)\).
The partial derivative is \[ \frac{\partial k(z)^{-1} y_t }{\partial \tau_1'} = (x_{t-1}' \otimes I_m) - \left( b_1, \ldots, b_q \right) \begin{pmatrix} \frac{\partial u_{t-1}}{\partial \tau_1'}\\ \vdots\\ \frac{\partial u_{t-q}}{\partial \tau_1'} \end{pmatrix} \] which is equivalent to \[ b(z) \frac{\partial u_{t}}{\partial \tau_1'} = (x_{t-1}' \otimes I_m). \]
Derivative with respect to MA parameters
Similarly, we obtain \[ \frac{\partial k(z)^{-1} y_t }{\partial \tau_2'} = - (w_{t-1}' \otimes I_m) - \left( b_1, \ldots, b_q \right) \begin{pmatrix} \frac{\partial u_{t-1}}{\partial \tau_2'}\\ \vdots\\ \frac{\partial u_{t-q}}{\partial \tau_2'} \end{pmatrix} \] which is equivalent to \[ b(z) \frac{\partial u_{t}}{\partial \tau_2'} = - (w_{t-1}' \otimes I_m). \]
Summary
We thus have \[ \frac{\partial k(z)^{-1} y_t }{\partial (\tau_1', \tau_2') } = b(z)^{-1} \left( (x_{t-1}', -w_{t-1}') \otimes I_m \right), \] which is of dimension \((m \times m^2(p+q))\). These matrices are eventually stacked such that the RHS of the regression is of dimension \((Tm \times m^2(p+q))\).
The first few elements (pertaining to the AR parameters) of the matrix above are \[ b(z)^{-1} \left( I_m y_{1,t-1}, \ldots, I_m y_{m,t-1} | \cdots | I_m y_{1,t-p}, \ldots, I_m y_{m,t-p} \right) . \]
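As an illustration of how this matrix can be set up in practice, here is a minimal sketch (not the package implementation) that builds the stacked \((Tm \times m^2(p+q))\) matrix column by column, reusing `filter_b_inv()` from the sketch above. The names `lag_mat` and `build_rhs` are illustrative, `u` denotes the residuals \(u_t = \hat{k}(z)^{-1} y_t\) from the previous stage, and \(p, q \geq 1\) is assumed.

```r
# lagged series, padded with zeros for the initial values
lag_mat <- function(x, lag) {
  rbind(matrix(0, lag, ncol(x)), x[seq_len(nrow(x) - lag), , drop = FALSE])
}

# stacked derivative matrix b(z)^{-1} ( (x_{t-1}', -w_{t-1}') %x% I_m )
build_rhs <- function(y, u, b_hat, p, q) {
  n_obs <- nrow(y); m <- ncol(y)
  # scalar regressors (x_{t-1}', -w_{t-1}') as a (T x m(p+q)) matrix
  base <- cbind(
    do.call(cbind, lapply(seq_len(p), function(i) lag_mat(y, i))),
    -do.call(cbind, lapply(seq_len(q), function(i) lag_mat(u, i)))
  )
  rhs <- matrix(0, n_obs * m, m^2 * (p + q))
  for (j in seq_len(m * (p + q))) {  # j-th entry of (x_{t-1}', -w_{t-1}')
    for (k in seq_len(m)) {          # k-th column of I_m that it multiplies
      col_tm <- matrix(0, n_obs, m)
      col_tm[, k] <- base[, j]
      # filter the m-dimensional series e_k * base[t, j] through b(z)^{-1}
      rhs[, (j - 1) * m + k] <- as.vector(t(filter_b_inv(col_tm, b_hat)))
    }
  }
  rhs
}
```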
Toeplitz Regression: LHS
Calculating \(\frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau}\), where \(\tau' = (\tau_1', \tau_2')\), is slightly more involved because we first need to vectorize and then devectorize. In the univariate case, \(\frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} = \hat{b}(z)^{-1} \hat{k}(z)^{-1} - \hat{b}(z)^{-1}\). We will see that, despite the non-commutativity of matrix multiplication, this expression is the same in the multivariate case.
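For reference, the univariate claim follows from a direct calculation: with scalar polynomials, \(k(z)^{-1} = a(z)/b(z)\), so

\[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} = \sum_{i=1}^{p} \hat{a}_i \frac{z^i}{\hat{b}(z)} - \sum_{i=1}^{q} \hat{b}_i \frac{\hat{a}(z) z^i}{\hat{b}(z)^2} = \frac{\hat{a}(z) - 1}{\hat{b}(z)} - \frac{\hat{a}(z)\left(\hat{b}(z) - 1\right)}{\hat{b}(z)^2} = \hat{b}(z)^{-1} \hat{k}(z)^{-1} - \hat{b}(z)^{-1}. \]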
AR Parameters
\[ \begin{aligned} \frac{\partial vec(k(z)^{-1} )}{\partial \tau_1' } & = \frac{\partial}{\partial \tau_1' } \left[ vec \left( b(z)^{-1} \left( I_m + \left(a_1, \ldots, a_p \right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right) \right) \right] \\ & = \left( I_m \otimes b(z)^{-1}\right) \frac{\partial}{\partial \tau_1' } \left[ vec \left( I_m + \left(a_1, \ldots, a_p \right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right) \right] \\ & = \left( I_m \otimes b(z)^{-1}\right) \left( \left( z I_m, \ldots, z^p I_m \right) \otimes I_m \right) \\ & = \left( \left( z I_m, \ldots, z^p I_m \right) \otimes b(z)^{-1}\right) \end{aligned} \]
Thus, \[ \begin{aligned} \frac{\partial vec(k(z)^{-1})}{\partial \tau_1' } \tau_1 & = \left( \left( z I_m, \ldots, z^p I_m \right) \otimes b(z)^{-1}\right) vec\left(a_1, \ldots, a_p\right) \end{aligned} \] and devectorized \[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial \tau_1' } \tau_1 & = vec^{-1} \left( vec \left[ b(z)^{-1} \left(a_1, \ldots, a_p\right) \begin{pmatrix} z I_m\\ \vdots\\ z^p I_m \end{pmatrix} \right] \right) \\ & = b(z)^{-1} \left( a_1 z + \cdots + a_p z^p + I_m - I_m \right) \\ & = b(z)^{-1} a(z) - b(z)^{-1} \end{aligned} \]
MA Parameters
Similarly,
\[ \begin{aligned} \frac{\partial vec(k(z)^{-1} )}{\partial \tau_2' } & = \frac{\partial}{\partial \tau_2' } \left[ vec \left( \left( I_m + \left(b_1, \ldots, b_q \right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} \right)^{-1} a(z)\right) \right] \\ & = -\left( a'(z) \otimes I_m \right) \left( b'(z)^{-1} \otimes b(z)^{-1} \right) \frac{\partial}{\partial \tau_2' } \left[ vec \left( I_m + \left(b_1, \ldots, b_q \right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} \right) \right] \\ & = -\left( a'(z) b'(z)^{-1} \otimes b(z)^{-1} \right) \left( \left( z I_m, \ldots, z^q I_m \right) \otimes I_m \right) \\ & = -\left( a'(z) b'(z)^{-1} \left( z I_m, \ldots, z^q I_m \right) \otimes b(z)^{-1} \right) \end{aligned} \]
Thus, \[ \begin{aligned} \frac{\partial vec(k(z)^{-1})}{\partial \tau_2' } \tau_2 & = -\left( a'(z) b'(z)^{-1} \left( z I_m, \ldots, z^q I_m \right) \otimes b(z)^{-1} \right) vec\left(b_1, \ldots, b_q \right) \end{aligned} \] and devectorized \[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial \tau_2' } \tau_2 & = -vec^{-1} \left( vec \left[ b(z)^{-1} \left(b_1, \ldots, b_q\right) \begin{pmatrix} z I_m\\ \vdots\\ z^q I_m \end{pmatrix} b(z)^{-1} a(z) \right] \right) \\ & = -b(z)^{-1} \left( b_1 z + \cdots + b_q z^q + I_m - I_m \right) k(z)^{-1} \\ & = b(z)^{-1} k(z)^{-1} - k(z)^{-1} \end{aligned} \]
Summary
We finally obtain that
\[ \begin{aligned} \frac{\partial k(z)^{-1} }{\partial (\tau_1', \tau_2') } \tau &= b(z)^{-1} a(z) - b(z)^{-1} + b(z)^{-1} k(z)^{-1} - k(z)^{-1}\\ &= - b(z)^{-1} + b(z)^{-1} k(z)^{-1} \end{aligned} \]
and thus for the LHS
\[ \begin{aligned} \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t &= \left[ \hat{k}(z)^{-1} - \hat{b}(z)^{-1} \hat{k}(z)^{-1} + \hat{b}(z)^{-1} \right] y_t\\ &= \varepsilon^{II}_t - \hat{b}(z)^{-1} \varepsilon^{II}_t + \hat{b}(z)^{-1} y_t \end{aligned} \]
Summary
The regression of \[ \left[ \hat{k}(z)^{-1} - \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \hat{\tau} \right] y_t \] on \[ -\left[ \frac{\partial \hat{k}(z)^{-1}}{\partial \tau'} \right] y_t \]
is thus \[ \varepsilon^{II}_t - \hat{b}(z)^{-1} \varepsilon^{II}_t + \hat{b}(z)^{-1} y_t = \hat{b}(z)^{-1} \left( (-x_{t-1}', w_{t-1}') \otimes I_m \right) \cdot vec\left(a_1, \ldots, a_p, b_1, \ldots, b_q \right) + error, \] where the elements of \(w_{t-1}\) are evaluated at the stage 2 residuals, i.e. \(w_{t-1}' = \left( (\varepsilon^{II}_{t-1})', \ldots, (\varepsilon^{II}_{t-q})' \right)\).
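Putting the pieces together, the following minimal sketch (again not the package implementation, reusing `apply_ar`, `filter_b_inv` and `build_rhs` from the sketches above, and ignoring any weighting by \(\Sigma\)) runs the stage 3 regression for a plain VARMA(p,q): it computes the stage 2 residuals, builds the stacked LHS vector and the derivative matrix, and regresses the LHS on minus the derivative matrix, as stated above.

```r
stage3_ols <- function(y, a2_hat, b2_hat) {
  n_obs <- nrow(y); m <- ncol(y)
  p <- dim(a2_hat)[3]; q <- dim(b2_hat)[3]
  # stage 2 residuals eps^{II}_t = b2(z)^{-1} a2(z) y_t
  eps2 <- filter_b_inv(apply_ar(y, a2_hat), b2_hat)
  # LHS: eps^{II}_t - b2(z)^{-1} eps^{II}_t + b2(z)^{-1} y_t, stacked to length T*m
  lhs_tm <- eps2 - filter_b_inv(eps2, b2_hat) + filter_b_inv(y, b2_hat)
  lhs <- as.vector(t(lhs_tm))
  # derivative matrix b2(z)^{-1} ((x_{t-1}', -w_{t-1}') %x% I_m), stacked
  rhs <- build_rhs(y, eps2, b2_hat, p, q)
  # regress the LHS on minus the derivative matrix; the coefficient vector is
  # tau = vec(a_1, ..., a_p, b_1, ..., b_q)
  tau <- qr.solve(-rhs, lhs)
  coef <- matrix(tau, nrow = m)
  list(a3 = array(coef[, seq_len(m * p), drop = FALSE], c(m, m, p)),
       b3 = array(coef[, m * p + seq_len(m * q), drop = FALSE], c(m, m, q)))
}

# usage (assuming stage 2 coefficient arrays a2_hat, b2_hat and data matrix y):
# est3 <- stage3_ols(y, a2_hat, b2_hat)
```

Zeros of the resulting \(b_{III}(z)\) inside the unit circle would still have to be flipped to obtain \(b_{III,f}(z)\), as described in the remarks below.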
Remarks on Root Flipping
Before applying Blaschke matrices, it is necessary to “shift” a square root \(\Sigma_L\) of the covariance matrix \(\Sigma\) (in our case the lower-triangular Cholesky factor) in the spectral factorization
\[ f(z) = k(z) \Sigma k'\left(\frac{1}{z}\right) \] into the transfer function.
Right-multiplying a Blaschke factor onto \(k(z) \Sigma_L\) does not change the spectral density, while this cannot be guaranteed when right-multiplying (as done in our implementation) onto \(k(z)\).
Note that right-multiplying a Blaschke factor \(B(z)\) which flips a root from inside to outside the unit circle increases the “innovation” covariance, i.e. \(\left(k(0) \Sigma_L B(0)\right)\left(k(0) \Sigma_L B(0)\right)' \geq \left(k(0) \Sigma_L\right)\left(k(0) \Sigma_L\right)'\) in the Loewner partial order for positive semi-definite matrices (https://en.wikipedia.org/wiki/Loewner_order).
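As a univariate illustration of this remark (a sketch only, not the multivariate Blaschke machinery of the implementation): flipping the zero of an MA(1) polynomial \(1 + b z\) with \(|b| > 1\) to outside the unit circle yields \(1 + z/b\) with innovation variance \(b^2 \sigma^2\), which leaves the spectral density unchanged while increasing the innovation variance.

```r
# univariate sanity check of the root-flipping remark
b      <- 2        # MA(1) coefficient, zero -1/b inside the unit circle
sigma2 <- 1.5      # innovation variance of the original representation
omega  <- seq(-pi, pi, length.out = 201)

spec_orig    <- sigma2 * Mod(1 + b * exp(1i * omega))^2
spec_flipped <- (b^2 * sigma2) * Mod(1 + (1 / b) * exp(1i * omega))^2

all.equal(spec_orig, spec_flipped)  # TRUE: same spectral density
b^2 * sigma2 >= sigma2              # TRUE: larger innovation variance
```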