
Estimate AR Models

Let \((y_t)\) be the \(m\)-dimensional AR(\(p\)) process defined by the stable AR(\(p\)) system

\[ y_t = a_1 y_{t-1} + \cdots + a_p y_{t-p} + u_t \] where \((u_t)\) is a white noise process with covariance matrix \(\Sigma=\mathbb{E} u_t u_t' \in \mathbb{R}^{m \times m}\).

Yule-Walker Estimate

If the above AR(\(p\)) model is stable (i.e. \(\det(I_m - a_1 z - \cdots - a_p z^p)\) is nonzero for all \(z\in \mathbb{C}\) with \(|z|\leq 1\)), then the autocovariance function \((\gamma_k = \mathbb{E} y_{t+k} y_t')\) and the parameters of the model satisfy the so-called Yule-Walker equations:

\[ \begin{aligned} \gamma_0 &= a_1 \gamma_{-1} + \cdots + a_p \gamma_{-p} + \Sigma \\ \gamma_k &= a_1 \gamma_{k-1} + \cdots + a_p \gamma_{k-p} & \mbox{ for } k > 0 \end{aligned} \]

If we use the notation \[ a = (a_p,\ldots,a_1), \; y_{t-1}^{p} = (y_{t-p}',\ldots,y_{t-1}')', \; \Gamma_p = \mathbb{E} y_{t-1}^{p} (y_{t-1}^{p})' \mbox{ and } \; \gamma_1^p = \mathbb{E} y_t (y_{t-1}^p)' = (\gamma_p,\ldots,\gamma_1) \] then the Yule-Walker equations (for \(k=0,\ldots,p\)) may be written as \[ \begin{aligned} \gamma_0 &= a (\gamma_1^p)' + \Sigma \\ \gamma_1^p &= a \Gamma_p \end{aligned} \]

To solve these equations we use the Cholesky decomposition of the Toeplitz matrix

\[ \Gamma_{p+1} = \mathbb{E} \begin{pmatrix} y_{t-1}^p \\ y_t \end{pmatrix} \begin{pmatrix} y_{t-1}^p \\ y_t \end{pmatrix}' = \begin{pmatrix} \Gamma_p & (\gamma_1^p)' \\ \gamma_1^p & \gamma_{0} \end{pmatrix} = \begin{pmatrix} R_{11}' & 0_{mp\times m} \\ R_{12}' & R_{22}' \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ 0_{m\times mp} & R_{22} \end{pmatrix} \]

The estimates of \(a\) and \(\Sigma\) are then obtained from the blocks of the Cholesky factor:

\[ \begin{aligned} a &= \gamma_1^p \Gamma_p^{-1} = R_{12}' R_{11} (R_{11}'R_{11})^{-1} = R_{12}' R_{11}^{-T} \\ \Sigma &= \gamma_0 - a (\gamma_1^p)' = R_{12}' R_{12} + R_{22}' R_{22} - R_{12}' R_{11}^{-T} R_{11}' R_{12} = R_{22}'R_{22} \end{aligned} \]

One advantage of this approach is that we can easily “read off” the noise covariances for all AR models with orders \(0,1,\ldots,p\) from the Cholesky factor \(R\). In particular, the (log) determinants of these noise covariances may be computed from the diagonal elements of \(R\). E.g. for the model of order \(p\) we have \[ \log\det\Sigma_p = 2 \sum_{i=mp+1}^{m(p+1)} \log r_{ii} \] where \(r_{ii}\) denotes the \(i\)-th diagonal element of \(R\).

This algorithm is implemented in the function

est_ar_yw(gamma, p.max = (dim(gamma)[3]-1), penalty = -1)
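
For illustration, the following is a minimal sketch of this computation, not the actual implementation of est_ar_yw. The helper name yw_chol is ours, and gamma is assumed to be an \(m \times m \times (p+1)\) array with gamma[, , k+1] \(= \gamma_k\).

yw_chol <- function(gamma) {
  m <- dim(gamma)[1]
  p <- dim(gamma)[3] - 1
  # assemble the block Toeplitz matrix Gamma_{p+1}:
  # block (i, j) equals gamma_{i-j}, where gamma_{-k} = gamma_k'
  G <- matrix(0, m * (p + 1), m * (p + 1))
  for (i in 1:(p + 1)) for (j in 1:(p + 1)) {
    k <- i - j
    G[((i - 1) * m + 1):(i * m), ((j - 1) * m + 1):(j * m)] <-
      if (k >= 0) gamma[, , k + 1] else t(gamma[, , -k + 1])
  }
  R <- chol(G)   # upper triangular Cholesky factor, G = R'R
  i1 <- 1:(m * p)
  i2 <- (m * p + 1):(m * (p + 1))
  aT <- backsolve(R[i1, i1, drop = FALSE], R[i1, i2, drop = FALSE])  # a' = R11^{-1} R12
  list(a = t(aT),                                   # a = (a_p, ..., a_1)
       Sigma = crossprod(R[i2, i2, drop = FALSE]),  # Sigma = R22' R22
       logdetSigma = 2 * sum(log(diag(R)[i2])))     # log det Sigma_p
}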

Durbin-Levinson-Whittle Algorithm

Let \((y_{t} \,|\, t \in \mathbb{Z})\) denote a discrete time, stationary process with mean zero and autocovariance function \(\gamma_k = \mathbb{E} y_{t+k} y_t'\). The Durbin-Levinson-Whittle procedure computes the linear least squares one-step ahead predictions recursively in the number of past values used for the prediction, see (Whittle 1963).

The one-step ahead prediction for \(y_t\) given \(p\) past values \[ y_{t|p} = a^{(p)}_1 y_{t-1} + \cdots + a^{(p)}_p y_{t-p} \] is the orthogonal projection of \(y_t\) onto the space

\[ \mathbb{H}_{t-p}^{t-1} = \mathrm{span}\{y_{t-1}, \ldots , y_{t-p}\}. \] If we denote the corresponding projection operator by \(\mathbb{P}_{t-p}^{t-1}\) then we can write \(y_{t|p}= \mathbb{P}_{t-p}^{t-1} \, y_t\). The forecast error and its covariance matrix are denoted by

\[ \begin{aligned} u_{t|p} & = y_t - y_{t|p} = y_t - a^{(p)}_1 y_{t-1} - \cdots - a^{(p)}_p y_{t-p} \\ \Sigma_p & = \mathbb{E} u_{t|p} u_{t|p}' = \gamma_0 - \mathbb{E} y_{t|p} y_{t|p}' \end{aligned} \]

The main trick in deriving the Durbin-Levinson-Whittle recursions is to also consider backcasts, i.e. the linear least squares approximations of \(y_s\) in terms of future values of the process. We define

\[ \begin{aligned} y^{b}_{s|p} & = \mathbb{P}_{s+1}^{s+p} y_s = b^{(p)}_1 y_{s+1} + \cdots + b^{(p)}_p y_{s+p} \\ v_{s|p} & = y_s - y^{b}_{s|p} = y_s - b^{(p)}_1 y_{s+1} - \cdots - b^{(p)}_p y_{s+p} \\ \Omega_p & = \mathbb{E} v_{s|p} v_{s|p}' = \gamma_0 - \mathbb{E} y^{b}_{s|p} (y^{b}_{s|p})' \end{aligned} \]

The Durbin-Levinson-Whittle recursions consist of the following equations:

\[ \begin{aligned} \Delta_{p}' = \mathbb{E} v_{t-p|p-1} y_t' = \mathbb{E} y_{t-p} u_{t|p-1}' & = \gamma_{-p} - b_{1}^{(p-1)} \gamma_{1-p} - \cdots - b_{p-1}^{(p-1)} \gamma_{-1} \\ a_p^{(p)} &= \Delta_p \Omega_{p-1}^{-1} \\ \left( a_{1}^{(p)}, \ldots, a_{p-1}^{(p)} \right) &= \left(a_{1}^{(p-1)}, \ldots, a_{p-1}^{(p-1)} \right) - a_p^{(p)} \left(b_{p-1}^{(p-1)}, \ldots, b_{1}^{(p-1)} \right) \\ b_p^{(p)} &= \Delta_p' \Sigma_{p-1}^{-1} \\ \left( b_{1}^{(p)}, \ldots, b_{p-1}^{(p)} \right) &= \left( b_{1}^{(p-1)}, \ldots, b_{p-1}^{(p-1)} \right) - b_p^{(p)} \left(a_{p-1}^{(p-1)}, \ldots, a_{1}^{(p-1)} \right) \\ \Sigma_{p} & = \Sigma_{p-1} - \Delta_p \Omega_{p-1}^{-1} \Delta_p' = \Sigma_{p-1} - a^{(p)}_p \Delta_p' = \left( I_{m} - a_{p}^{(p)} b_{p}^{(p)} \right) \Sigma_{p-1}\\ \Omega_{p} & = \Omega_{p-1} - \Delta_p' \Sigma_{p-1}^{-1} \Delta_p = \Omega_{p-1} - b^{(p)}_p \Delta_p = \left( I_{m} - b_{p}^{(p)} a_{p}^{(p)} \right) \Omega_{p-1} \end{aligned} \]

The recursions start with \(p=1\) and initial values \(\Sigma_0 = \Omega_0 = \gamma_0\).
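
As an illustration, a minimal sketch of these recursions in R (not the package implementation; the function name dlw and the layout of gamma as an \(m \times m \times (p.max+1)\) array with gamma[, , k+1] \(= \gamma_k\) are assumptions):

dlw <- function(gamma, p.max = dim(gamma)[3] - 1) {
  g <- function(k) if (k >= 0) gamma[, , k + 1] else t(gamma[, , -k + 1])  # gamma_{-k} = gamma_k'
  Sigma <- Omega <- gamma[, , 1]   # Sigma_0 = Omega_0 = gamma_0
  a <- b <- list()
  for (p in seq_len(p.max)) {
    # Delta_p = (gamma_{-p} - b_1^{(p-1)} gamma_{1-p} - ... - b_{p-1}^{(p-1)} gamma_{-1})'
    D <- g(-p)
    for (i in seq_len(p - 1)) D <- D - b[[i]] %*% g(i - p)
    D <- t(D)
    ap <- D %*% solve(Omega)       # a_p^{(p)} = Delta_p Omega_{p-1}^{-1}
    bp <- t(D) %*% solve(Sigma)    # b_p^{(p)} = Delta_p' Sigma_{p-1}^{-1}
    a.new <- b.new <- vector("list", p)
    for (i in seq_len(p - 1)) {
      a.new[[i]] <- a[[i]] - ap %*% b[[p - i]]
      b.new[[i]] <- b[[i]] - bp %*% a[[p - i]]
    }
    a.new[[p]] <- ap
    b.new[[p]] <- bp
    a <- a.new
    b <- b.new
    Sigma <- Sigma - ap %*% t(D)   # Sigma_p = Sigma_{p-1} - a_p^{(p)} Delta_p'
    Omega <- Omega - bp %*% D      # Omega_p = Omega_{p-1} - b_p^{(p)} Delta_p
  }
  list(a = a, b = b, Sigma = Sigma, Omega = Omega)
}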

Proof

Express the forecast \(y_{t|p}\) as the sum of two projections onto orthogonal spaces

The prediction \(y_{t|p}\) is the projection of \(y_{t}\) onto the space \(\mathbb{H}_{t-p}^{t-1} = \mathrm{span}\{y_{t-p},\ldots, y_{t-1}\}\). This space can be decomposed into the sum of the two orthogonal subspaces

  • \(\mathbb{H}_{t-p+1}^{t-1} = \mathrm{span}\{y_{t-p+1},\ldots, y_{t-1}\}\) and
  • the span of the components of the backcast error \(v_{t-p|p-1} = y_{t-p} - \mathbb{P}_{t-p+1}^{t-1} y_{t-p} = y_{t-p} - b^{(p-1)}_1 y_{t-p+1} - \cdots - b^{(p-1)}_{p-1} y_{t-1}.\)

Therefore \(y_{t|p}\) is equal to the sum of the projection of \(y_t\) onto \(\mathbb{H}_{t-p+1}^{t-1}\) (i.e. \(y_{t|p-1}\)) and the projection of \(y_t\) onto the space spanned by \(v_{t-p|p-1}\). The second projection, which represents the contribution of \(y_{t-p}\) to the prediction, can be expressed as \[ \begin{aligned} (\mathbb{E} y_{t} v_{t-p|p-1}')(\mathbb{E} v_{t-p|p-1} v_{t-p|p-1}')^{-1} v_{t-p|p-1} = \\ (\mathbb{E} (y_{t-p} - b^{(p-1)}_1 y_{t-p+1} - \cdots - b^{(p-1)}_{p-1} y_{t-1}) y_t' )' \Omega_{p-1}^{-1} (y_{t-p} - b^{(p-1)}_1 y_{t-p+1} - \cdots - b^{(p-1)}_{p-1} y_{t-1}) = \\ (\gamma_{-p} - b^{(p-1)}_1 \gamma_{1-p} - \cdots - b^{(p-1)}_{p-1} \gamma_{-1})' \Omega_{p-1}^{-1} (y_{t-p} - b^{(p-1)}_1 y_{t-p+1} - \cdots - b^{(p-1)}_{p-1} y_{t-1}) \end{aligned} \] Collecting the two projections, we obtain

\[ \begin{aligned} y_{t|p} = \mathbb{P}_{t-p}^{t-1} y_t &= a_{1}^{(p-1)}y_{t-1}+\cdots+a_{p-1}^{(p-1)}y_{t-p+1} + \Delta_p \Omega_{p-1}^{-1} (y_{t-p} - b^{(p-1)}_1 y_{t-p+1} - \cdots - b^{(p-1)}_{p-1} y_{t-1})\\ &= \left(\left(a_{1}^{(p-1)},\ldots,a_{p-1}^{(p-1)},0\right) + \Delta_p \Omega_{p-1}^{-1} \left(-b_{p-1}^{(p-1)},\ldots,-b^{(p-1)}_1,I_{m}\right)\right) \begin{pmatrix}y_{t-1}\\ \vdots\\ y_{t-p+1}\\ y_{t-p} \end{pmatrix} \end{aligned} \] where \[ \Delta_p = (\mathbb{E} v_{t-p|p-1} y_t')' = (\gamma_{-p} - b^{(p-1)}_1 \gamma_{1-p} - \cdots - b^{(p-1)}_{p-1} \gamma_{-1})'. \]

Express the backcast \(y^b_{t-p|p}\) as the sum of two projections onto orthogonal spaces

We first decompose the space \(\mathbb{H}_{t-p+1}^{t}\) into the orthogonal sum of the spaces \(\mathbb{H}_{t-p+1}^{t-1} = \mathrm{span}\{y_{t-p+1},\ldots,y_{t-1}\}\) and the span of the components of the forecast error \(u_{t|p-1} = y_t - a^{(p-1)}_1 y_{t-1} - \cdots - a^{(p-1)}_{p-1} y_{t-p+1}\).

The projection of \(y_{t-p}\) onto the space spanned by the forecast error \(u_{t|p-1}\) is given by \[ \begin{aligned} (\mathbb{E} y_{t-p} u_{t|p-1}')(\mathbb{E} u_{t|p-1} u_{t|p-1}')^{-1} u_{t|p-1} = \\ (\mathbb{E} (y_t - a^{(p-1)}_1 y_{t-1} - \cdots - a^{(p-1)}_{p-1} y_{t-p+1}) y_{t-p}')' \Sigma_{p-1}^{-1} (y_t - a^{(p-1)}_1 y_{t-1} - \cdots - a^{(p-1)}_{p-1} y_{t-p+1}) = \\ (\gamma_{p} - a^{(p-1)}_1 \gamma_{p-1} - \cdots - a^{(p-1)}_{p-1} \gamma_{1})' \Sigma_{p-1}^{-1} (y_t - a^{(p-1)}_1 y_{t-1} - \cdots - a^{(p-1)}_{p-1} y_{t-p+1}) \end{aligned} \] Therefore the backcast \(y^{b}_{t-p|p}\) is equal to \[ \begin{aligned} y^{b}_{t-p|p} = \mathbb{P}_{t-p+1}^{t} y_{t-p} &= b_{1}^{(p-1)}y_{t-p+1}+\cdots+b_{p-1}^{(p-1)}y_{t-1} + \Delta^b_{p} \Sigma_{p-1}^{-1} (y_{t} - a^{(p-1)}_1 y_{t-1} - \cdots - a^{(p-1)}_{p-1} y_{t-p+1})\\ &= \left(\left(b_{1}^{(p-1)},\ldots,b_{p-1}^{(p-1)},0\right) + \Delta^b_{p} \Sigma_{p-1}^{-1} \left(-a_{p-1}^{(p-1)},\ldots,-a^{(p-1)}_1,I_{m}\right)\right) \begin{pmatrix}y_{t-p+1}\\ \vdots\\ y_{t-1}\\ y_{t} \end{pmatrix} \end{aligned} \] where \[ \Delta^b_p = (\mathbb{E} u_{t|p-1} y_{t-p}')' = \left( \gamma_{p} - a^{(p-1)}_1 \gamma_{p-1} - \cdots - a^{(p-1)}_{p-1} \gamma_{1} \right)'. \]

Covariances of the forecast and backcast errors

The covariance matrix of the forecast error \(u_{t|p}\) is (due to the orthogonality) \[ \Sigma_p = \mathbb{E} (u_{t|p} u_{t|p}') = \mathbb{E} (u_{t|p-1} u_{t|p-1}') - \mathbb{E} ((\Delta_p \Omega_{p-1}^{-1} v_{t-p|p-1}) (\Delta_p \Omega_{p-1}^{-1} v_{t-p|p-1})') = \Sigma_{p-1} - \Delta_p \Omega_{p-1}^{-1} \Delta_p' \] and analogously the covariance matrix of the backcast error is given by \[ \Omega_p = \mathbb{E} (v_{t-p|p} v_{t-p|p}') = \mathbb{E} (v_{t-p|p-1} v_{t-p|p-1}') - \mathbb{E} ((\Delta^b_p \Sigma_{p-1}^{-1} u_{t|p-1}) (\Delta^b_p \Sigma_{p-1}^{-1} u_{t|p-1})') = \Omega_{p-1} - \Delta^b_p \Sigma_{p-1}^{-1} (\Delta^b_p)'. \]

Note that \[ \Delta_p = \mathbb{E} y_t v_{t-p|p-1}' = \mathbb{E} u_{t|p-1} v_{t-p|p-1}' = \mathbb{E} u_{t|p-1} y_{t-p}' = (\Delta^b_p)'. \]

Together with \(a^{(p)}_p = \Delta_p \Omega_{p-1}^{-1}\) and \(b^{(p)}_p = \Delta_p' \Sigma_{p-1}^{-1}\) we thus obtain the desired recursions for the covariance matrices \[ \begin{aligned} \Sigma_p & = \Sigma_{p-1} - \Delta_p \Omega_{p-1}^{-1} \Delta_p' = \Sigma_{p-1} - a_p^{(p)} \Delta_p' = \Sigma_{p-1} - a_p^{(p)} b_p^{(p)}\Sigma_{p-1} \\ \Omega_p & = \Omega_{p-1} - \Delta^b_p \Sigma_{p-1}^{-1} (\Delta^b_p)' = \Omega_{p-1} - b_p^{(p)} \Delta_p = \Omega_{p-1} - b_p^{(p)} a_p^{(p)}\Omega_{p-1} \end{aligned} \]

Implementation

The Durbin-Levinson-Whittle recursion described above is implemented in the function

est_ar_dlw(gamma, p.max = (dim(gamma)[3]-1), penalty = -1)

In addition, this procedure computes the partial autocorrelation function, i.e.  \[ \begin{aligned} \delta_p &= (\mbox{diag}(\mathbb{E} u_{t|p-1}u_{t|p-1}'))^{-1/2} ( \mathbb{E} u_{t|p-1} v_{t-p|p-1}' ) (\mbox{diag}(\mathbb{E} v_{t-p|p-1}v_{t-p|p-1}'))^{-1/2} \\ &= (\mbox{diag}(\Sigma_{p-1}))^{-1/2} ( \Delta_p ) (\mbox{diag}(\Omega_{p-1}))^{-1/2} & \mbox{ for } p \geq 1 \end{aligned} \]
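
In terms of the quantities of one recursion step (e.g. the matrices D, Sigma and Omega in the sketch above, which hold \(\Delta_p\), \(\Sigma_{p-1}\) and \(\Omega_{p-1}\)), the partial autocorrelation may be computed elementwise as

# delta_p[i, j] = Delta_p[i, j] / sqrt(Sigma_{p-1}[i, i] * Omega_{p-1}[j, j])
delta_p <- D / outer(sqrt(diag(Sigma)), sqrt(diag(Omega)))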

OLS Estimate

For the OLS estimate, \(y_t\) is regressed on the \(p\) past values \(y_{t-1},\ldots,y_{t-p}\). Let the corresponding data matrix, together with its QR decomposition, be given by

\[ Y = \begin{pmatrix} y_1' & \cdots & y_{p}' & y_{p+1}' \\ y_2' & \cdots & y_{p+1}' & y_{p+2}' \\ \vdots & & \vdots & \vdots \\ y_{N-p}' & \cdots & y_{N-1}' & y_{N}' \\ \end{pmatrix} = (Q_1 \; Q_2) \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix} \]

The OLS estimate of the parameter \(a = (a_p,\ldots,a_1)\) and the corresponding estimate of the noise covariance matrix may then be read off the triangular factor \(R\), in complete analogy to the Cholesky factorization used for the Yule-Walker estimate:

\[ \hat{a}' = R_{11}^{-1} R_{12} \; \mbox{ and } \; \hat{\Sigma} = \frac{1}{N-p} R_{22}' R_{22}. \]
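
A minimal sketch of this computation (the helper name ols_ar is ours; y is assumed to be an \(N \times m\) data matrix and the residual covariance is scaled by \(1/(N-p)\)):

ols_ar <- function(y, p) {
  y <- as.matrix(y)
  N <- nrow(y)
  m <- ncol(y)
  # rows of the data matrix Y are (y_t', ..., y_{t+p-1}', y_{t+p}') for t = 1, ..., N-p
  Y <- do.call(cbind, lapply(0:p, function(k) y[(1 + k):(N - p + k), , drop = FALSE]))
  R <- qr.R(qr(Y))   # upper triangular factor of the QR decomposition
  i1 <- 1:(m * p)
  i2 <- (m * p + 1):(m * (p + 1))
  a <- t(backsolve(R[i1, i1, drop = FALSE], R[i1, i2, drop = FALSE]))  # a = (a_p, ..., a_1)
  Sigma <- crossprod(R[i2, i2, drop = FALSE]) / (N - p)                # residual covariance
  list(a = a, Sigma = Sigma)
}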

ARMA Processes

Computation of the ACF

Consider the ARMA(\(p\),\(q\)) system \[ a_0 y_t + a_1 y_{t-1} + \cdots + a_p y_{t-p} = b_0 \epsilon_t + b_1 \epsilon_{t-1} + \cdots + b_q \epsilon_{t-q} \] where \((\epsilon_t)\) is a white noise process with covariance matrix \(\Sigma\).

We assume that the stability condition holds and thus the stationary solution of the ARMA system is the causal MA(\(\infty\)) process \[ y_t = \sum_{i \geq 0} k_i \epsilon_{t-i}. \]

This implies in particular

\[ \mathbb{E} \epsilon_{t-i}y_t' = \begin{cases} \Sigma k_i' & \mbox{ for } i \geq 0 \\ 0 & \mbox{ for } i < 0. \end{cases} \]
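
Comparing coefficients in \(a(z)k(z) = b(z)\) gives the recursion \(k_j = a_0^{-1}(b_j - a_1 k_{j-1} - \cdots - a_p k_{j-p})\), with \(b_j = 0\) for \(j > q\), for the impulse response coefficients needed below. A minimal sketch (the helper name impulse_response and the coefficient arrays a, b with a[, , i+1] \(= a_i\) and b[, , i+1] \(= b_i\) are our conventions):

impulse_response <- function(a, b, lag.max) {
  m <- dim(a)[1]
  p <- dim(a)[3] - 1
  q <- dim(b)[3] - 1
  a0inv <- solve(a[, , 1])
  k <- array(0, c(m, m, lag.max + 1))
  for (j in 0:lag.max) {
    acc <- if (j <= q) b[, , j + 1] else matrix(0, m, m)
    for (i in seq_len(min(j, p))) acc <- acc - a[, , i + 1] %*% k[, , j - i + 1]
    k[, , j + 1] <- a0inv %*% acc   # k_j = a_0^{-1} (b_j - a_1 k_{j-1} - ...)
  }
  k
}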

The generalized Yule-Walker equations are now obtained by multiplying the ARMA system from the right with \(y_{t-j}'\) and taking expectations \[ \begin{aligned} a_0 \gamma_j + a_1 \gamma_{j-1} + \cdots + a_p \gamma_{j-p} & = b_0 \mathbb{E} \epsilon_t y_{t-j}' + b_1 \mathbb{E} \epsilon_{t-1} y_{t-j}' + \cdots + b_q \mathbb{E} \epsilon_{t-q} y_{t-j}' \\ & = b_j \Sigma k_0' + \cdots + b_q \Sigma k_{q-j}' & \mbox{for } 0 \leq j \leq q \\ & = 0 & \mbox{ for } j> q \end{aligned} \] In the following the right hand sides of these equations are denoted by \(\Delta_j\), \(j\geq 0\).
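
The right hand sides \(\Delta_j\) may be computed directly from this formula; a sketch (the helper name Delta_rhs is ours, k is the output of impulse_response above and Sigma the noise covariance):

Delta_rhs <- function(j, b, Sigma, k) {
  m <- dim(b)[1]
  q <- dim(b)[3] - 1
  D <- matrix(0, m, m)
  # Delta_j = b_j Sigma k_0' + ... + b_q Sigma k_{q-j}' (zero for j > q)
  if (j <= q) for (i in j:q) D <- D + b[, , i + 1] %*% Sigma %*% t(k[, , i - j + 1])
  D
}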

In a first step we consider the equations for \(j=0,\ldots,p\) and solve these equations for \(\gamma_0, \ldots, \gamma_p\). To this end we consider the “vectorised” covariances \(\mathrm{vec}(\gamma_k)\) and note that \[ \mathrm{vec}(a_i \gamma_k) = (I_m \otimes a_i) \mathrm{vec} (\gamma_k) \mbox{ and } \mathrm{vec}(a_i \gamma_{-k}) = \mathrm{vec}(a_i \gamma_{k}') = (I_m \otimes a_i) \mathrm{vec}(\gamma_k') = (I_m \otimes a_i) P \mathrm{vec} (\gamma_k) \] where \(P \in\mathbb{R}^{m^2 \times m^2}\) is a permutation matrix (the so-called commutation matrix, which maps \(\mathrm{vec}(\gamma_k)\) to \(\mathrm{vec}(\gamma_k')\)).
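
For illustration, the matrix \(P\) may be constructed explicitly; a minimal sketch:

commutation <- function(m) {
  # P %*% c(A) = c(t(A)) for all (m x m) matrices A, i.e. P vec(A) = vec(A')
  P <- matrix(0, m^2, m^2)
  for (i in 1:m) for (j in 1:m) P[(i - 1) * m + j, (j - 1) * m + i] <- 1
  P
}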

As a simple example consider the case \(p=2\): The Yule-Walker equations for \(j=0,\ldots,2\) \[ \begin{array}{rrrcr} a_0 \gamma_0 &+ a_1 \gamma_{-1} &+ a_2 \gamma_{-2} &=& \Delta_0 \\ a_0 \gamma_{1} &+ a_1 \gamma_{0} &+ a_2 \gamma_{-1} &=& \Delta_1 \\ a_0 \gamma_{2} &+ a_1 \gamma_{1} &+ a_2 \gamma_{0} &=& \Delta_2 \\ \end{array} \] give the following “vectorized” equation system:

\[ \begin{array}{rrrcr} (I_m \otimes a_0) \mathrm{vec}(\gamma_0) &+ (I_m \otimes a_1)P \mathrm{vec}(\gamma_{1}) &+ (I_m \otimes a_2)P \mathrm{vec}(\gamma_{2}) &=& \mathrm{vec}(\Delta_0) \\ (I_m \otimes a_1) \mathrm{vec}(\gamma_{0}) &+ ((I_m \otimes a_0)+(I_m \otimes a_2)P) \mathrm{vec}(\gamma_{1}) &+ 0\, \mathrm{vec}(\gamma_{2}) &=& \mathrm{vec}(\Delta_1) \\ (I_m \otimes a_2) \mathrm{vec}(\gamma_{0}) &+ (I_m \otimes a_1) \mathrm{vec}(\gamma_{1}) &+ (I_m \otimes a_0) \mathrm{vec}(\gamma_{2}) &=& \mathrm{vec}(\Delta_2) \end{array} \] In the second step the autocovariances \(\gamma_j\), \(j>p\), are determined by the recursion \[ \gamma_j = a_0^{-1}\left(\Delta_j - a_1 \gamma_{j-1} - \cdots - a_p \gamma_{j-p}\right) \mbox{ for } j > p. \]
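
A sketch of this second step, reusing impulse_response and Delta_rhs from above (gamma is assumed to hold \(\gamma_0,\ldots,\gamma_p\) obtained in the first step, and lag.max \(> p\)):

extend_acf <- function(gamma, a, b, Sigma, k, lag.max) {
  m <- dim(gamma)[1]
  p <- dim(a)[3] - 1
  a0inv <- solve(a[, , 1])
  g <- array(0, c(m, m, lag.max + 1))
  g[, , 1:(p + 1)] <- gamma[, , 1:(p + 1)]   # gamma_0, ..., gamma_p from the first step
  for (j in seq(p + 1, lag.max)) {
    # gamma_j = a_0^{-1} (Delta_j - a_1 gamma_{j-1} - ... - a_p gamma_{j-p})
    acc <- Delta_rhs(j, b, Sigma, k)
    for (i in 1:p) acc <- acc - a[, , i + 1] %*% g[, , j - i + 1]
    g[, , j + 1] <- a0inv %*% acc
  }
  g
}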

This procedure is implemented in

autocov.armamod(obj, type=c('covariance','correlation','partial'), lag.max = 12)

Solve Difference Equations

AR(p) Systems

\[ y_t = a_1 y_{t-1} + \cdots + a_p y_{t-p} + u_t = (a_p,\ldots,a_1) \begin{pmatrix} y_{t-p} \\ \vdots \\ y_{t-1} \end{pmatrix} + u_t \]
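
A minimal sketch of solving this difference equation forward in time (the helper name solve_ar is ours; a is the \(m \times mp\) matrix \((a_p,\ldots,a_1)\), y0 a \(p \times m\) matrix of initial values, oldest first, and u an \(N \times m\) matrix of disturbances):

solve_ar <- function(a, y0, u) {
  m <- nrow(a)
  p <- ncol(a) / m
  N <- nrow(u)
  y <- rbind(y0, matrix(0, N, m))   # first p rows: the initial values
  for (t in 1:N) {
    # stack (y_{t-p}', ..., y_{t-1}')' to match a = (a_p, ..., a_1)
    ypast <- c(t(y[t:(t + p - 1), , drop = FALSE]))
    y[t + p, ] <- a %*% ypast + u[t, ]
  }
  y[-(1:p), , drop = FALSE]   # the solution y_1, ..., y_N
}

For a stable system the influence of the initial values dies out geometrically, so iterating this recursion with white noise disturbances approximates the stationary solution.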

References

For a detailed exposition of the estimation of vector autoregressive and ARMA models see Scherrer and Deistler (2019).

Scherrer, Wolfgang, and Manfred Deistler. 2019. “Chapter 6 - Vector Autoregressive Moving Average Models.” In Conceptual Econometrics Using R, edited by Hrishikesh D. Vinod and C. R. Rao, 41:145–91. Handbook of Statistics. Elsevier. https://doi.org/10.1016/bs.host.2019.01.004.
Whittle, P. 1963. “On the Fitting of Multivariate Autoregressions, and the Approximate Canonical Factorization of a Spectral Density Matrix.” Biometrika 50 (1): 129–34.