Gaussian log Likelihood of a State Space Model

These routines compute the log Likelihood for time invariant, linear state space models of the form $$a_{t+1} = A a_t + Bu_t$$ $$y_t = C a_t + Du_t$$ with $m$-dimensional outputs $y_t$, $s$-dimensional states $a_t$ and $n$-dimensional disturbances $u_t$. The disturbances are white noise with a covariance matrix $\mathbf{E} u_t u_t'=\Sigma$. Note that the disturbances and the outputs may have different dimensions, however, only "wide" systems with ($m\leq n$) are implemented.

The Gaussian log likelihood (for the case of Gaussian disturbances $u_t\sim N(0,\Sigma)$ and $a_1\sim N(a_{1|0},\Pi_{1|0})$) here is computed by the standard Kalman Filter or the square root Kalman filter, see kf(). The Kalman filter is a recursive scheme to compute the linear, least squares predictions for $a_{t+1}$ and $y_{t+1}$ given the observations $y_t,\ldots,y_1$ up to time $t$. These predictions are notated with $a_{t+1|t}$ and $y_{t+1|t}$, the prediction error for the output $y_{t+1}$ is $\epsilon_{t+1|t}=(y_{t+1}-y_{t+1|t})$ and the corresponding variances of the prediction errors are $$\Pi_{t+1|t}=\mathbf{E}(a_{t+1}-a_{t+1|t}) (a_{t+1}-a_{t+1|t})',$$ $$\Sigma_{t+1|t}=\mathbf{E}(\epsilon_{t+1|t} \epsilon_{t+1|t}').$$

The standard form of the Kalman filter is based on the parameter matrices $A,C$, the variance of "state disturbances" $Q=\mathbf{E}(Bu_t (Bu_t)')=(B\Sigma B')$, the variance of the "measurement disturbances" $R=\mathbf{E}(Du_t (Du_t)')=(D\Sigma D')$ and the covariance $S=\mathbf{E}(Bu_t(Du_t)')=(B\Sigma D')$. Furthermore we need the initial prediction $a_{1|0}$ and the corresponding error variance $\Pi_{1|0}$.

For the square root form of the filter we need the "square roots" $\Pi_{1|0}^{1/2}$ and $\Sigma^{1/2}$, i.e. matrices such that $\Pi_{1|0} = \Pi_{1|0}^{1/2} (\Pi_{1|0}^{1/2})'$ and $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})'$. In addition, we define $H=(D',B')'\Sigma^{1/2}$.

The (scaled) Gaussian log Likelihood of this model then may be expressed as $$\frac{-1}{2N}\sum_{t=1}^{N}\left(m\log(2\pi) + \log\det\Sigma_{t|t-1} + (y_t - y_{t|t-1})' \Sigma_{t|t-1}^{-1} (y_t - y_{t|t-1}) \right).$$

Usage

ll_kf_cpp(A, C, Q, R, S, y_t, P1, a1, tol)

ll_kf2_cpp(A, C, H_t, y_t, P1_R, a1, tol)

ll_kf(model, y, method = c("kf", "kf2"), P1 = NULL, a1 = NULL, tol = 0)

Arguments

A: $(s,s)$ dimensional state transition matrix $A$.
C: $(m,s)$ dimensional matrix $C$.
Q, R, S: The variance, covariance matrices of the "state disturbances" ($Bu_t$) and the "measurement disturbances" ($Du_t$) as described above. These matrices must be of dimension $(s,s)$, $(m,m)$ and $(s,m)$ respectively.
y_t: $(m,N)$ transposed data matrix y_t = t(y).
P1: $(s,s)$ dimensional covariance matrix of the error of the initial state estimate, i.e. $\Pi_{1|0}$. If NULL, then the state covariance $P = APA'+B\Sigma B'$ is used. Note that this scheme assumes that the state space model is stable, i.e. that the state transition matrix $A$ is stable.
a1: $s$ dimensional vector, which holds the initial estimate $a_{1|0}$ for the state at time $t=1$. If a1=NULL, then a zero vector is used.
tol: (small) tolerance value (or zero). In order to speed up the computations, the algorithm(s) switch to a constant Kalman gain when there is no significant change in state error covariance. This behavior is controlled by the parameter tol and may be switched off by setting tol=0.
H_t: $(n,s+m)$ dimensional matrix. This parameter corresponds to the transpose $H'$ of $H=(D',B')'\Sigma^{1/2}$.
P1_R: (right) square root of P1, i.e. P1 = t(P1_R) \%*\% P1_R.
model: stspmod() object, which represents the state space model.
y: sample, i.e. an $(N,m)$ dimensional matrix, or a "time series" object (i.e. as.matrix(y) should return an $(N,m)$-dimensional numeric matrix). Missing values (NA, NaN and Inf) are not supported.
method: Character string. If method="kf" then ll_kf calls ll_kf_cpp ("standard form" of the Kalman filter) and for method="kf2" the "square root" form of the Kalman filter is used, i.e. ll_kf2_cpp is called. Up to numerical errors the outputs should not depend on the chosen method.

Value

(double) The Gaussian log Likelihood of the model.

Details

The core routines are ll_kf_cpp and ll_kf2_cpp which are RcppArmadillo implementations of the standard and the square root Kalman filter. The function ll_kf is a wrapper function, which extracts the necessary parameters from an stspmod() object, computes the initial covariance matrix P1 and the initial state estimate a1 (if not provided) and then calls ll_kf_cpp or ll2_kf_cpp.

Square root Kalman filter: For the square root $\Pi_{1|0}^{1/2}$ the procedure first tries the Cholesky decomposition. If this fails (since $\Pi_{1|0}^{1/2}$ is (close to) singular), then ll_kf tries to compute a symmetric square root via the eigenvalue decomposition of $\Pi_{1|0}^{1/2}$.

Notes

The procedures only accept "wide" state space systems ($m \leq n$), since for "tall" systems ($m > n$) the variance of the prediction errors ($\Sigma_{t+1|t}$) is singular for $t$ larger than some threshold.

Examples

s = 4  # state dimension
m = 2  # number of outputs
n = m  # number of inputs (square case m=n)
n.obs = 100 # sample size

# generate a (stable) state space model (in innovation form)
tmpl = tmpl_stsp_full(m, n, s, sigma_L = "chol")
model = r_model(tmpl, bpoles = 1, sd = 0.5)
# generate a sample
data = sim(model, n.obs = n.obs, a1 = NA)

# compute Q, R, S and P1
sigma_L = model$sigma_L
sigma = tcrossprod(sigma_L)
R = model$sys$D %*% sigma %*% t(model$sys$D)
S = model$sys$B %*% sigma %*% t(model$sys$D)
Q = model$sys$B %*% sigma %*% t(model$sys$B)
P1 = lyapunov(model$sys$A, Q)

# compute H and square root of P1
H = rbind(model$sys$D, model$sys$B) %*% sigma_L
P1_R = chol(P1)

# compute logLikelihood (via Kalman Filter)
ll = ll_kf(model, data$y)

# compute logLikelihood (via square root Kalman Filter)
ll_test = ll_kf(model, data$y, method = 'kf2')
all.equal(ll, ll_test)
#> [1] TRUE

# directly call Rcpp routines, note H_t = t(H) and y_t = t(y)
ll_test = ll_kf_cpp(model$sys$A, model$sys$C, Q, R, S,
                    t(data$y), P1, a1 = double(s), tol=1e-8)
all.equal(ll, ll_test)
#> [1] TRUE
ll_test = ll_kf2_cpp(model$sys$A, model$sys$C, t(H),
                     t(data$y), P1_R, a1 = double(s), tol=1e-8)
all.equal(ll, ll_test)
#> [1] TRUE

# call the "full" kf routines
out = kf(model, data$y)
all.equal(ll, out$ll)
#> [1] TRUE
out = kf(model, data$y, method = 'kf2')
all.equal(ll, out$ll)
#> [1] TRUE