Gaussian log Likelihood of a State Space Model
ll_kf.Rd
These routines compute the log Likelihood for time invariant, linear state space models of the form $$a_{t+1} = A a_t + Bu_t$$ $$y_t = C a_t + Du_t$$ with \(m\)-dimensional outputs \(y_t\), \(s\)-dimensional states \(a_t\) and \(n\)-dimensional disturbances \(u_t\). The disturbances are white noise with a covariance matrix \(\mathbf{E} u_t u_t'=\Sigma\). Note that the disturbances and the outputs may have different dimensions, however, only "wide" systems with (\(m\leq n\)) are implemented.
The Gaussian log likelihood (for the case of Gaussian disturbances
\(u_t\sim N(0,\Sigma)\) and
\(a_1\sim N(a_{1|0},\Pi_{1|0})\)) here is computed by the
standard Kalman Filter or the square root Kalman filter, see kf()
.
The Kalman filter is a recursive
scheme to compute the linear, least squares predictions
for \(a_{t+1}\) and \(y_{t+1}\) given the observations
\(y_t,\ldots,y_1\) up to time \(t\). These predictions are notated with
\(a_{t+1|t}\) and \(y_{t+1|t}\), the
prediction error for the output \(y_{t+1}\) is
\(\epsilon_{t+1|t}=(y_{t+1}-y_{t+1|t})\)
and the corresponding variances of the prediction errors are
$$\Pi_{t+1|t}=\mathbf{E}(a_{t+1}-a_{t+1|t})
(a_{t+1}-a_{t+1|t})',$$
$$\Sigma_{t+1|t}=\mathbf{E}(\epsilon_{t+1|t}
\epsilon_{t+1|t}').$$
The standard form of the Kalman filter is based on the parameter matrices \(A,C\), the variance of "state disturbances" \(Q=\mathbf{E}(Bu_t (Bu_t)')=(B\Sigma B')\), the variance of the "measurement disturbances" \(R=\mathbf{E}(Du_t (Du_t)')=(D\Sigma D')\) and the covariance \(S=\mathbf{E}(Bu_t(Du_t)')=(B\Sigma D')\). Furthermore we need the initial prediction \(a_{1|0}\) and the corresponding error variance \(\Pi_{1|0}\).
For the square root form of the filter we need the "square roots" \(\Pi_{1|0}^{1/2}\) and \(\Sigma^{1/2}\), i.e. matrices such that \(\Pi_{1|0} = \Pi_{1|0}^{1/2} (\Pi_{1|0}^{1/2})'\) and \(\Sigma = \Sigma^{1/2}(\Sigma^{1/2})'\). In addition, we define \(H=(D',B')'\Sigma^{1/2}\).
The (scaled) Gaussian log Likelihood of this model then may be expressed as $$\frac{-1}{2N}\sum_{t=1}^{N}\left(m\log(2\pi) + \log\det\Sigma_{t|t-1} + (y_t - y_{t|t-1})' \Sigma_{t|t-1}^{-1} (y_t - y_{t|t-1}) \right).$$
Usage
ll_kf_cpp(A, C, Q, R, S, y_t, P1, a1, tol)
ll_kf2_cpp(A, C, H_t, y_t, P1_R, a1, tol)
ll_kf(model, y, method = c("kf", "kf2"), P1 = NULL, a1 = NULL, tol = 0)
Arguments
- A
\((s,s)\) dimensional state transition matrix \(A\).
- C
\((m,s)\) dimensional matrix \(C\).
- Q, R, S
The variance, covariance matrices of the "state disturbances" (\(Bu_t\)) and the "measurement disturbances" (\(Du_t\)) as described above. These matrices must be of dimension \((s,s)\), \((m,m)\) and \((s,m)\) respectively.
- y_t
\((m,N)\) transposed data matrix
y_t = t(y)
.- P1
\((s,s)\) dimensional covariance matrix of the error of the initial state estimate, i.e. \(\Pi_{1|0}\). If
NULL
, then the state covariance \(P = APA'+B\Sigma B'\) is used. Note that this scheme assumes that the state space model is stable, i.e. that the state transition matrix \(A\) is stable.- a1
\(s\) dimensional vector, which holds the initial estimate \(a_{1|0}\) for the state at time \(t=1\). If
a1=NULL
, then a zero vector is used.- tol
(small) tolerance value (or zero). In order to speed up the computations, the algorithm(s) switch to a constant Kalman gain when there is no significant change in state error covariance. This behavior is controlled by the parameter
tol
and may be switched off by settingtol=0
.- H_t
\((n,s+m)\) dimensional matrix. This parameter corresponds to the transpose \(H'\) of \(H=(D',B')'\Sigma^{1/2}\).
- P1_R
(right) square root of
P1
, i.e.P1 = t(P1_R) \%*\% P1_R
.- model
stspmod()
object, which represents the state space model.- y
sample, i.e. an \((N,m)\) dimensional matrix, or a "time series" object (i.e.
as.matrix(y)
should return an \((N,m)\)-dimensional numeric matrix). Missing values (NA
,NaN
andInf
) are not supported.- method
Character string. If
method="kf"
thenll_kf
callsll_kf_cpp
("standard form" of the Kalman filter) and formethod="kf2"
the "square root" form of the Kalman filter is used, i.e.ll_kf2_cpp
is called. Up to numerical errors the outputs should not depend on the chosen method.
Details
The core routines are ll_kf_cpp
and ll_kf2_cpp
which are RcppArmadillo implementations
of the standard and the square root Kalman filter. The function ll_kf
is a wrapper function,
which extracts the necessary parameters from an stspmod()
object,
computes the initial covariance matrix P1
and the initial state
estimate a1
(if not provided) and then calls ll_kf_cpp
or ll2_kf_cpp
.
Square root Kalman filter: For the square root
\(\Pi_{1|0}^{1/2}\) the procedure first tries the Cholesky decomposition.
If this fails (since \(\Pi_{1|0}^{1/2}\) is (close to) singular),
then ll_kf
tries to compute a symmetric square root via the eigenvalue decomposition
of \(\Pi_{1|0}^{1/2}\).
Notes
The procedures only accept "wide" state space systems (\(m \leq n\)), since for "tall" systems (\(m > n\)) the variance of the prediction errors (\(\Sigma_{t+1|t}\)) is singular for \(t\) larger than some threshold.
See also
kf()
for computation of the predictions \(a_{t+1|t}\) and
\(y_{t+1|t}\).
Examples
s = 4 # state dimension
m = 2 # number of outputs
n = m # number of inputs (square case m=n)
n.obs = 100 # sample size
# generate a (stable) state space model (in innovation form)
tmpl = tmpl_stsp_full(m, n, s, sigma_L = "chol")
model = r_model(tmpl, bpoles = 1, sd = 0.5)
# generate a sample
data = sim(model, n.obs = n.obs, a1 = NA)
# compute Q, R, S and P1
sigma_L = model$sigma_L
sigma = tcrossprod(sigma_L)
R = model$sys$D %*% sigma %*% t(model$sys$D)
S = model$sys$B %*% sigma %*% t(model$sys$D)
Q = model$sys$B %*% sigma %*% t(model$sys$B)
P1 = lyapunov(model$sys$A, Q)
# compute H and square root of P1
H = rbind(model$sys$D, model$sys$B) %*% sigma_L
P1_R = chol(P1)
# compute logLikelihood (via Kalman Filter)
ll = ll_kf(model, data$y)
# compute logLikelihood (via square root Kalman Filter)
ll_test = ll_kf(model, data$y, method = 'kf2')
all.equal(ll, ll_test)
#> [1] TRUE
# directly call Rcpp routines, note H_t = t(H) and y_t = t(y)
ll_test = ll_kf_cpp(model$sys$A, model$sys$C, Q, R, S,
t(data$y), P1, a1 = double(s), tol=1e-8)
all.equal(ll, ll_test)
#> [1] TRUE
ll_test = ll_kf2_cpp(model$sys$A, model$sys$C, t(H),
t(data$y), P1_R, a1 = double(s), tol=1e-8)
all.equal(ll, ll_test)
#> [1] TRUE
# call the "full" kf routines
out = kf(model, data$y)
all.equal(ll, out$ll)
#> [1] TRUE
out = kf(model, data$y, method = 'kf2')
all.equal(ll, out$ll)
#> [1] TRUE