
Compute forecasts based on a VARMA or state space model. The procedure implements a simplified approach: it uses the formulas for prediction from an infinite past and simply sets the unknown initial values (for \(t < 1\)) to zero. This approach assumes that the model is stable and strictly miniphase, so that the disturbances \(u_t\) are the innovations of the process. Note also that the forecasts are computed for known exogenous inputs, i.e. these are "conditional forecasts". For an honest prediction, forecasts of the exogenous inputs should be used.
The computed forecast error covariance matrix assumes that the true model is used; the error which stems from the estimation of the model is not taken into account.
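The following minimal sketch illustrates this simplified scheme for a stable VAR(1) model \(y_t = A y_{t-1} + u_t\); the objects A, sigma_u and y below are illustrative stand-ins, not part of the package API.

# sketch: h-step ahead forecasts for a stable VAR(1), with the unknown
# initial values (prior to t = 1) set to zero
set.seed(1)
A       = matrix(c(0.5, 0.1, 0.2, 0.4), nrow = 2)  # stable coefficient matrix
sigma_u = diag(2)                                  # innovation covariance
y       = matrix(rnorm(200), ncol = 2)             # placeholder data (T = 100, n = 2)
h  = 2
Ah = A %*% A                                       # A^h for h = 2
yhat = matrix(0, nrow(y), ncol(y))                 # forecasts for t <= h are zero
for (t in (h+1):nrow(y)) yhat[t, ] = Ah %*% y[t-h, ]
# theoretical h-step prediction error covariance: sum_{j=0}^{h-1} A^j sigma_u t(A^j)
sigmahat = sigma_u + A %*% sigma_u %*% t(A)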

The utility function evaluate_prediction may be used to assess the quality of predictions.

Usage

# S3 method for armamod
predict(object, y, h = 1, n.ahead = 0, ...)

# S3 method for stspmod
predict(object, y, x, h = 1, n.ahead = 0, ...)

evaluate_prediction(
  y,
  yhat,
  h,
  criteria = list("RMSE"),
  benchmark = NULL,
  samples = list(1:nrow(y))
)

Arguments

object

armamod or stspmod object which represents the model.

y

\((T,n)\) matrix with the observed outputs (\(y_t\), \(t=1,...,T\)).

h

(integer) vector of forecast horizons.

n.ahead

(integer) number of time steps to look ahead (out of sample). This number is also denoted by \(T_0\).

...

not used.

x

\((T+T_0,r)\) matrix with the exogenous inputs (\(x_t\), \(t=1,...,T+T_0\)). This input parameter is ignored if the model has no exogenous inputs. Note that conditional forecasts are computed and hence (for a model with exogenous inputs) the values of the inputs up to time \(t=T+T_0\) are needed.

yhat

\((T,n,l)\) dimensional array of forecasts. The entries yhat[t,,i] should contain the prediction for \(y_t\).

criteria

is a list with "evaluation criteria". See below for more details.

benchmark

\((T,n,l)\) dimensional array with "benchmark" forecasts. If NULL then the naive (\(h\)-step ahead) forecasts are used as benchmark.

samples

is a list with "(sub) samples". See below for more details.

Value

The function predict returns a list with components

yhat

\((T,n,l)\) dimensional array with the h-step ahead forecasts. \(T\) is the sample size, \(n\) is the dimension of the outputs \(y_t\) and \(l\) is the number of forecasts made, i.e. the length of the vector h. The entry yhat[t,,i] is the h[i]-step ahead forecast of \(y_t\).

sigmahat

\((n,n,l)\) dimensional array, where sigmahat[,,i] contains the theoretical covariance matrix of the \(h\)-step ahead prediction error for h=h[i].

h

the (integer) vector of forecast horizons considered.

yhat.ahead

\((T_0,n)\) dimensional matrix which contains the "out-of-sample" forecasts for \(t=T+1,\ldots,T+T_0\).

sigmahat.ahead

\((n,n,T_0)\) dimensional array, where sigmahat.ahead[,,h] contains the theoretical covariance matrix of the \(h\)-step ahead prediction error.

y,x

the original data.

The function evaluate_prediction returns a 4-dimensional array where the dimensions refer to the evaluation criteria, the (sub) samples, the predictors and the components of the output \(y_t\). Note that the evaluation criteria are applied to the \(n\) individual components as well as to the joint vector, and hence the fourth dimension of the array has size \(n+1\). E.g. if we consider the "RMSE" and the "MAE" of the forecast errors, two samples (a training sample and a test sample), 5 forecast horizons \(h=1,\ldots,5\) and a process \((y_t)\) with 2 components, then the result will be a \((2,2,5,3)\)-dimensional array.
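This layout can be checked with a toy call on random placeholder data (not a meaningful forecast comparison):

# toy illustration of the output layout (random placeholder data)
set.seed(2)
y    = matrix(rnorm(400), ncol = 2)               # T = 200, n = 2
yhat = array(rnorm(400 * 5), dim = c(200, 2, 5))  # l = 5 predictors
stats = evaluate_prediction(y, yhat, h = 1:5,
                            criteria = list("RMSE", "MAE"),
                            samples = list(train = 21:110, test = 111:200))
dim(stats)    # c(2, 2, 5, 3): criteria x samples x predictors x (n+1)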

Details

The utility function evaluate_prediction may be used to assess the quality of some given predictions (e.g. as computed by predict). The evaluation criteria to be used are passed via the parameter criteria to the function. This parameter is a list whose components are either character strings (for selection of one of the implemented quality measures) or user defined functions. Such a function takes three arguments, fun(u.hat, y, u.bench), where u.hat is a matrix of prediction errors, y is a matrix of the (corresponding) true values and u.bench is a matrix with the prediction errors of a benchmark procedure. This allows one to compute relative error measures.
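For instance, a user defined criterion with this signature might look as follows (a sketch; the name RelMAE_ is illustrative):

# sketch of a user defined criterion: relative MAE w.r.t. the benchmark
RelMAE_ = function(u.hat, y, u.bench) {
  mean(abs(u.hat), na.rm = TRUE) / mean(abs(u.bench), na.rm = TRUE)
}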

The benchmark predictions are passed to the procedure via the parameter benchmark. If this input parameter is missing, then the naive \(h\)-step ahead predictions are used as benchmark. (Therefore the user also has to specify the respective forecast horizons via the parameter h.)
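A sketch of this naive benchmark, assuming the usual convention that the naive \(h\)-step ahead forecast of \(y_t\) is the last observed value \(y_{t-h}\) (the helper naive_forecast is illustrative, not part of the package):

# sketch: naive h-step ahead forecasts (the forecast of y[t] is y[t-h])
naive_forecast = function(y, h) {
  yhat = matrix(NA_real_, nrow(y), ncol(y))
  if (h < nrow(y)) yhat[(h+1):nrow(y), ] = y[1:(nrow(y)-h), ]
  yhat
}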

The following evaluation criteria are implemented (\(\hat{u}_{it}\) denotes the prediction error for \(y_{it}\) and \(\tilde{u}_{it}\) is the corresponding error of the benchmark procedure).

MSE

Mean Square Error

RMSE

Root Mean Square Error

MAE

Mean Absolute Error

MdAE

Median Absolute Error

MAPE

Mean Absolute Percentage Error \(100 mean(|\hat{u}_{it}/y_{it}|)\)

MdAPE

Median Absolute Percentage Error \(100 median(|\hat{u}_{it}/y_{it}|)\)

RMdSPE

Root Median Square Percentage Error \(100 \sqrt{median(\hat{u}^2_{it}/y^2_{it})}\)

RelRMSE

Relative Root Mean Square Error \(\sqrt{mean(\hat{u}^2_{it})}/\sqrt{mean(\tilde{u}^2_{it})}\)

RelMAE

Relative Mean Absolute Error \(mean(|\hat{u}_{it}|)/mean(|\tilde{u}_{it}|)\)

PB

Percentage better \(100 mean(I(|\hat{u}_{it}| < |\tilde{u}_{it}|))\)

HR

Hit Rate \(100 mean( I((\tilde{u}_{it} - \hat{u}_{it})\tilde{u}_{it} \geq 0))\). Note that this measure computes the hit rate only if the naive prediction is used as the benchmark.

The procedure also supports evaluation on different (sub) samples. The parameter samples is simply a list of integer vectors, where each vector defines a sub sample. E.g. for daily data, one could evaluate the predictions separately for each weekday, as sketched below.
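A possible construction of such weekday subsamples (a sketch; dates is an assumed vector of class "Date" with one entry per row of y):

# sketch: one subsample per weekday for daily data
dates   = seq(as.Date("2020-01-06"), by = "day", length.out = 200)
samples = split(seq_along(dates), weekdays(dates))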

Examples


# create a "random" ARMA(1,1) model (stable and miniphase)
model = test_armamod(dim = c(2,2), degrees = c(1,1), bpoles = 1, bzeroes = 1)

# generate data (sample size n.obs = 200, "burn_in" phase of length 100)
data = sim(model, n.obs = 200, n.burn_in = 100)

# predict with true model
pred_true = predict(model, data$y, h = c(1,5))

# estimate AR model, order selection by AIC
n.train = 110   # use the first 110 observations for estimation
n.test = 90     # the last 90 observations are used for (a fair) comparison
model_ar = est_ar(data$y[1:n.train,],  mean_estimate = "zero",
                  ic = 'AIC', method = 'ols')$model

# predict with AR model
pred_ar = predict(model_ar, data$y, h = c(1,5))

# estimate AR1 model (Yule-Walker)
model_ar1 = est_ar(data$y[1:n.train,],  mean_estimate = "zero",
                  penalty = -1, p.max = 1, method = 'yule-walker')$model

# predict with AR1 model
pred_ar1 = predict(model_ar1, data$y, h = c(1,5))

# evaluate prediction of the AR model (with the AR1 prediction as benchmark)
stats = evaluate_prediction(data$y, pred_ar$yhat, h = pred_ar$h,
                            criteria = list('RMSE', 'MAE', 'PB'),
                            samples = list(train = 21:n.train, test = (n.train+1):(n.train+n.test)),
                            benchmark = pred_ar1$yhat)

# use array2data.frame for "tabular" display of the results
print(array2data.frame(stats, rows = 1:3, cols = 4))
#>    criterion sample predictor      y[1]      y[2]     total
#> 1       RMSE  train       h=1  1.285645  1.984521  1.672006
#> 2        MAE  train       h=1  1.035758  1.597253  1.316505
#> 3         PB  train       h=1 66.666667 55.555556 61.111111
#> 4       RMSE   test       h=1  1.372176  2.111571  1.780674
#> 5        MAE   test       h=1  1.104249  1.701710  1.402980
#> 6         PB   test       h=1 68.888889 54.444444 61.666667
#> 7       RMSE  train       h=5  5.848072  4.827668  5.362198
#> 8        MAE  train       h=5  4.503141  3.964198  4.233670
#> 9         PB  train       h=5 62.222222 53.333333 57.777778
#> 10      RMSE   test       h=5  6.352758  5.158962  5.786727
#> 11       MAE   test       h=5  4.969506  3.920153  4.444830
#> 12        PB   test       h=5 62.222222 65.555556 63.888889

# evaluate all predictions
# join predictions
yhat  = dbind(3, pred_true$yhat, pred_ar1$yhat, pred_ar$yhat)

# define a function to compute the "Median Relative Absolute Error"
MdRAE_ = function(u.hat, y, u.bench){
   stats::median(abs(u.hat/u.bench), na.rm = TRUE)
}
stats = evaluate_prediction(data$y, yhat,
                            h = c(pred_true$h, pred_ar1$h, pred_ar$h),
                            criteria = list('RMSE', 'MAE', MdRAE = MdRAE_),
                            samples = list(train = 21:n.train, test = (n.train+1):(n.train+n.test)))

# split prediction method and forecast horizon
dimnames.stats = dimnames(stats)
stats = stats[,,c(1,3,5,2,4,6),]
dim(stats) = c(3,2,3,2,3)
dimnames(stats) = list(criterion = dimnames.stats[[1]], sample = dimnames.stats[[2]],
                      model = c('true','AR1','AR'), h = paste('h=',c(1,5),sep=''),
                      data = dimnames.stats[[4]])

# use array2data.frame for "tabular" display of the results
print(array2data.frame(stats, cols = 5, rows = c(3,4,1,2)))
#>    model   h criterion sample       y[1]      y[2]     total
#> 1   true h=1      RMSE  train 1.28925209 1.9887541 1.6759048
#> 2    AR1 h=1      RMSE  train 2.25142073 2.2174778 2.2345137
#> 3     AR h=1      RMSE  train 1.28564487 1.9845211 1.6720058
#> 4   true h=5      RMSE  train 5.86126322 4.8501260 5.3795041
#> 5    AR1 h=5      RMSE  train 6.35137481 5.0398814 5.7332524
#> 6     AR h=5      RMSE  train 5.84807225 4.8276678 5.3621976
#> 7   true h=1       MAE  train 1.02936588 1.5851909 1.3072784
#> 8    AR1 h=1       MAE  train 1.76431572 1.7429979 1.7536568
#> 9     AR h=1       MAE  train 1.03575768 1.5972528 1.3165053
#> 10  true h=5       MAE  train 4.31735406 3.8954089 4.1063815
#> 11   AR1 h=5       MAE  train 5.13272404 4.0808778 4.6068009
#> 12    AR h=5       MAE  train 4.50314135 3.9641984 4.2336699
#> 13  true h=1     MdRAE  train 0.07453425 0.1567930 0.1125926
#> 14   AR1 h=1     MdRAE  train 0.12751434 0.1923473 0.1552570
#> 15    AR h=1     MdRAE  train 0.09410693 0.1671579 0.1109965
#> 16  true h=5     MdRAE  train 0.69200551 0.7283966 0.7130315
#> 17   AR1 h=5     MdRAE  train 0.83921926 0.6655518 0.7772924
#> 18    AR h=5     MdRAE  train 0.72550968 0.7785526 0.7350351
#> 19  true h=1      RMSE   test 1.34515614 2.0671581 1.7439305
#> 20   AR1 h=1      RMSE   test 2.33480674 2.3805324 2.3577804
#> 21    AR h=1      RMSE   test 1.37217613 2.1115708 1.7806738
#> 22  true h=5      RMSE   test 6.01273251 4.8873366 5.4790059
#> 23   AR1 h=5      RMSE   test 7.12922328 5.6608314 6.4370349
#> 24    AR h=5      RMSE   test 6.35275778 5.1589618 5.7867270
#> 25  true h=1       MAE   test 1.07391601 1.6507857 1.3623508
#> 26   AR1 h=1       MAE   test 1.86312503 1.8993087 1.8812169
#> 27    AR h=1       MAE   test 1.10424908 1.7017102 1.4029796
#> 28  true h=5       MAE   test 4.57813226 3.6836098 4.1308710
#> 29   AR1 h=5       MAE   test 5.71857424 4.3693863 5.0439802
#> 30    AR h=5       MAE   test 4.96950650 3.9201529 4.4448297
#> 31  true h=1     MdRAE   test 0.05943660 0.1369453 0.1041158
#> 32   AR1 h=1     MdRAE   test 0.11286431 0.1232150 0.1187865
#> 33    AR h=1     MdRAE   test 0.06813022 0.1416760 0.1168982
#> 34  true h=5     MdRAE   test 0.74686373 0.7096248 0.7391295
#> 35   AR1 h=5     MdRAE   test 0.96606036 0.9354539 0.9463293
#> 36    AR h=5     MdRAE   test 0.88277283 0.7027472 0.8018125