Model Predictions
predict.Rd
Compute the forecasts based on a VARMA or state space model. The procedure implements a simplified approach.
It uses the formulas for prediction from an infinite past and simply sets the unknown initial values
(pertaining to \(t < 1\)) to zero. This approach assumes that the model is stable and
strictly miniphase and thus the disturbances \(u_t\) are the innovations of the process.
Note also that the forecasts are computed conditional on the known exogenous inputs,
i.e. they are "conditional forecasts". For an honest prediction,
forecasts of the exogenous inputs should be used.
The computed forecast error covariance matrix assumes that the true model is used.
The error which stems from the estimation of the model is not taken into account.
The utility function evaluate_prediction
may be used to assess the quality of predictions.
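As a rough illustration of the scheme described above, the following sketch (not the package implementation) computes the one-step ahead predictions of a univariate ARMA(1,1) model, simply setting the unknown pre-sample values to zero:
# Sketch only: one-step ahead predictions for a univariate ARMA(1,1) model
#   y[t] = a*y[t-1] + u[t] + b*u[t-1],
# where the unknown pre-sample values y[0] and u[0] are set to zero.
arma11_onestep = function(y, a, b) {
  n.obs = length(y)
  yhat = numeric(n.obs)
  y.prev = 0  # y[0] := 0
  u.prev = 0  # u[0] := 0
  for (t in 1:n.obs) {
    yhat[t] = a * y.prev + b * u.prev  # prediction from the (truncated) infinite past
    u.prev = y[t] - yhat[t]            # innovation = one-step ahead prediction error
    y.prev = y[t]
  }
  yhat
}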
Arguments
- object
rldm_varma or rldm_ss object which represents the model.
- y
\((T,n)\) matrix with the observed outputs (\(y_t\), \(t=1,...,T\)).
- h
(integer) vector of forecast horizons.
- n.ahead
(integer) number of time steps to look ahead (out of sample). This number is also denoted with \(T_0\).
- ...
not used.
- x
\((T+T_0,r)\) matrix with the exogenous inputs (\(x_t\), \(t=1,...,T+T_0\)). This input parameter is ignored if the model has no exogenous inputs. Note that the conditional forecasts are computed and hence (for a model with exogenous inputs) we need the values of the inputs up to time \(t=T+T_0\).
- yhat
\((T,n,l)\) dimensional array of forecasts. The entries
yhat[t,,i]
should contain the prediction for \(y_t\).
- criteria
is a list with "evaluation criteria". See below for more details.
- benchmark
\((T,n,l)\) dimensional array with "benchmark" forecasts. If
NULL
then the naive (\(h\)-step ahead) forecasts are used as benchmark.
- samples
is a list with "(sub) samples". See below for more details.
Value
The function predict
returns a list with components
- yhat
\((T,n,l)\) dimensional array with the \(h\)-step ahead forecasts. \(T\) is the sample size, \(n\) is the dimension of the outputs \(y_t\) and \(l\) is the number of forecasts made, i.e. the length of the vector
h
. The entries yhat[t,,i]
are the h[i]-step ahead forecasts for \(y_t\).
- sigmahat
\((n,n,l)\) dimensional array, where
sigmahat[,,i]
contains the theoretical covariance matrix of the \(h\)-step ahead prediction error for h = h[i].
- h
the (integer) vector of forecast horizons considered.
- yhat.ahead
\((T_0,n)\) dimensional matrix, which contains the "out-of-sample" forecasts for \(t=T+1,\ldots,T+T_0\).
- sigmahat.ahead
\((n,n,T_0)\) dimensional array, where
sigmahat.ahead[,,h]
contains the theoretical covariance matrix of the \(h\)-step ahead prediction error.
- y, x
the original data.
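For illustration, interval forecasts can be obtained from these components, e.g. a pointwise 95% interval for the first component of the out-of-sample forecasts (a sketch, assuming pred is the list returned by predict):
# Sketch, assuming `pred` is the list returned by predict():
se1 = sqrt(pred$sigmahat.ahead[1, 1, ])   # s.d. of the h-step ahead errors, h = 1,...,T0
lower = pred$yhat.ahead[, 1] - 1.96 * se1
upper = pred$yhat.ahead[, 1] + 1.96 * se1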
The function evaluate_prediction
returns a 4-dimensional array where the dimensions refer to
the evaluation criteria, the (sub) samples, the predictors and the components of the output \(y_t\).
Note that the evaluation criteria are applied to the (\(n\)) individual components as well as to
the joint vector and hence the 4-th dimension of the array has size \(n+1\).
E.g. if we consider the "RMSE" and the "MAE" of the forecast errors, two samples (a training sample and a
test sample), 5 forecast horizons \(h=1,\ldots,5\) and a process
\((y_t)\) with 2 components, then the result will be a \((2,2,5,3)\)-dimensional array.
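Single slices of this array may be extracted by the usual indexing, e.g. (a sketch, assuming the dimnames produced in the Examples below):
# Sketch: RMSE of all predictors on the "test" sample, joint ("total") errors
stats["RMSE", "test", , "total"]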
Details
The utility function evaluate_prediction
may be used to assess the quality of given predictions.
(E.g. as computed by predict
). The evaluation criteria to be used are passed as parameter criteria
to the function. This parameter is a list with components which are either character strings (for selection of
one of the implemented quality measures) or a user defined function. Such a function takes three arguments
fun(u.hat, y, u.bench)
where u.hat
is a matrix of prediction errors,
y
is a matrix of the (corresponding) true values and u.bench
is a matrix with
the prediction errors of a benchmark prediction procedure. This allows one to compute relative error measures.
The benchmark predictions are passed on via the parameter benchmark
to the procedure.
If this input parameter is missing, then the naive \(h\)-step ahead predictions are used as benchmark.
(Therefore the user also has to specify the respective forecast horizons via the parameter h
.)
The following evaluation criteria are implemented (\(\hat{u}_{it}\) denotes the prediction error for \(y_{it}\) and \(\tilde{u}_{it}\) is the corresponding error of the benchmark procedure; a short sketch of two of these measures is given after the list).
- MSE
Mean Square Error
- RMSE
Root Mean Square Error
- MAE
Mean Absolute Error
- MdAE
Median Absolute Error
- MAPE
Mean Absolute Percentage Error \(100 mean(|\hat{u}_{it}/y_{it}|)\)
- MdAPE
Median Absolute Percentage Error \(100 median(|\hat{u}_{it}/y_{it}|)\)
- RMdSPE
Root Median Square Percentage Error \(100 \sqrt{median(\hat{u}^2_{it}/y^2_{it})}\)
- RelRMSE
Relative Root Mean Square Error \(\sqrt{mean(\hat{u}^2_{it})}/\sqrt{mean(\tilde{u}^2_{it})}\)
- RelMAE
Relative Mean Absolute Error \(mean(|\hat{u}_{it}|)/mean(|\tilde{u}_{it}|)\)
- PB
Percentage Better \(100 mean(I(|\hat{u}_{it}| < |\tilde{u}_{it}|))\)
- HR
Hit Rate \(100 mean(I((\tilde{u}_{it} - \hat{u}_{it})\tilde{u}_{it} \geq 0))\). To be precise, this measure computes the hit rate only if the naive prediction is used as benchmark.
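As a sketch of these conventions, e.g. the RMSE and the RelMAE (in their "joint" version) can be written as functions of error matrices u.hat and u.tilde, where rows index time and columns the components of \(y_t\):
# Sketch: "joint" versions of two of the measures above
RMSE_   = function(u.hat) sqrt(mean(u.hat^2))
RelMAE_ = function(u.hat, u.tilde) mean(abs(u.hat)) / mean(abs(u.tilde))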
The procedure also supports the evaluation on different (sub) samples.
The parameter samples
is simply a list of integer vectors, where each vector defines a (sub) sample.
E.g. for daily data, one could evaluate the predictions for different weekdays.
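For instance (a hypothetical setup with \(T = 140\) and day 1 assumed to be a Monday):
# Hypothetical subsamples for daily data: evaluate Mondays and Fridays separately
samples = list(mon = seq(1, 140, by = 7),
               fri = seq(5, 140, by = 7))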
Examples
# create a "random" ARMA(1,1) model (stable and miniphase)
model = test_armamod(dim = c(2,2), degrees = c(1,1), bpoles = 1, bzeroes = 1)
# generate data (sample size n.obs = 200, "burn_in" phase has length 100.)
data = sim(model, n.obs = 200, n.burn_in = 100)
# predict with true model
pred_true = predict(model, data$y, h = c(1,5))
# estimate AR model, order selection by AIC
n.train = 110 # use the first 110 observations for estimation
n.test = 90 # the last 90 observations are used for (a fair) comparison
model_ar = est_ar(data$y[1:n.train,], mean_estimate = "zero",
ic = 'AIC', method = 'ols')$model
# predict with AR model
pred_ar = predict(model_ar, data$y, h = c(1,5))
# estimate AR1 model (Yule-Walker)
model_ar1 = est_ar(data$y[1:n.train,], mean_estimate = "zero",
penalty = -1, p.max = 1, method = 'yule-walker')$model
# predict with AR1 model
pred_ar1 = predict(model_ar1, data$y, h = c(1,5))
# evaluate prediction of the AR model (with the AR1 prediction as benchmark)
stats = evaluate_prediction(data$y, pred_ar$yhat, h = pred_ar$h,
criteria = list('RMSE', 'MAE', 'PB'),
samples = list(train = 21:n.train, test = (n.train+1):(n.train+n.test)),
benchmark = pred_ar1$yhat)
# use array2data.frame for "tabular" display of the results
print(array2data.frame(stats, rows = 1:3, cols = 4))
#> criterion sample predictor y[1] y[2] total
#> 1 RMSE train h=1 1.285645 1.984521 1.672006
#> 2 MAE train h=1 1.035758 1.597253 1.316505
#> 3 PB train h=1 66.666667 55.555556 61.111111
#> 4 RMSE test h=1 1.372176 2.111571 1.780674
#> 5 MAE test h=1 1.104249 1.701710 1.402980
#> 6 PB test h=1 68.888889 54.444444 61.666667
#> 7 RMSE train h=5 5.848072 4.827668 5.362198
#> 8 MAE train h=5 4.503141 3.964198 4.233670
#> 9 PB train h=5 62.222222 53.333333 57.777778
#> 10 RMSE test h=5 6.352758 5.158962 5.786727
#> 11 MAE test h=5 4.969506 3.920153 4.444830
#> 12 PB test h=5 62.222222 65.555556 63.888889
# evaluate all predictions
# join predictions
yhat = dbind(3, pred_true$yhat, pred_ar1$yhat, pred_ar$yhat)
# define a function to compute the "Median Relative Absolute Error"
MdRAE_ = function(u.hat, y, u.bench){
stats::median(abs(u.hat/u.bench), na.rm = TRUE)
}
stats = evaluate_prediction(data$y, yhat,
h = c(pred_true$h, pred_ar1$h, pred_ar$h),
criteria = list('RMSE', 'MAE', MdRAE = MdRAE_),
samples = list(train = 21:n.train, test = (n.train+1):(n.train+n.test)))
# split prediction method and forecast horizon
dimnames.stats = dimnames(stats)
stats = stats[,,c(1,3,5,2,4,6),]
dim(stats) = c(3,2,3,2,3)
dimnames(stats) = list(criterion = dimnames.stats[[1]], sample = dimnames.stats[[2]],
model = c('true','AR1','AR'), h = paste('h=',c(1,5),sep=''),
data = dimnames.stats[[4]])
# use array2data.frame for "tabular" display of the results
print(array2data.frame(stats, cols = 5, rows = c(3,4,1,2)))
#> model h criterion sample y[1] y[2] total
#> 1 true h=1 RMSE train 1.28925209 1.9887541 1.6759048
#> 2 AR1 h=1 RMSE train 2.25142073 2.2174778 2.2345137
#> 3 AR h=1 RMSE train 1.28564487 1.9845211 1.6720058
#> 4 true h=5 RMSE train 5.86126322 4.8501260 5.3795041
#> 5 AR1 h=5 RMSE train 6.35137481 5.0398814 5.7332524
#> 6 AR h=5 RMSE train 5.84807225 4.8276678 5.3621976
#> 7 true h=1 MAE train 1.02936588 1.5851909 1.3072784
#> 8 AR1 h=1 MAE train 1.76431572 1.7429979 1.7536568
#> 9 AR h=1 MAE train 1.03575768 1.5972528 1.3165053
#> 10 true h=5 MAE train 4.31735406 3.8954089 4.1063815
#> 11 AR1 h=5 MAE train 5.13272404 4.0808778 4.6068009
#> 12 AR h=5 MAE train 4.50314135 3.9641984 4.2336699
#> 13 true h=1 MdRAE train 0.07453425 0.1567930 0.1125926
#> 14 AR1 h=1 MdRAE train 0.12751434 0.1923473 0.1552570
#> 15 AR h=1 MdRAE train 0.09410693 0.1671579 0.1109965
#> 16 true h=5 MdRAE train 0.69200551 0.7283966 0.7130315
#> 17 AR1 h=5 MdRAE train 0.83921926 0.6655518 0.7772924
#> 18 AR h=5 MdRAE train 0.72550968 0.7785526 0.7350351
#> 19 true h=1 RMSE test 1.34515614 2.0671581 1.7439305
#> 20 AR1 h=1 RMSE test 2.33480674 2.3805324 2.3577804
#> 21 AR h=1 RMSE test 1.37217613 2.1115708 1.7806738
#> 22 true h=5 RMSE test 6.01273251 4.8873366 5.4790059
#> 23 AR1 h=5 RMSE test 7.12922328 5.6608314 6.4370349
#> 24 AR h=5 RMSE test 6.35275778 5.1589618 5.7867270
#> 25 true h=1 MAE test 1.07391601 1.6507857 1.3623508
#> 26 AR1 h=1 MAE test 1.86312503 1.8993087 1.8812169
#> 27 AR h=1 MAE test 1.10424908 1.7017102 1.4029796
#> 28 true h=5 MAE test 4.57813226 3.6836098 4.1308710
#> 29 AR1 h=5 MAE test 5.71857424 4.3693863 5.0439802
#> 30 AR h=5 MAE test 4.96950650 3.9201529 4.4448297
#> 31 true h=1 MdRAE test 0.05943660 0.1369453 0.1041158
#> 32 AR1 h=1 MdRAE test 0.11286431 0.1232150 0.1187865
#> 33 AR h=1 MdRAE test 0.06813022 0.1416760 0.1168982
#> 34 true h=5 MdRAE test 0.74686373 0.7096248 0.7391295
#> 35 AR1 h=5 MdRAE test 0.96606036 0.9354539 0.9463293
#> 36 AR h=5 MdRAE test 0.88277283 0.7027472 0.8018125