data <-c(88, 84, 85, 85, 84, 85, 83, 85, 88, 89, 91, 99, 104, 112, 126, 138, 146,151, 150, 148, 147, 149, 143, 132, 131, 139, 147, 150, 148, 145, 140, 134, 131, 131, 129, 126, 126, 132, 137, 140, 142, 150, 159, 167, 170, 171, 172, 172, 174, 175, 172, 172, 174, 174, 169, 165, 156, 142, 131, 121, 112, 104, 102, 99, 99, 95, 88, 84, 84, 87, 89, 88, 85, 86, 89, 91, 91, 94, 101, 110, 121, 135, 145, 149, 156, 165, 171, 175, 177, 182, 193, 204, 208, 210, 215, 222, 228, 226, 222, 220)
Why do the ARMA models acting on the first differences of the data differ from the corresponding ARIMA models?
for (p in 0:5)
{
for (q in 0:5)
{
#data.arma = arima(diff(data), order = c(p, 0, q));cat("p =", p, ", q =", q, "AIC =", data.arma$aic, "\n");
data.arma = arima(data, order = c(p, 1, q));cat("p =", p, ", q =", q, "AIC =", data.arma$aic, "\n");
}
}
Same with Arima(data,c(5,1,4)) and Arima(diff(data),c(5,0,4)) in the forecast package. I can get the desired consistency with
auto.arima(diff(data),max.p=5,max.q=5,d=0,approximation=FALSE, stepwise=FALSE, ic ="aic", trace=TRUE);
auto.arima(data,max.p=5,max.q=5,d=1,approximation=FALSE, stepwise=FALSE, ic ="aic", trace=TRUE);
but it seems the holder of the minimum AIC estimate for these data has not been considered by the algorithm behind auto.arima; hence the suboptimal choice of ARMA(3,0) instead of ARMA(5,4) acting on the first differences. A related question is how much the two AIC estimates should differ before one considers one model better than the other has little to do wuth programming - the smallest AIC holder should at least be considered/reported, even though 9 coefficients may be a bit too much for a forecast from 100 observations.
My R questions are:
1) Vectorised version of the double loop so it is faster?
2) Why does arima(5,1,4) acting on the data differ from arma(5,4) acting on the first differences of the data? Which one is to be reported?
3) How do I sort the AICs output so that the smaller come first?
Thanks.