views: 824
answers: 5

We have to fit about 2,000-odd time series every month, and they have very idiosyncratic behavior: some are ARMA/ARIMA, some are EWMA, some are ARCH/GARCH, with or without seasonality and/or trend (the only thing they have in common is the time series aspect).

In theory one could build an ensemble of candidate models and use the AIC or BIC criterion to choose the best fit, but is the community aware of any library which attempts to solve this problem?

Google made me aware of the one below, by Rob J Hyndman (link), but are there any other alternatives?
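
Roughly, what I have in mind is something like the sketch below (just a minimal illustration; series_list is a placeholder for my collection of series, and the candidate set is arbitrary):

    # Sketch: for each series, fit a small set of candidate models and
    # keep the one with the lowest AIC. Only auto.arima() and ets() are
    # shown here; GARCH candidates would need the same treatment.
    library(forecast)

    pick_best <- function(y) {
      candidates <- list(
        arima = auto.arima(y),   # AIC-based search within the ARIMA family
        ets   = ets(y)           # AIC-based search within exponential smoothing
      )
      # Caveat: AICs are only strictly comparable when the models are
      # fitted to the same (undifferenced) data.
      aics <- sapply(candidates, function(fit) fit$aic)
      candidates[[which.min(aics)]]
    }

    # best_fits <- lapply(series_list, pick_best)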

+1  A: 

I'll be interested to see what other people suggest. I'm sure there are functions for this in zelig or caret.

The forecast package (Hyndman and Khandakar 2008) that you mention is very useful. See this paper for a discussion and example (these show applications: here and here). For instance, it contains a function auto.arima() that performs a search over a set of models. Kleiber and Zeileis' "Applied Econometrics with R" has some nice examples of how to do this, so I recommend that book. Here's an example of the usage:

> fit <- auto.arima(WWWusage)
> fit
    Series: WWWusage 
    ARIMA(1,1,1)                    

    Call: auto.arima(x = WWWusage) 

    Coefficients:
         ar1     ma1
      0.6504  0.5256
    s.e.  0.0842  0.0896

    sigma^2 estimated as 9.793:  log likelihood = -254.15
    AIC = 514.3   AICc = 514.55   BIC = 522.08

If you want to try something different, the accuracy package has some useful functionality; for instance, look at the modelsCompare function. This should work with time series models as well.

I think that AIC and BIC are generally the most widely accepted statistics. Venables and Ripley's "Modern Applied Statistics with S" (see the MASS package) has a lot of material on automated model selection; I also recommend that book.

Lastly, the tsDyn package also contains functions to help with selection for non-linear models (see the excellent vignette on the subject). You can use the selectSETAR, selectLSTAR, and selectNNET functions.
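
For instance (a minimal sketch on the classic lynx series; the embedding dimension here is purely illustrative):

    # Sketch: search over candidate SETAR specifications for log10(lynx).
    library(tsDyn)

    llynx <- log10(lynx)
    selectSETAR(llynx, m = 2)   # m is the (arbitrary) embedding dimension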

Shane
Incidentally, you might find the Qian/Zhao 2006 paper "On time series model selection involving many candidate ARMA models" (http://www.sciencedirect.com/science/article/B6V8V-4MR86DN-1/2/f00006b05316b7bd2ce67aad75e39301) to be of interest. They reference R functions but don't provide all the code.
Shane
Stepwise regression. *Shudder*
hadley
The two papers of mine that you link to are not directly relevant to this topic. A better source is http://www.jstatsoft.org/v27/i03 which discusses the forecast package and the automatic forecasting methods that it includes.
Rob Hyndman
+2  A: 

There are two automatic methods in the forecast package: auto.arima() which will handle automatic modelling using ARIMA models, and ets() which will automatically select the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
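
A minimal sketch of the basic usage, with a built-in monthly series:

    library(forecast)

    fit.arima <- auto.arima(USAccDeaths)   # automatic ARIMA selection
    fit.ets   <- ets(USAccDeaths)          # automatic exponential smoothing selection

    plot(forecast(fit.ets, h = 24))        # forecast two years ahead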

Rob Hyndman
Rob: It looks like auto.arima() only works with ts objects. Any thoughts about allowing it to accept other irregular time series (e.g. with zoo)? As a simple example with the quantmod package: { getSymbols("GS"); auto.arima(as.zoo(GS[,'GS.Close'])) }
Shane
No. ARIMA models for irregularly spaced data are very tricky. Essentially you need to fit a continuous time ARMA (see papers by Brockwell et al) which is a very different sort of model than the discrete time counterpart.
Rob Hyndman
A: 

Thanks, useRs. I have tried the forecast package, including as a composite of ARIMA and ETS, but without much success according to AIC or BIC (SBC). So I am now tempted to fit each time series with its own SVM (support vector machine), because of its better generalization ability and because it can use other variables apart from lags, as well as non-linear kernel functions.
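
Roughly, I am thinking of something like this (just a rough sketch using the e1071 package; the lag depth and kernel choice are arbitrary):

    # Sketch: turn a univariate series into a lagged design matrix and
    # fit an SVM regression on it. Other regressors could be added as columns.
    library(e1071)

    y   <- as.numeric(AirPassengers)
    p   <- 12                              # arbitrary number of lags
    emb <- embed(y, p + 1)                 # column 1 is y_t, columns 2..p+1 are its lags
    dat <- data.frame(y = emb[, 1], emb[, -1])

    fit <- svm(y ~ ., data = dat, kernel = "radial")
    head(predict(fit, dat))                # in-sample fitted values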

Any premonitions?

Arun
You can't compare the AIC from ARIMA and ETS as they are based on different datasets due to differencing. Also, I have seen no evidence that svm is a good general time series forecasting algorithm. For example, the forecasting M-competitions have shown that non-linear data mining methods tend to perform worse than the linear statistical models on large sets of univariate time series data. There is quite a lot of literature on this. I suggest you read the papers associated with the M3 competition before you try to come up with your own untested method.
Rob Hyndman
+1  A: 

When will it be possible to use the forecast package functions, especially the ets function, with high-frequency data (weekly data, for example)? Thx

acroa
Probably early next year. The paper is written (see http://robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
Rob Hyndman
A: 

Can anyone please tell me how many parameters should be penalized when using information criteria to select the best model? Let's say that we have three models: 1. simple exponential smoothing, 2. Holt's method (level + trend), 3. Holt-Winters (level + trend + seasonality), where we have monthly seasonality. Thx

acroa
This question would probably be better asked on the R mailing list. At a minimum, I would suggest asking this as *a new question* rather than as an answer to an old one.
Shane