Good morning,
My saga of R questions continues.
I have been working with large datasets lately (more than 400 thousand rows).
So far I have been using the xts format, which worked fine for "small" datasets of a few tens of thousands of elements.
Now that the project has grown, R simply crashes when retrieving the data from the database and putting it into an xts object.
It is my understanding that R should be able to handle vectors with up to 2^32-1 elements (or 2^64-1, depending on the version). Hence I concluded that xts might have some limitation of its own, but I could not find the answer in the documentation (maybe I was a bit overconfident about my understanding of the theoretically possible vector size).
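For context, the loading step looks roughly like the sketch below (a simplified illustration only; the RSQLite connection, table name, and column names are placeholders, not my actual code):

    library(DBI)      # generic database interface
    library(RSQLite)  # placeholder driver; the real backend may differ
    library(xts)

    # Hypothetical database file standing in for the real source
    con <- dbConnect(RSQLite::SQLite(), "timeseries.db")

    # Pull the full series in one go (> 400,000 rows)
    raw <- dbGetQuery(con, "SELECT timestamp, value FROM prices ORDER BY timestamp")
    dbDisconnect(con)

    # Build the xts object: values indexed by their timestamps
    series <- xts(raw$value, order.by = as.POSIXct(raw$timestamp, tz = "UTC"))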
To sum up, I would like to know:
1) Does xts indeed have a size limitation?
2) What do you think is the smartest way to handle large time series?
(I was thinking about splitting the analysis into several smaller datasets.)
3) I don't get an error message; R simply shuts down. Is this a known behavior?
Thanks for your help,
Jeremie
SOLUTION
- xts has the same size limit as R itself, and that limit depends on the build being used (32-bit vs 64-bit); either way it is extremely large.
- Chunking the data is indeed a good idea, but it turned out not to be needed here (see the sketch after this list).
- The crash came from a bug in R 2.11.0 that was fixed in R 2.11.1: there was a problem with long date vectors (here, the index of the xts object).
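For anyone who does hit real memory limits, a chunked workflow could look like the sketch below (a minimal illustration with a hypothetical SQLite table and a stand-in per-chunk analysis, not the original code):

    library(DBI)
    library(RSQLite)
    library(xts)

    # Guard against the R 2.11.0 date-index bug mentioned above
    if (getRversion() < "2.11.1")
      warning("Upgrade to R 2.11.1 or later to avoid the long date-vector bug")

    con <- dbConnect(RSQLite::SQLite(), "timeseries.db")  # hypothetical database

    # Process the series one calendar month at a time instead of all at once
    months <- dbGetQuery(con,
      "SELECT DISTINCT strftime('%Y-%m', timestamp) AS ym FROM prices")$ym

    results <- lapply(months, function(ym) {
      raw <- dbGetQuery(con, sprintf(
        "SELECT timestamp, value FROM prices WHERE strftime('%%Y-%%m', timestamp) = '%s'",
        ym))
      chunk <- xts(raw$value, order.by = as.POSIXct(raw$timestamp, tz = "UTC"))
      mean(chunk)  # stand-in for the real per-chunk analysis
    })

    dbDisconnect(con)

Each iteration only keeps one month of data in memory, and the per-chunk results can be combined afterwards.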