Hello,
I need to load (deserialize) a precomputed list of integers from a file into a Python list in a Python script. The list is large (up to millions of items), and I can choose whatever storage format I like, as long as loading is as fast as possible.
Which is the fastest method, and why?
- Using import on a .py file that just contains the list assigned to a variable
- Using cPickle's load
- Some other method (perhaps numpy?)
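For concreteness, here is a minimal sketch of two of the candidates. It assumes Python 3, where pickle picks up the C implementation automatically when it is available (the role cPickle played in Python 2), and it uses the standard-library array module as one possible "other method". The helper names are made up for illustration, and the array variant assumes every value fits in a signed 64-bit integer.

```python
import array
import os
import pickle


def save_pickle(values, path):
    # Binary protocols are generally more compact than the text protocol 0.
    with open(path, "wb") as f:
        pickle.dump(values, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def save_array(values, path):
    # "q" is a signed 64-bit integer; values outside that range raise OverflowError.
    with open(path, "wb") as f:
        array.array("q", values).tofile(f)


def load_array(path):
    data = array.array("q")
    # The item count is derived from the file size, since the raw file has no header.
    count = os.path.getsize(path) // data.itemsize
    with open(path, "rb") as f:
        data.fromfile(f, count)
    return data.tolist()
```

The raw array file is written in the machine's native byte order, so this variant is only suitable when the file is written and read on the same kind of machine.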
Also, how can one benchmark such things reliably?
Addendum: measuring this reliably is difficult, because import is cached, so it can't be executed multiple times in one test. Loading with pickle also gets faster after the first time, probably because of page caching by the OS. Loading 1 million numbers with cPickle takes 1.1 sec on the first run and 0.2 sec on subsequent executions of the script.
Intuitively I feel cPickle should be faster, but I'd appreciate numbers (this is quite a challenge to measure, I think).
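One way I could imagine working around the import caching is to time each load in a fresh interpreter process. The sketch below is only that: the function name is hypothetical, and the measured time includes interpreter startup, so a baseline run with the snippet "pass" would have to be subtracted. It does nothing about the OS page cache, so the first timing in the list may reflect a cold cache while the later ones are warm.

```python
import subprocess
import sys
import time


def time_in_fresh_process(snippet, repeats=5):
    """Run `snippet` in a new Python process `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process starts with an empty module cache, so an import
        # inside the snippet is executed again on every run.
        subprocess.run([sys.executable, "-c", snippet], check=True)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, comparing min(time_in_fresh_process("import mydata")) against min(time_in_fresh_process("pass")) would give a rough warm-cache cost of the import, where mydata is a placeholder module name.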
And yes, it's important for me that this performs quickly.
Thanks