What is the fastest way of converting records holding only numeric data into fixed-width format strings and writing them to a file in Python? For example, suppose `records` is a huge list of objects with the attributes `id`, `x`, `y`, and `wt`, and we frequently need to flush them to an external file. The flushing can be done with the following snippet:
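```python
# serial_fname() is the asker's own helper that supplies the output file name
with open(serial_fname(), "w") as f:
    for r in records:
        # zero-padded id, x and y in scientific notation, wt with 5 decimals
        f.write("%07d %11.5e %11.5e %7.5f\n" % (r.id, r.x, r.y, r.wt))
```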
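One common micro-optimization of the loop above is to hoist the per-record work out of Python-level code where possible: fetch all four attributes with a single `operator.attrgetter` call, build the whole output with `"".join(...)`, and issue one `write` instead of one per record. The sketch below assumes the records expose `id`, `x`, `y` and `wt` as attributes; `Record`, `FMT`, `get_fields` and `write_records` are hypothetical names introduced for illustration. The `%` formatting itself still runs once per record, so the gain may be modest and should be measured on real data.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for the question's record objects (attributes id, x, y, wt)
Record = namedtuple("Record", "id x y wt")

# Same fixed-width layout as the original snippet
FMT = "%07d %11.5e %11.5e %7.5f\n"

# Returns (r.id, r.x, r.y, r.wt) as one tuple in a single call
get_fields = attrgetter("id", "x", "y", "wt")


def write_records(path, records):
    """Format every record in memory, then hand the file one big string."""
    text = "".join([FMT % get_fields(r) for r in records])
    with open(path, "w") as f:
        f.write(text)
```

This builds the entire change set as one string in memory; for very large change sets, the same idea can be applied to fixed-size chunks of `records` to bound memory use.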
However, my code spends so much time generating these files that it is left with too little time for its actual work between flushes.
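Before choosing a remedy, it can help to check whether the time goes into string formatting or into the file write. A minimal sketch, assuming synthetic records (`Record`, `FMT` and `time_flush` are hypothetical names): it times the two phases separately with `time.perf_counter`. The "writing" figure only covers handing data to the OS; the operating system's cache may delay the physical disk write, so it is a lower bound on true I/O cost.

```python
import os
import tempfile
import time
from collections import namedtuple

# Hypothetical stand-in for the question's record objects
Record = namedtuple("Record", "id x y wt")

FMT = "%07d %11.5e %11.5e %7.5f\n"


def time_flush(records, path):
    """Return (seconds spent formatting, seconds spent writing) for one flush."""
    t0 = time.perf_counter()
    text = "".join([FMT % (r.id, r.x, r.y, r.wt) for r in records])
    t1 = time.perf_counter()
    with open(path, "w") as f:
        f.write(text)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    # Synthetic data; substitute a real change set for meaningful numbers
    records = [Record(i, i * 1.5, -i * 0.25, 0.5) for i in range(200_000)]
    with tempfile.TemporaryDirectory() as d:
        fmt_s, io_s = time_flush(records, os.path.join(d, "probe.txt"))
    print("formatting: %.3f s, writing: %.3f s" % (fmt_s, io_s))
```

If formatting dominates, the work is CPU-bound in the interpreter; if writing dominates, the storage path is the bottleneck and faster formatting will not help much.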
Amendment to the original question:
I ran into this problem while writing server software that keeps track of a global record set by pulling information from several "producer" systems, and relays any changes to that record set to "consumer" systems in preprocessed form, in real time or near real time. Many of the consumer systems are Matlab applications.
Below are the suggestions I have received so far (thanks), with my comments:
- Dump only the changes, not the whole data set: I'm actually doing this already. The resulting change sets are still huge.
- Use a binary (or some other more efficient) file format: I'm pretty much constrained by what Matlab can read reasonably efficiently, and in addition the format should be platform-independent.
- Use a database: I am actually trying to bypass the current database solution, which is deemed both too slow and too cumbersome, especially on the Matlab side.
- Divide the task into separate processes: at the moment the dumping code runs in its own thread. However, because of the GIL it still effectively shares a single core with the rest of the program. I guess I could move it to a completely separate process.
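Moving the flush into a separate process, as the last item suggests, could look roughly like the sketch below. It uses the standard library's `concurrent.futures.ProcessPoolExecutor` with a single worker and assumes the records can be copied into plain tuples; `Record`, `write_snapshot` and `start_flush` are hypothetical names. Copying the rows and pickling them for the worker still happen in the main process, so this offloads only the formatting and the file I/O.

```python
import concurrent.futures
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the question's record objects
Record = namedtuple("Record", "id x y wt")

FMT = "%07d %11.5e %11.5e %7.5f\n"


def write_snapshot(path, rows):
    """Runs in the worker process: format plain tuples and write them out."""
    with open(path, "w") as f:
        f.write("".join([FMT % row for row in rows]))
    return len(rows)


def start_flush(executor, path, records):
    """Runs in the main process: snapshot the records as plain tuples and
    hand them to the worker. Returns a Future without waiting for the write."""
    # Copying decouples the snapshot from later changes to `records`;
    # this copy and the pickling for the worker still cost main-process time.
    rows = [(r.id, r.x, r.y, r.wt) for r in records]
    return executor.submit(write_snapshot, path, rows)


# The __main__ guard is required where worker processes are started by
# re-importing this module (the "spawn" start method, e.g. on Windows).
if __name__ == "__main__":
    records = [Record(i, i * 0.5, -i * 0.25, 0.5) for i in range(1000)]
    # In a real server the pool would be created once and reused across flushes
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool, \
            tempfile.TemporaryDirectory() as d:
        future = start_flush(pool, os.path.join(d, "snapshot.txt"), records)
        # ... the main process is free to keep working here ...
        print(future.result(), "records written")
```

Because the worker runs its own interpreter, its formatting does not contend for the main process's GIL. Whether this is a net win depends on how the copy-and-pickle cost compares with the formatting cost, so both are worth timing on a real change set.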