I have CPU-intensive code that uses a large dictionary as data (around 250M of data). I have a multicore processor and want to utilize it so that I can run more than one task at a time. The dictionary is mostly read-only and may be updated once a day.
How can I write this in Python without duplicating the dictionary?
I understand that Python threads are limited by the GIL and will not offer true concurrency for CPU-bound code. Can I use the multiprocessing module without the data being serialized between processes?
I come from the Java world, and my requirement would be something like Java threads, which can share data, run on multiple processors, and offer synchronization primitives.
Use shelve
for the dictionary. Since writes are infrequent there shouldn't be an issue with sharing it.
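A minimal sketch of the shelve approach (the file name and keys below are made up for illustration): the dictionary lives on disk, so every worker process shares the same file rather than an in-memory copy, and values are only unpickled when a key is accessed.

```python
import shelve

# Daily writer: persist the dictionary to disk (filename is hypothetical).
with shelve.open("big_dict.db", flag="c") as db:
    db["answer"] = 42
    db["greeting"] = "hello"

# Each worker process opens the shelf read-only; values are loaded
# lazily per key, so the full 250M never has to live in every process.
with shelve.open("big_dict.db", flag="r") as db:
    value = db["answer"]
```

The daily update then becomes a rewrite of the shelf file, which readers pick up by reopening it.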
You can share read-only data among processes simply with a fork
(on Unix; there's no easy way on Windows), but that won't catch the "once a day" change (you'd need to put an explicit mechanism in place for each process to update its own copy). Native Python structures like dict
are just not designed to live at arbitrary addresses in shared memory (you'd have to code a dict
variant supporting that in C), so they offer no solace.
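A sketch of the fork approach on Unix (the dictionary here is a small stand-in for the real one): the parent builds the dict once, and forked children read it through copy-on-write memory pages, with no pickling involved.

```python
import os

# Small stand-in for the real 250M dictionary; built before forking.
big_dict = {i: i * i for i in range(1000)}

pid = os.fork()  # Unix-only; Windows has no fork
if pid == 0:
    # Child: reads the inherited dict directly -- the OS shares the
    # parent's memory pages copy-on-write, so nothing is duplicated
    # or serialized as long as the child only reads.
    os._exit(0 if big_dict[7] == 49 else 1)
else:
    _, status = os.waitpid(pid, 0)
    ok = os.WEXITSTATUS(status) == 0  # True if the child saw the data
```

Note the caveat from above: any write (including the daily update) triggers page copies in the writing process, and children started before the update keep seeing the old data until they are restarted.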
You could use Jython (or IronPython) to get a Python implementation with exactly the same multi-threading abilities as Java (or, respectively, C#), including multiple-processor usage by multiple simultaneous threads.
Take a look at multiprocessing in the stdlib: http://docs.python.org/library/multiprocessing.html It has a number of features that make it straightforward to share data structures between processes.