views: 70
answers: 2

Hi,

I am building an application whose objects have their data stored in MySQL tables (spread across multiple tables). When I need to work with an object (retrieve or change its attributes), I query the database with MySQLdb (SELECT / UPDATE). However, since the application is quite computation-intensive, the execution time is killing me.

I wanted to understand whether there are approaches where all of the data is loaded into Python up front, the computations / modifications are done on those in-memory objects, and a full update is then written back to the MySQL database. Would loading the data into lists of those objects in one go from the database improve performance? Also, since the database is only about 25 MB, will it cause any memory problems?
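Roughly, the kind of pattern I am hoping for is something like this (table, column, and connection details below are just placeholders, not my actual schema):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="mydb")
    cur = conn.cursor()

    # Load everything once instead of issuing one SELECT per object.
    cur.execute("SELECT id, attr1, attr2 FROM objects")
    objects = dict((row[0], {"attr1": row[1], "attr2": row[2]})
                   for row in cur.fetchall())

    # ... do all the heavy computation on the in-memory dicts here ...

    # Write the results back in a single bulk pass.
    cur.executemany(
        "UPDATE objects SET attr1 = %s, attr2 = %s WHERE id = %s",
        [(o["attr1"], o["attr2"], oid) for oid, o in objects.items()],
    )
    conn.commit()

Is this the right general direction, or is there a better-established way to do it?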

Thanks in advance.

+5  A: 

25 MB is tiny. Microscopic. SQL is slow. Glacial.

Do not waste time on SQL unless you have transactions (with locking and multiple users).

If you're doing "analysis", especially computationally-intensive analysis, load all the data into memory.

In the unlikely event that the data doesn't fit into memory, do this:

  1. Query data into flat files. This can be fast. It's fastest if you don't use Python, but use the database's native tools to extract data into CSV or something similarly small.

  2. Read flat files and do computations, writing flat files. This is really fast.

  3. Do bulk updates from the flat files. Again, this is fastest if you use the database's native toolset for insert or update (see the sketch after this list).
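A minimal sketch of that flat-file workflow, assuming MySQL's SELECT ... INTO OUTFILE and LOAD DATA INFILE for the native-tool steps; the table, column, and file names are made up:

    import csv

    # Step 1 (outside Python) -- extract with the database's native tools, e.g.:
    #   SELECT id, attr1, attr2 INTO OUTFILE '/tmp/objects.csv'
    #   FIELDS TERMINATED BY ',' FROM objects;

    # Step 2 -- read the flat file, compute, write a new flat file.
    src = open("/tmp/objects.csv", "rb")
    dst = open("/tmp/objects_out.csv", "wb")
    writer = csv.writer(dst)
    for obj_id, attr1, attr2 in csv.reader(src):
        attr1 = float(attr1) * 2.0          # placeholder computation
        writer.writerow([obj_id, attr1, attr2])
    src.close()
    dst.close()

    # Step 3 (outside Python) -- bulk-load the results back, e.g.:
    #   LOAD DATA INFILE '/tmp/objects_out.csv' REPLACE INTO TABLE objects
    #   FIELDS TERMINATED BY ',';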


If you didn't need SQL in the first place, consider the data as you originally received it and what you're going to do with it.

  1. Read the original file once, parse it, create your Python objects and pickle the entire list or dictionary. This means that each subsequent program can simply load the pickled file and start doing analysis. However. You can't easily update the pickled file. You have to create a new one. This is not a bad thing. It gives you complete processing history.

  2. Read the original file once, parse it, and store your Python objects with shelve (sketched after this list). This means you can update the file.

  3. Read the original file once, parse it, create your Python objects and save the entire list or dictionary as a JSON or YAML file. This means that each subsequent program can simply load the JSON (or YAML) file and start doing analysis. However. You can't easily update the file. You have to create a new one. This is not a bad thing. It gives you complete processing history.

    This will probably be slightly slower than pickling. And it will require that you write some helpers so that the JSON objects are dumped and loaded properly. However, JSON (and YAML) files are human-readable, which gives you some advantages in working with the file.
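A minimal sketch of the pickle and shelve variants; parse_original_file and the file names are hypothetical placeholders for whatever your real parsing step is:

    import pickle
    import shelve

    objects = parse_original_file("data.txt")   # hypothetical parser for your raw data

    # Option 1: pickle the whole structure once; later runs just load it.
    out = open("objects.pickle", "wb")
    pickle.dump(objects, out, pickle.HIGHEST_PROTOCOL)
    out.close()

    inp = open("objects.pickle", "rb")
    objects = pickle.load(inp)
    inp.close()

    # Option 2: shelve behaves like a persistent dict keyed by strings,
    # so individual entries can be updated in place.
    db = shelve.open("objects.shelf")
    for key, obj in objects.items():
        db[str(key)] = obj
    db.close()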

S.Lott
Hi S.Lott, thanks. I do not have transactions, so SQL is not mandatory. I do need persistent storage for the data, though, because there are objects of multiple classes with relations between them, and these are being added to and used on an ongoing basis. In SQL I had structured the data as multiple tables holding the object data as well as the inter-object relations. If I am loading the data into memory, a list of objects, with their relations contained within them, would be the way to go, right? Also, any suggestions on which database-native tools to use? Thanks for the help.
Vivek
shelve. http://docs.python.org/library/shelve.html pickle. http://docs.python.org/library/pickle.html Indeed, anything in this chapter. http://docs.python.org/library/persistence.html
S.Lott
A: 

Please check SQLAlchemy, an Object-Relational Mapper (ORM) for Python.

SQLAlchemy allows you to map database tables to Python objects. Once the data is loaded, all the operations can be done on the Python objects, and when you are done processing you can write the changes back to the database.

Assuming you have a baseline off-the-shelf computer, 25 MB is absolutely no big deal; you can cache the entire database in memory.
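A minimal sketch of such a mapping using SQLAlchemy's declarative style; the class, table, columns, and connection string are made up for illustration:

    from sqlalchemy import Column, Float, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Thing(Base):                      # made-up class / table
        __tablename__ = "things"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))
        value = Column(Float)

    engine = create_engine("mysql://user:secret@localhost/mydb")
    Session = sessionmaker(bind=engine)
    session = Session()

    # Load everything once, compute on the in-memory objects,
    # then write all the changes back in one commit.
    things = session.query(Thing).all()
    for t in things:
        t.value = t.value * 2.0             # placeholder computation
    session.commit()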

Arrieta