views:

340

answers:

2

So I'm writing yet another Twisted based daemon. It'll have an xmlrpc interface as usual so I can easily communicate with it and have other processes interchange data with it as needed.

This daemon needs to access a database. We've been using SQL Alchemy in place of hard coding SQL strings for our latest projects - those mostly done for web apps in Pylons.

We'd like to do the same for this app and re-use library code that makes use of SQL Alchemy. So what to do? Well of course since that library was written for use in a Pylons app it's all the straight-forward blocking style code that everyone is accustomed to and all of the non-blocking is magically handled by Pylons via threading, thread locals, scoped sessions and so on.

So now for Twisted I guess I'm a bit stuck. I could:

  1. Just write the sql I need directly if it's minimal and use the dbapi pool in twisted to do runInteractions etc when I need to hit the db.
  2. Use the objects and inherently blocking methods in our library and block now and then in my Twisted daemon. Bah.
  3. Use sAsync which was last updated in 2008 and kind of reuse the models we have defined already but not really and this doesn't address that the library code needs to work in Pylons too. Does that even work with the latest version SQL Alchemy? Who knows. That project looked great though - why was it apparently abandoned?
  4. Spawn a separate subprocess and have it deal with the library code and all it's blocking, the results being returned back to my daemon when ready as objects marshalled via YAML over xmlrpc.
  5. Use deferToThread and then expunge the objects returned having made sure to do eager loads so that I have all my stuff that I might need. Seems kind of ugha to me.

I'm also stuck using Python 2.5.4 atm so no 2.6 yet and I don't think I can just do an import from future to get access to the cool new multiprocessing module stuff in there. That's OK though I guess as we've got dealing with interprocess communication down pretty well.

So I'm leaning towards option 4 mostly as that would avoid the mortal sin of logic duplication with option 1 while also staying the heck away from threads.

My first attempt though will be option 2 to just get the thing going and then separate out the calls to the library code perhaps into a separate process if it looks like there's a good chance that something might take a bit too long to block on. Sad. Maybe a combination of Stackless Python and Twisted would be interesting here.

Any better ideas?

+1  A: 

Firstly, I can unfortunately only second your opinion that twisted and SQLAlchemy don't play along very well. I have worked some with both and would be somewhat afraid of the complexity that would arise from putting them together.

All the database integration layers that I know of to date use twisteds threading integration layer, and if you want to avoid that at all costs you are pretty much stuck with point 4 in your list.

On the other hand, I have seen examples of database connecting code using deferToThread() and friends that worked very well.

Anyway, some pointers if you'd be ready to consider other frameworks than SQLAlchemy:

The DivMod guys have been doing some tentative work on twisted - database integration based on the Storm ORM (google for "storm orm").

See this link for an example:

http://divmod.org/trac/wiki/NevowStormApproach

Also, head over to DivMod's site and have a look at the sources of their Axiom db layer (probably not of any use to you directly since it's Sqlite only, but it's principles might be useful).

Jacob Oscarson
Thanks Jacob - I tried Storm first last fall before opting for SQL Alchemy. Inserting data was unacceptably slow with Storm at that time. Maybe that's no longer an issue now - was taking like 2 to 3 secs per insert on MySQL for one of our tables. I've also used runInteration and runQuery in twisted with their connection pool stuff and that works great. The thing is I want it to work in the Pylons world as well. So next thing to look at is [Eventlet](http://eventlet.net/) and [gevent](http://www.gevent.org/) The latter is supposed to be faster / better but no go with Twisted.
Khorkrak
+1  A: 

There's a storm branch that you can use with twisted directly (internally it does the defer to thread stuff) on launchpad https://code.launchpad.net/~therve/storm/twisted-integration. I've used it nicely.

Sadly sqlalchemy is significantly more complex in implementation to audit for async usage. If you really want to use it, i'd recommend an out of process approach with a storage rpc layer.

alternatively if your feeling adventurous and using postgresql, the latest pyscopg2 supports true async usage (https://launchpad.net/txpostgres), and the storm source is pretty simple to hack on ;-)

incidentally the storm you tried last year may not have had the C-extension on by default (it is now in the latest releases.) which might account for your speed issues.

kapilt
Good answer kaplit - I tried out using the Twisted related storm branch last year as well and it does the job - but we ended up moving to SQL Alchemy then because of the slow inserts. So yeah we're moving towards generally having a separate daemons that support some service based interface - simply xmlrpc presently accepting simple args and returning primtives like integers, string or dicts, lists etc. In some cases yaml for marshalling objects.
Khorkrak