I'm working on an application that will gather data through HTTP from several places, cache the data locally and then serve it through HTTP.

My plan is the following. The application will first create several threads that gather data at a specified interval and cache that data locally in a SQLite database.

The main thread will then start a CherryPy application that queries that SQLite database and serves the data.

My problem is: how do I handle connections to the SQLite database from my threads and from the CherryPy application?

If I use one connection per thread, will I also be able to create and use an in-memory database?

+2  A: 

Short answer: Don't use Sqlite3 in a threaded application.

Sqlite3 databases scale well for size, but rather terribly for concurrency. You will be plagued with "Database is locked" errors.

If you do use it anyway, you will need a connection per thread, and you have to ensure that these connections clean up after themselves. This is traditionally handled using thread-local sessions, and SQLAlchemy's ScopedSession (for example) does this rather well. I would use it if I were you, even if you aren't using the SQLAlchemy ORM features.

Ali A
Agreed. Multiple concurrent writes to a Sqlite3 database are just asking for trouble. No problem with concurrent reads though...
Kamil Kisiel
+1  A: 

"...create several threads that will gather data at a specified interval and cache that data locally into a sqlite database. Then in the main thread start a CherryPy app that will query that sqlite db and serve the data."

Don't waste a lot of time on threads. The things you're describing are simply OS processes. Just start ordinary processes to do the gathering and to run CherryPy.

You have no real use for concurrent threads in a single process for this. Gathering data at a specified interval -- when done with simple OS processes -- can be scheduled by the OS very simply. Cron, for example, does a great job of this.

A CherryPy App, also, is an OS process, not a single thread of some larger process.

Just use processes -- threads won't help you.
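As a rough sketch of what that could look like: a small gatherer script that the OS scheduler runs at each interval, plus a read helper for the serving process. The table layout and the names `gather`, `store_sample`, and `load_sample` are assumptions made for illustration; `fetch` stands in for whatever HTTP call the gatherer actually makes.

```python
import sqlite3
import time


def store_sample(db_path, source, payload):
    """Write the latest payload for one source, replacing any older one."""
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS cache ("
                "source TEXT PRIMARY KEY, payload TEXT, fetched_at REAL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO cache (source, payload, fetched_at) "
                "VALUES (?, ?, ?)",
                (source, payload, time.time()),
            )
    finally:
        conn.close()


def load_sample(db_path, source):
    """Read the cached payload for one source, or None if it is missing."""
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        row = conn.execute(
            "SELECT payload FROM cache WHERE source = ?", (source,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


def gather(db_path, source, fetch):
    """One gathering run: fetch() returns the text to cache for source."""
    store_sample(db_path, source, fetch())
```

The gatherer process opens the database, writes, and exits on every run, so there is no long-lived connection to manage. The CherryPy process would call something like `load_sample` inside its request handlers.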

S.Lott
A: 

Depending on the application the DB could be a real overhead. If we are talking about volatile data, maybe you could skip the communication via DB completely and share the data between the data gathering process and the data serving process(es) via IPC. This is not an option if the data has to be persisted, of course.

paprika
A: 

Depending on the data rate, SQLite could be exactly the right way to do this. The entire database is locked for each write, so you aren't going to scale to thousands of simultaneous writes per second. But if you only have a few, it is the safest way of ensuring you don't overwrite each other.
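To make the "only a few writers" case concrete, here is a minimal sketch in which each writer thread opens its own connection, keeps every transaction short, and relies on the `timeout` argument of `sqlite3.connect` to wait for the lock rather than fail immediately. The `samples` table and the function names are hypothetical.

```python
import sqlite3
import threading


def write_samples(db_path, source, count):
    """Insert count rows from one writer thread over its own connection."""
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        for i in range(count):
            # One short transaction per row: the write lock is held
            # only for the duration of this block.
            with conn:
                conn.execute(
                    "INSERT INTO samples (source, value) VALUES (?, ?)",
                    (source, i),
                )
    finally:
        conn.close()


def run_writers(db_path, sources, count):
    """Start one writer thread per source and return the final row count."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples (source TEXT, value INTEGER)"
        )
    conn.close()

    threads = [
        threading.Thread(target=write_samples, args=(db_path, s, count))
        for s in sources
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(db_path)
    total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    conn.close()
    return total
```

The writes are serialized by the database lock, so a writer that finds the file locked waits up to the timeout and then proceeds. With a handful of gatherers writing every few seconds, that wait should rarely be noticeable.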

Martin Beckett
A: 

You can use something like that.

dugres