Hi all, I have the following design. There's a pool of identical worker processes (at most 64 of them, 15 on average) that use a shared database for reading only. The database is about 25 MB. Currently, it's implemented as a MySQL database, and all the workers connect to it. This works for now, but I'd like to:

  • eliminate cross-process data transfer, i.e. execute SQL in-process
  • keep the data completely in memory at all times (I mean, 25 MB!)
  • not load said 25 MB separately into each process (i.e. keep it in shared memory somehow)

Since it's all reading, concurrent access issues are nonexistent, and locking is not necessary. Data refreshes happen from time to time, but these are infrequent and I'm willing to shut down the whole shebang for those.

Access is performed via pretty vanilla SQL SELECTs. No subqueries, no joins. LIKE conditions are the fanciest feature ever used. Indices, however, are very much needed.

Question - can anyone think of a database library that would meet the goals outlined above?

+1  A: 

You can use SQLite with its in-memory database.

Ivo
But that creates separate, per-process instances. See requirement #3. AFAIK, you cannot create a SQLite in-memory database over a given chunk of memory.
Seva Alekseyev
The OS's buffer cache will keep the entire database in memory, and that is shared between all processes.
MarkR
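The buffer-cache suggestion above can be sketched as follows: every worker opens the same SQLite file read-only, and the operating system keeps a single cached copy of its pages. This is only an illustration in Python, not a tested setup for this project. The helper name `open_shared_readonly` and the 64 MB mapping size are assumptions. `immutable=1` tells SQLite the file will not change, so it skips locking; that is only safe if all workers are stopped during a data refresh, as the question allows. `PRAGMA mmap_size` may be capped or disabled in some SQLite builds, in which case ordinary reads still go through the shared page cache.

```python
import pathlib
import sqlite3


def open_shared_readonly(path, mmap_bytes=64 * 1024 * 1024):
    """Open an existing SQLite file read-only (hypothetical helper).

    Each worker process calls this on the same file; the data pages are
    served from the OS page cache, which is shared between processes.
    """
    # Build a file: URI. mode=ro refuses writes; immutable=1 promises
    # SQLite that nobody modifies the file, so no locks are taken.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    # Ask SQLite to memory-map the file (assumed size, larger than 25 MB).
    conn.execute(f"PRAGMA mmap_size = {int(mmap_bytes)}")
    return conn
```

A worker would then run its usual queries on the returned connection, for example `conn.execute("SELECT name FROM items WHERE name LIKE ?", ("al%",))`, where `items` and `name` are made-up names. Indices created when the file was built are stored in the file and are available to every worker.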
When you fork, a copy of the in-memory database is created, containing whatever was in it when you did the fork. If you use threads, this doesn't happen: all threads will use the same database, and writes from another thread are visible as well.
Ivo
Threads are, unfortunately, not an option on this project - a 3rd party library that's absolutely essential to the project hates them with a passion. However, I do vaguely recall that fork() on Linux does not copy memory, it creates shared memory with copy on write. Gotta investigate...
Seva Alekseyev
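The copy-on-write behaviour recalled above can be tried out with a small sketch: the parent loads some data once, forks, and the child reads it without loading anything itself. This is a POSIX-only illustration in Python; the helper name `read_in_forked_child` and the dictionary standing in for the database are assumptions. One caveat for Python specifically: CPython updates reference counts on objects it reads, which can dirty pages and force copies, so sharing is likely less complete than in a C program.

```python
import os


def read_in_forked_child(table, key):
    """Fork a child that looks up `key` in `table`, loaded by the parent.

    Hypothetical helper: returns the value as read inside the child.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the parent's memory is mapped copy-on-write, so the
        # table is visible here without being loaded a second time.
        os.close(read_fd)
        try:
            os.write(write_fd, table[key].encode("utf-8"))
        finally:
            os._exit(0)  # leave without running the parent's cleanup
    # Parent: collect what the child read, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        result = pipe.read().decode("utf-8")
    os.waitpid(pid, 0)
    return result
```

Pages stay shared only as long as neither side writes to them, which matches the read-only workload described in the question.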
A: 

I would look at treating it like a cache. Memcached is easy and very fast, as it is all in memory. I'm a fan of MongoDB; it or something similar will also be faster, although disk based.

Simon Thompson
I know it's not in-process, but the speed improvement might be enough?
Simon Thompson
Memcached does not do SQL, does it? I'll take a look at MongoDB.
Seva Alekseyev
Memcached does not do SQL, but it depends on what you're looking to do: denormalising your data and storing it as formatted records/answers against keys might be the answer. Otherwise the whole NoSQL lot is worth considering (MongoDB, CouchDB, etc.); they all have SQL-like query abilities.
Simon Thompson
Do they have more than one index per table? What about composite indices?
Seva Alekseyev
Memcached doesn't cache the database, it consumes extra memory to cache things which are "hard" to make. Typically it's the wrong solution unless you have an infrastructure with lots of machines.
MarkR
MarkR, my suggestion was to think about using cache tech instead of direct DB access. It does not cache a DB, but it could be preloaded with data. We do this for web clusters to speed things up, etc.
Simon Thompson
NoSQL databases do support multiple indexes, but they require you to think about the problem from a slightly different angle.
Simon Thompson