I need to store JSON strings against string keys (which can be up to around 255 characters long). Only querying by key is necessary, so basically just a persistent hash table. It doesn't need to be very safe to write to, but it should be reasonably fast (a few thousand writes per second on a single box). If two processes happen to write the same key at the same time, I don't really care if the data gets mushed, so no locking is necessary. It should be easy to use. What do you suggest?

A: 

That's trivial. Example for mysql:

CREATE TABLE `storage` (`key` VARCHAR(255) PRIMARY KEY, `value` TEXT);

This is the table you need, the key column has an index and is unique. You can't get faster than that without changing the column types or dropping the unique index (which would result in duplicate rows).
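One common cause of slow bulk inserts is committing every row in its own transaction. A minimal sketch of the effect, using Python's sqlite3 as a local stand-in for MySQL (same schema as above; the table and key names are illustrative):

```python
import sqlite3
import time

# Stand-in for the MySQL table from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storage (key TEXT PRIMARY KEY, value TEXT)")

rows = [("key-%05d" % i, '{"n": %d}' % i) for i in range(10000)]

start = time.time()
with conn:  # one transaction (and one commit) for all 10,000 rows
    conn.executemany("INSERT INTO storage (key, value) VALUES (?, ?)", rows)
elapsed = time.time() - start

count = conn.execute("SELECT COUNT(*) FROM storage").fetchone()[0]
print(count, "rows in %.3fs" % elapsed)
```

The same idea applies to MySQL: wrap the inserts in an explicit transaction (or use multi-row INSERT statements) instead of letting each row commit on its own.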

soulmerge
I wish I could do that, but it doesn't seem to be fast enough. I tried a table like the one you suggest with both MyISAM and InnoDB; inserting 10,000 rows always took around 5 seconds, even when keeping the connection open.
Bemmu
I don't see any problems. Took 940ms to insert 10k key/value pairs on my first-generation macbook (key and value were both random md5 hashes).
soulmerge
+4  A: 

MemcacheDB. As for "unsafe", you can turn off its ACIDity (particularly Durability), gaining even more performance.

‘-N’ enable DB_TXN_NOSYNC to gain big performance improved, default is off. By using ‘-N’ option, ‘ACID’ in transaction will lose ‘D’. The data in transaction log buffer may be gone when the machine loses power.

vartec
+1  A: 

Two possibilities - APC or memcache (via PECL). I'm not sure of the APC key-length limits, but memcached's is 250. You can hash the key to get something shorter, though.

Neither is persistent (APC is cleared on a restart, as is memcache - though I've had memcached servers up for weeks, even months at a time). It may or may not be good enough for you. If you truly need long-term storage, you'll put it into a real database anyway.
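Hashing a long key down to fit the limit is simple; a minimal sketch in Python (PHP's md5() produces the same digest):

```python
import hashlib

def cache_key(raw_key):
    # md5 yields a fixed 32-character hex digest, comfortably under
    # memcached's key-length limit - at the cost of losing the readable key.
    return hashlib.md5(raw_key.encode("utf-8")).hexdigest()

k = cache_key("x" * 255)
print(k, len(k))
```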

Alister Bulman
that's not persistent.
vartec
The question doesn't ask for persistent or safe, just fast.
Alister Bulman
It states: «so basically just a persistent hash table»
vartec
+1  A: 

Tokyo Cabinet/Tokyo Tyrant is apparently fast and efficient... But it doesn't come with PHP bindings by default.

AKX
There is a very basic PHP binding available at http://code.google.com/p/phptyrant
Tamas Kalman
A: 

"A few thousand writes per second" is quite a lot, so you may have to build a custom solution, like writing to memory (SHM) and periodically flushing to some permanent storage.

255-character keys look huge. In any case, consider maintaining a good distribution to make indexing easier.

For me, a three-level approach seems OK:

  • 1st level: write to SHM, unindexed, with linear search

  • 2nd level: (done quite frequently) flush the SHM to disk and build an index (or at least sort the keys and do a binary search)

  • 3rd level: merge and clear the 2nd-level data and indexes into bigger, more effectively searchable data. This could be any high-performance ready-built system, a MySQL HA setup or any other database solution with HA/clustering.

Then I'd analyze the data sizes and possibly fork for the 3 levels of search (the analysis is necessary to calculate the optimal sizes for the search times; they should be of the same magnitude for all 3 levels) and then merge the results.

To wrap it up: if you plan for this much traffic, you absolutely should design the app to scale well over more than one box.
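A rough sketch of levels 1 and 2 of this scheme, with a plain Python dict standing in for SHM and a JSON file standing in for the flushed, sorted on-disk data (all names and thresholds are illustrative):

```python
import json
import os
import tempfile

class BufferedStore:
    """Illustrative write-behind store: writes land in memory first and
    are flushed to disk in sorted batches (levels 1 and 2 of the scheme)."""

    def __init__(self, path, flush_every=1000):
        self.path = path          # level-2 storage on disk
        self.buffer = {}          # level-1 in-memory buffer (stand-in for SHM)
        self.flush_every = flush_every

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def get(self, key):
        if key in self.buffer:            # level 1: in-memory lookup
            return self.buffer[key]
        if os.path.exists(self.path):     # level 2: flushed on-disk data
            with open(self.path) as f:
                return json.load(f).get(key)
        return None

    def flush(self):
        data = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                data = json.load(f)
        data.update(self.buffer)
        with open(self.path, "w") as f:
            json.dump(dict(sorted(data.items())), f)  # keep keys sorted
        self.buffer.clear()

store = BufferedStore(os.path.join(tempfile.mkdtemp(), "level2.json"),
                      flush_every=2)
store.put("a", "1")
store.put("b", "2")   # reaches the threshold and triggers a flush to disk
store.put("c", "3")   # still only in the in-memory buffer
print(store.get("a"), store.get("c"))
```

A real implementation would use actual shared memory and an index rather than re-reading the whole file, but the write path (buffer, then periodic sorted flush) is the same shape.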

Csaba Kétszeri
Re 2nd level: I think you mean "disk," not "disc" :)
jhs
Yes, of course. I'll correct it.
Csaba Kétszeri
+1  A: 

@soulmerge Well MySQL is certainly unsafe and easy - like several people I know.

+1  A: 

Could you use the OS filesystem? If your keys don't have an even distribution then md5 hash them, and create subfolders one or two layers deep.

Example, md5("some key") is cd52951f7c0dc0850f2c6c9cb7be9474, so you would put the data for that key in /path/to/my/storage/cd/52/951f7c0dc0850f2c6c9cb7be9474. You've got everything you need built into PHP, and you can scale out by using RAID 0; and if you need network access then you could just use NFS.

(Just thinking out loud.)
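The layout described above can be sketched like so (Python for illustration; PHP's md5() and substr() do the same job; the storage root is the placeholder path from the example):

```python
import hashlib
import os

STORAGE_ROOT = "/path/to/my/storage"  # illustrative root, as in the answer

def path_for(key):
    # Two directory levels of 2 hex chars each spread keys over
    # 256 * 256 = 65,536 subfolders, keeping any one directory small.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(STORAGE_ROOT, digest[:2], digest[2:4], digest[4:])

print(path_for("some key"))
```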

jhs
A: 

How about using Redis ( code.google.com/p/redis )?

Or, even better, use Erlang... try this ( http://obvioushints.blogspot.com/2010/03/erlang-as-fast-key-value-store-for-php.html ) .. MySQL managed 10k writes/sec, this does 150k.

I'm already using an Erlang-PHP bridge for a project and it's fast, and since it's Erlang, I'm guessing scaling up shouldn't be a pain.

cheers

NetRoY