views:

584

answers:

4

I am required to use a PostgreSQL database, and it will replace my current use of Berkeley DB. Although I realize this is not an ideal situation, it is beyond my control.

So the question is: if you were required to use PostgreSQL as a key-value store, how would you go about doing this while making it as efficient as possible?

My values are byte arrays and my keys are strings; I could impose some restrictions on the lengths of these strings.

I assume I should use a blob column for my value and a primary key column holding the key, but as I am just venturing into this, I am curious whether anyone in the Stack Overflow community has done this, or if there are any specific 'gotchas' I should look out for.

A: 

What do you need to store as a value? Strings? Ints? Objects (e.g. serialized Java objects)? A simple implementation would work with a 3-column table looking like:

NAME(VARCHAR)   TYPE(VARCHAR)   VALUE(VARCHAR)

(perhaps TYPE is some enumeration). The above wouldn't work for binary data like serialised objects, though, and perhaps you'd need a BLOB there.
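A minimal sketch of such a table in PostgreSQL syntax, assuming hypothetical names, with a CHECK constraint standing in for the TYPE enumeration and a BYTEA value column so serialised objects also fit:

```sql
-- Hypothetical 3-column design; table and column names are illustrative.
-- The CHECK constraint plays the role of the TYPE enumeration, and
-- BYTEA replaces VARCHAR so binary values can be stored as well.
CREATE TABLE properties (
    name  VARCHAR(255) PRIMARY KEY,
    type  VARCHAR(20)  NOT NULL
          CHECK (type IN ('string', 'int', 'double', 'object')),
    value BYTEA
);
```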

Alternatively (and probably a much better idea), have you seen Apache Commons Configuration? You can back it with a database (via JDBC) and store properties such that you retrieve them like this:

// get a property called 'number'
Double d = config.getDouble("number");
Integer i = config.getInt("number");

That may save you a lot of grief in terms of implementation. You may have a problem with saving binary data, in that you'd have to serialise it prior to insertion and deserialise it after retrieval. But I've used this in the past for storing ints, doubles and serialised Java objects via XStream, so I can confirm it works well.

Brian Agnew
A: 

It really depends on what the key will be. If it will always be a string under 255 characters, then use a VARCHAR as your PK and a blob (assuming a large value) for the value. If it will always be a number, use INT, etc.

In other words, need more info to really give you a good answer :)

cyberconte
+2  A: 

If you are forced to use a relational database, I would suggest trying to find structure in your data to take advantage of it, since you forgo the speed advantage you had with unstructured data and a key-value store. The more structure you find, the better off you are in your predicament - even if you only find structure in the keys.

Also consider whether you will need only sequential or also random access to your data, and in what ratio, and structure your database according to this requirement. Are you going to run queries on your values by type, for example? Each of these questions could affect how you structure your database.

One specific consideration about blobs in PostgreSQL: large objects are internally represented in pg_largeobject (loid: oid, pageno: int4, data: bytea). The size of the chunks is defined by LOBLKSIZE, typically 2 kB. So if you can use byte arrays (bytea) in your table instead of large objects and keep the size of your key/value pair under the block size, you can avoid this indirection through a second table. You could also increase the block size if you have access to the build configuration of the database.
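To see how big your stored values actually are, pg_column_size() reports the on-disk size of a value; a quick sanity check, assuming a hypothetical table kv(key VARCHAR, value BYTEA):

```sql
-- Assuming a hypothetical table kv(key VARCHAR, value BYTEA):
-- pg_column_size() reports the on-disk (possibly compressed) size
-- of each stored value, useful for verifying that rows stay small
-- enough to avoid out-of-line storage.
SELECT key, pg_column_size(value) AS stored_bytes
FROM kv
ORDER BY stored_bytes DESC
LIMIT 10;
```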

I'd suggest looking for structure in your data and patterns in your data access, and then asking your question again with more detail.

Jiri Klouda
+4  A: 

I am not sure I understand your actual problem. If your actual problem is how to create a table for this and what field types to use, then you need to take a look at the Data Types chapter of the PostgreSQL documentation.

To create a table in PostgreSQL, you can go ahead with the CREATE TABLE statement.

For the String key you probably want the varchar(n) datatype. First determine the maximum length of the existing keys, choose a reasonable maximum length to use in the DB, and document it so that future additions won't use a key longer than that. Let's assume that 100 is a reasonable maximum length.

For the byte[] value you need the bytea datatype. We will also assume that this is never null, so use NOT NULL here.

Now, the final CREATE statement would look like:

CREATE TABLE tablename (
    key VARCHAR(100) PRIMARY KEY,
    value BYTEA NOT NULL
);

You could optionally add an autogenerated technical ID of the serial datatype as the primary key so that data management is more efficient. You then only need to make the key NOT NULL and UNIQUE.

CREATE TABLE tablename (
    id SERIAL PRIMARY KEY,
    key VARCHAR(100) NOT NULL UNIQUE,
    value BYTEA NOT NULL
);
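For completeness, a sketch of the basic get/put/delete operations against this table; the ON CONFLICT upsert syntax requires PostgreSQL 9.5 or later, and the example key and value are made up:

```sql
-- "put": insert the pair, or overwrite the value if the key exists
-- (ON CONFLICT requires PostgreSQL 9.5+)
INSERT INTO tablename (key, value)
VALUES ('some-key', '\x48656c6c6f'::bytea)
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

-- "get": fetch the value for a key
SELECT value FROM tablename WHERE key = 'some-key';

-- "delete": remove the pair
DELETE FROM tablename WHERE key = 'some-key';
```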

On the Java/JDBC side you could use ResultSet#getBytes(), or more efficiently ResultSet#getBinaryStream(), to get the binary value associated with the key.

That said, storing binary data in a database puts big question marks around the design. Binary values are in no way searchable, and if they contain character data, you may need to take the character encoding carefully into account in the program used to insert and select the data.

BalusC