
When using a DBM database (e.g., Berkeley DB or GDBM), is it better to store data as fewer long strings or as more short strings? I can easily structure my data either way. I'm looking for 'better' in the performance sense, but I'm interested in other implications as well.

A: 

I think this question is really hard to answer in a completely generic way. There are so many variables that you would really need to test some common scenarios to determine which approach is best for you.
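One way to run that kind of test is a small timing harness that loads the same data under both layouts and times the same operation against each. This is a minimal sketch, assuming Python's standard-library `dbm.dumb` backend, chosen only because it is always available; swapping in `dbm.gnu` should need just a different open call, if your Python was built with it. The workload, the `record:field` key scheme and the `\x1f` field separator are hypothetical. The harness only times a single-field read, so replace it with operations that resemble your real access pattern.

```python
import dbm.dumb
import os
import tempfile
import timeit

SEP = "\x1f"  # hypothetical field separator; must never occur inside the data

# Hypothetical workload: 200 records with 10 short fields each.
RECORDS = {
    f"user{i}": {f"f{j}": f"value-{i}-{j}" for j in range(10)}
    for i in range(200)
}


def load_packed(db):
    """Fewer, longer strings: one key per record, all fields in one value."""
    for rec_id, fields in RECORDS.items():
        packed = SEP.join(f"{k}={v}" for k, v in fields.items())
        db[rec_id.encode()] = packed.encode()


def load_split(db):
    """More, shorter strings: one key per field, named 'record:field'."""
    for rec_id, fields in RECORDS.items():
        for k, v in fields.items():
            db[f"{rec_id}:{k}".encode()] = v.encode()


def read_field_packed(db, rec_id, field):
    # Fetch the whole long value, then parse it to find one field.
    for pair in db[rec_id.encode()].decode().split(SEP):
        k, _, v = pair.partition("=")
        if k == field:
            return v
    raise KeyError(field)


def read_field_split(db, rec_id, field):
    # A single exact-key fetch of a short value.
    return db[f"{rec_id}:{field}".encode()].decode()


def benchmark(number=2000):
    """Time a single-field read under each layout; returns seconds per layout."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        layouts = [
            ("packed", load_packed, read_field_packed),
            ("split", load_split, read_field_split),
        ]
        for name, load, read in layouts:
            with dbm.dumb.open(os.path.join(tmp, name), "c") as db:
                load(db)
                results[name] = timeit.timeit(
                    lambda: read(db, "user100", "f7"), number=number
                )
    return results


if __name__ == "__main__":
    for layout, secs in benchmark().items():
        print(f"{layout:>6}: {secs:.4f} s")
```

Numbers from `dbm.dumb` will not match GDBM or Berkeley DB, so treat the harness as a template for measuring your own backend rather than as a result.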

Some factors to consider:

  • Will larger strings require substring searches?
  • What kind of searches will you perform over the data?
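The substring-search concern is easy to see in a small sketch. It uses Python's portable `dbm.dumb` module as a stand-in for GDBM or Berkeley DB, and the `record:field` key scheme, the `\x1f` separator and the sample records are all hypothetical. Python's `dbm` interface offers lookup by exact key plus iteration over keys, so a search by content scans every entry under either layout. The difference is that a plain substring test on a packed value can match the wrong field, while the split layout lets you examine only the field you care about.

```python
import dbm.dumb
import os
import tempfile

SEP = "\x1f"  # hypothetical separator; must never occur inside the data

# Hypothetical sample records.
RECORDS = {
    "alice": {"name": "Alice", "city": "Boston"},
    "bob": {"name": "Bob", "city": "Alice Springs"},
}


def naive_substring_search(db, needle):
    """Packed layout: substring test on each whole long value.
    Matches in any field, so it can return false positives."""
    return sorted(k.decode() for k in db.keys() if needle in db[k].decode())


def field_search_packed(db, field, needle):
    """Packed layout: parse every long value to test one field."""
    hits = []
    for k in db.keys():
        for pair in db[k].decode().split(SEP):
            name, _, value = pair.partition("=")
            if name == field and needle in value:
                hits.append(k.decode())
    return sorted(hits)


def field_search_split(db, field, needle):
    """Split layout: keys look like 'record:field', so only that
    field's short values are examined."""
    suffix = f":{field}".encode()
    return sorted(
        k[: -len(suffix)].decode()
        for k in db.keys()
        if k.endswith(suffix) and needle in db[k].decode()
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "packed"), "c") as packed, \
             dbm.dumb.open(os.path.join(tmp, "split"), "c") as split:
            for rec, fields in RECORDS.items():
                packed[rec.encode()] = SEP.join(
                    f"{f}={v}" for f, v in fields.items()
                ).encode()
                for f, v in fields.items():
                    split[f"{rec}:{f}".encode()] = v.encode()
            return (
                naive_substring_search(packed, "Alice"),
                field_search_packed(packed, "name", "Alice"),
                field_search_split(split, "name", "Alice"),
            )


print(demo())  # (['alice', 'bob'], ['alice'], ['alice'])
```

The naive search over packed values also returns `bob`, because his city is "Alice Springs". Avoiding that means parsing every long value, which is the extra work the short-string layout skips.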

In the end, it's generally better to go with the approach that yields the most normalized schema. Optimization can start from there, and depending on your database, there are probably better alternatives than restructuring the underlying schema purely for performance.

jsight
A:

If you will be frequently searching or modifying the data, a greater number of short strings will provide better performance.

That is, you don't want to be searching for a substring within one of those long strings, or frequently modifying a value in the middle of one.
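The cost of modifying a value in the middle of a long string can be sketched as follows, again with Python's `dbm.dumb` standing in for GDBM or Berkeley DB and with a hypothetical `record:field` key scheme and `\x1f` separator. Through the classic dbm interface (and Python's `dbm` module), a value is fetched and stored as a whole, so changing one field in the packed layout means reading, parsing and rewriting the entire value, while the split layout rewrites one short value. Each function returns the number of value bytes it wrote.

```python
import dbm.dumb
import os
import tempfile

SEP = "\x1f"  # hypothetical field separator for the packed layout


def update_packed(db, rec_id, field, new_value):
    """Read-modify-write: the whole long value is decoded, edited and
    rewritten, even though only one field changed. Returns bytes written."""
    key = rec_id.encode()
    pairs = [p.partition("=") for p in db[key].decode().split(SEP)]
    fields = {k: v for k, _, v in pairs}  # dict keeps the original field order
    if field not in fields:
        raise KeyError(field)
    fields[field] = new_value
    blob = SEP.join(f"{k}={v}" for k, v in fields.items()).encode()
    db[key] = blob
    return len(blob)


def update_split(db, rec_id, field, new_value):
    """One key per field: only the short value for that field is written.
    Returns bytes written."""
    blob = new_value.encode()
    db[f"{rec_id}:{field}".encode()] = blob
    return len(blob)


def demo():
    # Hypothetical record: 50 fields of 20 characters each.
    fields = {f"f{i}": "x" * 20 for i in range(50)}
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.dumb.open(os.path.join(tmp, "packed"), "c") as packed, \
             dbm.dumb.open(os.path.join(tmp, "split"), "c") as split:
            packed[b"rec1"] = SEP.join(
                f"{k}={v}" for k, v in fields.items()
            ).encode()
            for k, v in fields.items():
                split[f"rec1:{k}".encode()] = v.encode()
            return (
                update_packed(packed, "rec1", "f25", "new"),
                update_split(split, "rec1", "f25", "new"),
            )


print(demo())  # (1222, 3)
```

Changing one field rewrites the whole 50-field record in the packed layout, but only the three bytes of `"new"` in the split layout. The gap grows with the length of the packed value.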

pianoman