views:

102

answers:

4

I've been tinkering with SQLite3 for the past couple days, and it seems like a decent database, but I'm wondering about its uses for serialization.

I need to serialize a set of key/value pairs which are linked to another table, and this is the way I've been doing this so far.

First there will be the item table:

CREATE TABLE items (id INTEGER PRIMARY KEY, details);

 | id | details |
-+----+---------+
 |  1 | 'test'  |
-+----+---------+
 |  2 | 'hello' |
-+----+---------+
 |  3 | 'abc'   |
-+----+---------+

Then there will a table for each item:

CREATE TABLE itemkv## (key TEXT, value);  -- where ## is an 'id' field in TABLE items

 | key | value |
-+-----+-------+
 |'abc'|'hello'|
-+-----+-------+
 |'def'|'world'|
-+-----+-------+
 |'ghi'| 90001 |
-+-----+-------+

This was working okay until I noticed that there was a one kilobyte overhead for each table. If I was only dealing with a handful of items, this would be acceptable, but I need a system that can scale.

Admittedly, this is the first time I've ever used anything related to SQL, so perhaps I don't know what a table is supposed to be used for, but I couldn't find any concept of a "sub-table" or "struct" data type. Theoretically, I could convert the key/value pairs into a string like so, "abc|hello\ndef|world\nghi|90001" and store that in a column, but it makes me wonder if that defeats the purpose of using a database in the first place, if I'm going to the trouble of converting my structures to something that could be as easily stored in a flat file.

I welcome any suggestions anybody has, including suggestions of a different library better suited to serialization purposes of this type.

+2  A: 

You might try PRAGMA page_size = 512; prior to creating the db, or prior to creating the first table, or prior to executing a VACUUM statement. (The manual is a bit contradictory and it also depends on the sqlite3 version.)

I think it's also kind of rare to create tables dynamically at a high rate. It's good that you are normalizing your schema, but it's OK for columns to depend on a primary key and, while repeating groups are a sign of lower normalization level, it's normal for foreign keys to repeat in a reasonable schema. That is, I think there is a good possibility that you need only one table of key/value pairs, with a column that identifies client instance.

Keep in mind that flat files have allocation unit overhead as well. Watch what happens when I create a one byte file:

$ cat > /tmp/one

$ ls -l /tmp/one
-rw-r--r-- 1 ross ross 1 2009-10-11 13:18 /tmp/one
$ du -h /tmp/one
4.0K    /tmp/one
$

According to ls(1) it's one byte, according to du(1) it's 4K.

DigitalRoss
A: 

Why not just store a foreign key to the items table?

Create Table ItemsVK (ID integer primary key, ItemID integer, Key Text, value Text)
feihtthief
A: 

If it's just serialization, i.e. one-shot save to disk and then one-shot restore from disk, you could use JSON (list of recommend C++ libraries).

Just serialize a datastructure:

[
  {'id':1,'details':'test','items':{'abc':'hello','def':'world','ghi':'90001'}},
  ...
]

If you want to save some bytes, you can omit the id, details, and items keys and save a list instead: (in case that's a bottleneck):

[
  [1,'test', {'abc':'hello','def':'world','ghi':'90001'}],
  ...
]
orip
+1  A: 

Don't make a table per item. That's just wrong. Similar to writing a class per item in your program. Make one table for all items, or perhaps, store the common parts of all items, with other tables referencing it with auxillary information. Do yourself a favor and read up on database normalization rules.

In general, the tables in your database should be fixed, in the same way that the classes in your C++ program are fixed.

jalf