views:

419

answers:

3

I have a prefix trie. What is the recommended schema for representing this structure in a relational database? I need substring matching to remain efficient.

A: 

Check here for some insight and some links to sample code.

EDIT: Didn't see that the author edited the question to be about prefix trees. But for trees in general you can still read those links.

Strelok
+1  A: 

How about the Materialized Path design?

CREATE TABLE trie (
  path VARCHAR(<maxdepth>) PRIMARY KEY,
  ...other attributes of a tree node...
);

To store a word like "stackoverflow":

INSERT INTO trie (path) VALUES
  ('s'), ('st'), ('sta'), ('stac'), ('stack'),
  ('stacko'), ('stackov'), ('stackove'), ('stackover'),
  ('stackover'), ('stackoverf'), ('stackoverflo'),
  ('stackoverflow');

The materialized path in the tree is the prefixed sequence of characters itself. This also forms the primary key. The size of the varchar column is the maximum depth of trie you want to store.

I can't think of anything more simple and straightforward than that, and it preserves efficient string storage and searching.

Bill Karwin
A: 

Does any of your entities have a relationship with any other? If not, that is, not relational, a hash table with a serialization would do it.

yogman
That's really not a trie... I agree that it probably makes sense to keep it out of the db, but turning it into a hashtable?
dgtized
O(1) algorithms are not good at saving resources. ;)
yogman