tags:

views:

67

answers:

3

I'm designing an app with a rather complex table design using the Postgres database, and am stuck on one point that I was hoping someone could offer advice about.

I have several tables, each of which has a feature id (or fid). Different types of entities have different attribute schemas, so I have to create a different table for each type. However, I want to make sure that the fids are unique across all entity types.

If I have three entity types, Entity1/2/3, represented by the following 3 tables:

Entity1             Entity2             Entity3
    fid                 fid                 fid
    attribute1          attribute2          attribute3

How do I ensure that there are no duplicate fids anywhere in the system?

Thanks!

+4  A: 

PostgreSQL (and Oracle for that matter) use objects called Sequences for sequencial value generation. Unlike MySQL and SQL Server's approach to sequencial value generation, sequences are not tied to the table. So you can define a single sequence (documentation link):

CREATE SEQUENCE your_seq 

This will create a sequence called your_seq, which will start at 1 and increment by 1 every time the next value is retrieved - see the documentation link if you want to set the minimum value and increment value/etc differently.

To use it, any INSERT statement needs to include:

NEXTVAL('your_seq')

...in the position in the INSERT statement to populate the appropriate column. IE:

INSERT INTO entity1
  (fid)
VALUES
  (NEXTVAL('your_seq'))

And, to automate this a little more, you would set NEXTVAL('your_seq') as the default value on all the tables that use it.

OMG Ponies
@Edmund: Thx, excellent point.
OMG Ponies
This can be made a little easier by creating a base table that has any columns that would be common throughout your Entity tables, and including a SERIAL field (which will automatically create and use a sequence). When you go to create a new Entity table, INHERIT from your base table and the SERIAL id field will match. You will, however, need to add any constraints yourself to each table.
MkV
A sequence absolutely would work if it weren't for the pesky users who occasionally insist on writing data to the tables directly from postgres. I need to ensure that if/when they do that, they are not corrupting the data. If it were only the app writing data, that would work great.As for the serial field, I'll look into that. One of the clients expressed concern about creating "gaps" in the fids, which might occur if someone inserted a large number of temporary records then deleted them. Ideally, we'd retain the freedom to use any fids we wanted, so long as they were unique.
Watusimoto
@Watusimoto: See the note about setting the default constraint. If your clients are concerned about gaps in the `fid` column - it shouldn't be used for referencial integrity.
OMG Ponies
A: 

I suggest using guids for your fid field. This way you can be sure that you won't have duplicate fids, and is more elegant than either (i) maintaining a 'highest fid' somewhere and querying it every time you do an insert operation, or (ii) putting code in that checks all your fids in all your tables every time you do an insert operation.

sasfrog
Kimberley Tripp (SQLSkills.com) has [an article](http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx) on using GUID's as primary keys. She advices against it because over the unnecessary overhead. Jeff Atwood also [wrote this, siding against using GUIDs](http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html)
OMG Ponies
@OMG Ponies: thanks for these points, I'll read them with interest. The SEQUENCE capability (which I wasn't aware of) looks ideal for this situation. +1
sasfrog
After reading the article from Jeff Atwood, I'm not sure I agree with you that he is siding against GUIDs...
sasfrog
A: 

It sounds like Watsuimoto and MkV are thinking something similar - have a base table hold FIDs and some specific entity tables inherit from that. Watsuimoto mentioned it wasn't working... if you can get it to work, then I agree with you both that it's the right solution. And maybe simulating it with FKs on the integer IDs isn't tight enough to help Watusimoto's customers from assigning two Entities to the same EntityBase.

A potential fix for that would be to have a compound key with the entity's type to help identify it. One example:

EntityTypes     EntityBase     Entity1      Entity2
-------------   ------------   ---------    ---------
TypeName (PK)   EntityID       ID           ID
                EntityType     EntityType   EntityType
                CommonAtts     Attribute1   Attribute2
                FID

Constraints:
-----------------------------------------------------------------
EntityBase:
        PK... lots of options.  Probably PK(EntityID, EntityType)
        UNIQUE(FID)
        FK(EntityType) on EntityTypes(TypeName)
Entity1   :
        PK(ID)  (or PK(ID, EntityID))
        EntityType NOT NULL
        CHECK(EntityType = "Entity1")  (e.g., it is constant)
        FK(EntityType) on EntityTypes(TypeName)
        FK(ID, EntityType) on EntityBase (ID, EntityType)
Entity2   :  <Ditto>

You've got a lot of flexibility here. You could set up per-type FIDs. You could make EntityIDs unique per type or unique across all Entities. You could make EntityBase have an ID separate from EntityID. You might even be able to make EntityType some kind of computed column, or default it, so you don't have to write a value to it.

If that's not your cup of tea because of the EntityType overhead, then I reluctantly offer this:

Entity1      Entity2           Features
---------    ---------   ...   ----------
ID (PK)      ID (PK)           FID (PK, arbitrary)
Attribute1   Attribute2        Entity1ID (FK)
                               Entity2ID (FK)
                               Entity3ID (FK)

Constraints:
-----------------------------------------------------------------
Features  :
        One and only one EntityID is NOT NULL

People who like to think of tables as "entity records" usually don't like this approach. It's unwieldy for more than a half dozen entities. But it is correct, and does allow you to keep your single-integer row IDs if you want.

When it comes to issues like this, I go to Ken Downs blog and look around. He's got some pretty good thoughts on relational design. That would be my first suggestion if I could find the article he posted on this topic. This article is the closest I could find.

Matt
I like the compound key idea. It's pretty simple, and it solves all the issues nicely.
Watusimoto
Cool. Good luck with the design!
Matt