ansaurus

Question

Is it a good idea to create a custom type for the primary key of each data table?

Answer 1

+2 A:

I wouldn't make a special id for this. This is mostly a testing issue. You can test the code and make sure it does what it is supposed to.

You can create a standard way of doing things in your system than help future maintenance (similar to what you mention) by passing in the whole object to be manipulated. Of course, if you named your parameter (int personID) and had documentation then any non malicious programmer should be able to use the code effectively when calling that method. Passing a whole object will do that type matching that you are looking for and that should be enough of a standardized way.

I just see having a special structure made to guard against this as adding more work for little benefit. Even if you did this, someone could come along and find a convenient way to make a 'helper' method and bypass whatever structure you put in place anyway so it really isn't a guarantee.

Arthur Thomas 2010-01-11 17:17:29

Answer 2

+4 A:

It's hard to see how it could be worth it: I recommend doing it only as a last resort and only if people are actually mixing identifiers during development or reporting difficulty keeping them straight.

In web applications in particular it won't even offer the safety you're hoping for: typically you'll be converting strings into integers anyway. There are just too many cases where you'll find yourself writing silly code like this:

int personId;
if (Int32.TryParse(Request["personId"], out personId)) { 
    this.person = this.PersonRepository.Get(new PersonId(personId));
}

Dealing with complex state in memory certainly improves the case for strongly-typed IDs, but I think Arthur's idea is even better: to avoid confusion, demand an entity instance instead of an identifier. In some situations, performance and memory considerations could make that impractical, but even those should be rare enough that code review would be just as effective without the negative side-effects (quite the reverse!).

I've worked on a system that did this, and it didn't really provide any value. We didn't have ambiguities like the ones you're describing, and in terms of future-proofing, it made it slightly harder to implement new features without any payoff. (No ID's data type changed in two years, at any rate - it's could certainly happen at some point, but as far as I know, the return on investment for that is currently negative.)

Jeff Sternal 2010-01-11 17:26:40

this is not a web applicaion and a lot of complex state in kept in memory including the IDs. However I have seen the above a lot in web apps, hence +1

Ian Ringrose 2010-01-11 17:28:34

The explicit casting required for the web scenario you give isn't really a downside, it forces you to document/think about the type of the parameter at its input boundary.

Frank Schwieterman 2010-01-11 18:52:21

@Frank - in the case I've described you already have 1) a request key and 2) a local variable name, both of which describe the parameter's content at the input boundary. I just don't see what value the strongly-typed identifier class adds in that scenario.

Jeff Sternal 2010-01-11 20:08:39

Answer 3

+1 A:

I don't see much value in custom checking in this case. You might want to beef up your testing suite to check that two things are happening:

Your data access code always works as you expect (i.e., you aren't loading inconsistent Key information into your classes and getting misuse because of that).
That your "round trip" code is working as expected (i.e., that loading a record, making a change and saving it back isn't somehow corrupting your business logic objects).

Having a data access (and business logic) layer you can trust is crucial to being able to address the bigger pictures problems you will encounter attempting to implement the actual business requirements. If your data layer is unreliable you will be spending a lot of effort tracking (or worse, working around) problems at that level that surface when you put load on the subsystem.

If instead your data access code is robust in the face of incorrect usage (what your test suite should be proving to you) then you can relax a bit on the higher levels and trust they will throw exceptions (or however you are dealing with it) when abused.

The reason you hear people suggesting an ORM is that many of these issues are dealt with in a reliable way by such tools. If your implementation is far enough along that such a switch would be painful, just keep in mind that your low level data access layer needs to be as robust as an good ORM if you really want to be able to trust (and thus forget about to a certain extent) your data access.

Instead of custom validation, your testing suite could inject code (via dependency injection) that does robust tests of your Keys (hitting the database to verify each change) as the tests run and that injects production code that omits or restricts such tests for performance reasons. Your data layer will throw errors on failed keys (if you have your foreign keys set up correctly there) so you should also be able to handle those exceptions.

Godeke 2010-01-11 17:29:24

If only..., most of the current code just does a catch, then logs the exception, and then returns some default result to the next layer up. (After all it is more important to make the customer think this system is working ok, then to make the system work ok.)

Ian Ringrose 2010-01-11 17:41:40

Ouch. On the upside, you have logging in place that can be used to build such tests to prevent future recurrence of such errors (I'm going to guess that such tests aren't commonly built now?) Testing database code can be straightforward if you have the ability to set your test DB in a known state prior to each test run. We use a "poor man's" solution of capturing our starting state and then running *all* test code in transactions that rollback (success *or* failure). This way each test gets the pristine database to work with (longer tests have an outer TRANSACTION).

Godeke 2010-01-11 19:12:11

Answer 4

+1 A:

My gut says this just isn't worth the hassle. My first question to you would be whether you actually have found bugs where the wrong int was being passed (a Car ID instead of a Person ID in your example). If so, it is probably more of a case of worse overall architecture in that your Domain objects have too much coupling, and are passing too many arguments around in method parameters rather than acting on internal variables.

Nick 2010-01-11 17:30:57

Answer 5

+3 A:

You can just opt for GUIDs, like you suggested yourself. Then, you won't have to worry about passing a person ID of "42" to DeleteCar() and accidentally delete the car with ID of 42. GUIDs are unique; if you pass a person GUID to DeleteCar in your code because of a programming typo, that GUID will not be a PK of any car in the database.

HardCode 2010-01-11 17:32:46

To me looks the best answer of all available.

SoMoS 2010-01-11 17:52:13

It's a good solution to this particular problem, and could be appropriate, but deciding whether to use GUIDs or ints for a particular primary key depends on a lot of other factors. A quick SO search on "GUID Int Primary Key" offers several good briefs on them.

Jeff Sternal 2010-01-11 18:02:24

Answer 6

A:

In response to the idiot who wrote "It's hard to see how it could be worth it:" :

There has been a topic here on SO entitled "how to find GUID in database". The topic was precisely about (a bunch of other idiots who had managed to create) the very problem that the OP here intends to avoid : that values that are intended to be an orderID, instead get to be stored in a customerID column in the customer table.

Needless to say that the idiots who actually did this, did not deem it useful to use their DBMS's feature of referential integrity. Needless to say that the idiots who actually did this, thought they could do RI "equally well" themselves, inside their application code.

And since they were such idiots, of course they would never have wanted to hear the wise advice of smarter people to not do that.

Needless too to say that the idiots who actually did this, ended up screaming for help on SO because they found themselves having to "discorrupt" their database, without any clue whatsoever how to do this.

So : the OP is asking a very wise and very relevant question, even if some or most of you lack the brain power to understand why this is so.

And my answer to the actual question :

In order to avoid the kind of trouble with ID columns that you seek to avoid, there are several options, in order of decreasing appropriateness :

(a) don't use them, (b) use RI everywhere, (c) your idea to use static type checking.

Erwin Smout 2010-01-11 18:49:47

-1 for the "idiots" in your answer

Smasher 2010-01-11 20:14:27

Yes, I'd be inclined to give such idiots a -1 too ...

Erwin Smout 2010-01-11 20:29:22

I would say use referential integrity everywhere is the first step regardless; the database does have good referential integrity setup in it. Then it comes down to cost/benefit trade off.

Ian Ringrose 2010-01-11 21:49:35

Answer 7

+2 A:

You could create a simple Id class which can help differentiate in code between the two:

public class Id<T>
{
    private int RawValue
    {
        get;
        set;
    }

    public Id(int value)
    {
        this.RawValue = value;
    }

    public static explicit operator int (Id<T> id) { return id.RawValue; }

    // this cast is optional and can be excluded for further strictness
    public static implicit operator Id<T> (int value) { return new Id(value); }
}

Used like so:

class SomeClass
{
     public Id<Person> PersonId { get; set; }
     public Id<Car> CarId { get; set; }
}

Assuming your values would only be retrieved from the database, unless you explicitly cast the value to an integer, it is not possible to use the two in each other's place.

sixlettervariables 2010-01-11 18:57:35

I pursued doing something like this with Linq-to-SQL, unfortunately it would not let me use my custom type with the ID column. So you may have issues with other ORM layers you may have. In the end I did not do it, but I think its a reasonable thing to do. It would be unusual though.

Frank Schwieterman 2010-01-11 18:58:27

I think you have the implicit/explicit mixed up though. The Id<> type should be implicitly convertable to an int, while an int should only be explicitly convertable to an Id<>.

Frank Schwieterman 2010-01-11 18:59:45

I prefer it this way as casts away from the type are easily visible. I'd prefer a database ID never be seen by the end user as an int. To each their own. We have an ORM system in place that uses these Id classes (well, something similar) with good results.

sixlettervariables 2010-01-11 19:56:04

ansaurus

tags:

views:

answers:

Is it a good idea to create a custom type for the primary key of each data table?

related questions