views:

90

answers:

4

I want to make a composite key (well that's the idea but I'm open to other suggestions) on a Documents table. It would consist of two columns, year (2010,...) and an ID, which would be autoincrementing but it should restart itself every year.

So keys like these 2010-1, 2010-2, ..., 2011-1, 2011-2, ... and, preferrably those keys should also be used for displaying, printing and searching.

Though I don't believe autoincrementing will work, because of the reset every year, so I guess I'll have to make increments myself, won't I?

Or should I just make a varchar column and construct each key myself and just put a unique on that column?

So, what are my options?

Please also take in consideration future design issues that I might have with a chosen design whatever would it be and ease of querying.

UPDATE:

I'm really starting to look into letting the application construct the key and provide it when inserting. However, it would require looking into DB for the last issued ID, which could result in problems in high volume usage.

+1  A: 

Though I don't believe auto-incrementing will work, because of the reset every year, so I guess I'll have to make increments myself, won't I?

Yep.

I recommending adding a column in order to generate the resetting value. An IDENTITY column would be best; a DATETIME could hold the record creation time but transactions within 3.33 milliseconds (0. 00333 seconds) of each other would have the same timestamp.

Either way, you could generate the id value using:

SELECT (SELECT COUNT(*)
          FROM DOCUMENTS t
         WHERE t.year = d.year
           AND t.col <= d.col) AS id,
        d.year
   FROM DOCUMENTS d

Or if you're on SQL Server 2005+, you could use:

 SELECT ROW_NUMBER() OVER (PARTITION BY d.year ORDER BY d.col) AS id,
        d.year
   FROM DOCUMENTS d
OMG Ponies
Minor quibble: based on his example: `SELECT COUNT(*) + 1` since he seems to want the first one loaded to be `CCYY-1`.
AllenG
@AllenG: The `<=` should ensure that 1 is returned.
OMG Ponies
I can't agree with this solution, if you mean that the PK in the table would involve the date and no ID column. In a system that has sufficiently high transaction volume, you can (and do!) often get two documents that get created at the same time.
Dave Markle
@Dave Markle: I do not state to use the created DATETIME column as the primary key. DATETIME has a precision of 3.33 milliseconds (0. 00333 seconds). This is not sufficient in some circumstances, maybe the OP can shed light on their requirements...
OMG Ponies
The volume might get high, but I believe it won't get that high that this precision would not suffice.
mare
A high volume is not necessary for race conditions, it simply makes them more likely. Using a DATETIME field as a unique identifier will inevitably lead to race conditions regardless of how much or little volume the database is handling.
jball
@jball: Only if you're letting the client supply the DATETIME value, which is exactly why you should not.
OMG Ponies
@OMG Ponies, the chance of two inserts happening at the same time may be low, but over the life of the application, it should be assumed that it will happen repeatedly. How often have "single-user" applications been coerced into multi-user service? To assume that it won't happen is wishful thinking.
jball
OMG Ponies
@OMG Ponies you're claiming that `BEGIN TRANSACTION TestDate1 - GO - INSERT INTO SomeTable (SomeDateColumn) VALUES SELECT SYSDATETIME(); - GO - COMMIT TRANSACTION TestDate1 - BEGIN TRANSACTION - BEGIN TRANSACTION TestDate2 - GO - INSERT INTO SomeTable (SomeDateColumn) VALUES SELECT SYSDATETIME(); - GO - COMMIT TRANSACTION TestDate2` would never lead to the same DATETIME value in both records?
jball
(`-` are meant to represent newlines.)
jball
@OMG Ponies: You seriously need to test the assertion that it's uncommon that DATETIMES rarely collide -- you'll quickly find out that it's not correct. While it's true that every transaction has its own Log Sequence Number (LSN) and that number is ever-increasing, it's absolutely not the case that you won't see two operations happen coincidently because of DATETIME's resolution. It happens *far* more often than most people expect, and you should absolutely design for it in all of your database apps.
Dave Markle
@jball: Per [SYSDATETIME documentation (SQL Server 2008+ btw)](http://msdn.microsoft.com/en-us/library/bb630353.aspx), there's more fractional second precision - it's precise to the *nanosecond*. That's because it's DATETIME2, not DATETIME. Regardless, you can check transaction logs - one will occur before the other. The issue here is if the data type has enough precision to make that visible.
OMG Ponies
@OMG Ponies, you're on a non-sequitar now. The fact that SYSDATETIME offers 100 ns precision (not accuracy, that depends on the system it's running on, and has a resolution of 15 *ms* http://msdn.microsoft.com/en-us/library/ms724397(VS.85).aspx) does not change my question to you. Sure, the transactions may execute in sequence and not in parallel, but with the resolution that even DATETIME2 offers, you're asking for a race condition.
jball
@Jball: I'm not sure what you think is non-sequitur about "The issue here is if the data type has enough precision/accuracy to make that visible.", because you end off agreeing with it. You're asking for the race condition if you let clients provide the value - that possibility is substantially smaller when handled by the database and only because of the data type, not because the events appear to be simultaneous.
OMG Ponies
So if I let the client select document date (this IS a requirement in my case, because clients should be able to select a date before or later than today, right now), I am looking for troubles? Can I let the client choose the date and then recreate it somehow in my code (.NET 4) based on what the client provided (we would essentially just need to change the time part, if I get this right) to prevent same dates?
mare
What I'm trying to say is, I am used to using modern data access mechanisms in my projects like Linq to SQL or EF. And I provide the data to those mechanisms through Business objects or directly into the methods (very simple example of this would be calling DateTime.Now and providing this to the DAL in .NET code). The DAL then takes this and puts it into SQL Server (and yes, you can assume we are on SQL 2008+).
mare
@mare: A client selecting a date is unlikely to have a time portion (forcing them to enter this is just asking for bad data), making it impossible to determine which record comes before the other when the same date is used (because the time portion will always be midnight of that day). Manipulating the value in the application is not the database - either the SQL submitted to the database uses SYSDATETIME in the query or you are looking at validating user input against existing values (via db trip prior to insertion, or error from db constraint(s)).
OMG Ponies
@mare: Please provide more information about this user submitted date - the question only mentions the year. Dates can't be in the future (no one can postdate a record), but this supports backdating (potentially shifting the ordering as new data is entered).
OMG Ponies
Sorry, my bad, the user selected date has nothing with the autogenerated date which would be used in the key and would be generated with DateTime.Now (regardless of what user chooses).
mare
@OMG Ponies, the question of precision/accuracy of the column datatype is a non-sequitur because the database does not guarantee that the value of that field will be unique for all the records in that table. Uniqueness is the issue, accuracy and precision are non-sequitur.
jball
OMG Ponies
@OMG Ponies - IFF you had a perfect precision datetime field in the table, a perfect precision and accuracy datetime source, and could guarantee that the database will never execute two statements in a single quant of time, then your latest comment would be reasonable. However, SQL Server fails to provide any of these three conditions. It seems likely to me that you have not run the SQL scripts I provided yesterday. Fundamental science is based on empirical testing.
jball
P.s. Use http://en.wikipedia.org/wiki/Non_sequitur_(logic) vice http://en.wikipedia.org/wiki/Non_sequitur
jball
+1  A: 

Why not add an actual auto-incrementing id? Composite keys can quickly grow to where they're virtually useless - especially for performance reasons if you ever have to join on the table. If you then ALSO want to log that Document XYZ was the first document stored in 2010, you could still have your Year and Order (or whatever) columns, but your primary key stays nice and clean.

AllenG
I don't quite understand this. Autoincrement will never go back to 1. Of course I can store Year separetely and use "normal" autoincrement ID for primary key but with this solution I won't get 2011-1 next year, it will be 2011-437 (or something).
mare
If you use your year + [int] as a "key" they cannot be null, meaning you have to assign the int when the record is inserted which requires jumping through hoops. It also leaves you with a composite key- which can be bad for a number of reasons. Using a set primary key independent of your year + [int], leaves you free to use an On Insert trigger (or some such) to set your int value. It _also_ keeps your actual primary key clean.
AllenG
Ok, I see, so what you suggest is similar to what @igor suggested. He did worded it better though but thanks!;)
mare
+1  A: 

If you're going to the trouble of creating an auto-incrementing key, I would throw out the idea of resetting it on every year, and just use an IDENTITY INT column instead.

If you want to get the sequence number of the document within the year, there are SQL functions you can use to do that:

ROW_NUMBER() OVER (PARTITION BY ... ORDER BY...)

Dave Markle
That would complicate the matters worse. I want to avoid SQL functions at DB level, if possible, and would like to have, again -if possible - one to one relation between what is stored in DB and what is used in application (read, UI).
mare
+3  A: 

It is a good practice to separate business with data storage structure. Why? Because tomorrow somebody will decide to change business logic:

  • use non-integer document order (1-AA, 1-AB, 1-AC...)
  • include year and month to build some monthly reports
  • any other changes...

And what should you do than?

So, my solution is:

  • use primary key (int, for example or your preferred data type) to do relationship with other tables in the database
  • use business key 1.2.3...as you wish (maybe some identifier generator)
  • use datetime field to store date of adding document, the year you can calculate dynamically.
igor
So far this seems the best option.
mare