Alright, I have some data that I need to dynamically assign an integer identifier to.

For example, here's a sample record.

<Listing>
    <sportcode>AA</sportcode>
    <lgabbr>NL</lgabbr>
    <division>CENTRAL</division>
    ...
</Listing>

There is more data associated with this listing, but these are the only fields I can use to generate the identifier, and I have to use all of them.

How can I put these together to create a UNIQUE identifier?

The only idea I've had so far is to concatenate the strings and then replace each character with its character code. This works fine when I'm encoding just the sportcode (AA = 6565), or sportcode and lgabbr (AANL = 65657876), but when I add division the ID becomes long and unwieldy (AANLCENTRAL = 6565787667697884826576).
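The scheme described above can be sketched in Python roughly as follows (an illustration of the idea, not the poster's actual code; the function name `charcode_id` is made up). It also shows why the value grows so fast: every upper-case Latin letter has a two-digit code (65 to 90), so each character adds two decimal digits.

```python
def charcode_id(*fields: str) -> str:
    """Concatenate the fields, then replace each character with its decimal character code."""
    joined = "".join(fields)
    return "".join(str(ord(ch)) for ch in joined)


if __name__ == "__main__":
    # Print each combination from the question with its code and digit count.
    for fields in (("AA",), ("AA", "NL"), ("AA", "NL", "CENTRAL")):
        code = charcode_id(*fields)
        print("".join(fields), code, len(code), "digits")
```

For AANLCENTRAL the result has 22 digits, more than the largest signed 64-bit integer (9,223,372,036,854,775,807, which has 19 digits), so it cannot be stored in a 32- or 64-bit integer type.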

Any suggestions on how to make this more concise while still being completely unique?

Edit
Several answers have shown how to reverse the process. I should mention that the process doesn't need to be reversible at all. Once the number is generated, that is the value used by the rest of the application.

Edit again: I've decided to go a different way with my solution, although I got some good suggestions. Since the most efficient way to assign the identifiers is to pick them myself instead of generating them, I added an XML doc with the ID definitions and their corresponding data. Now the XSLT stylesheet I'm using to create the identifiers can do a lookup in the attached document and use the identifiers I assigned. I didn't want to define everything inside the stylesheet because there are so many possibilities: roughly 25 sportcodes, 10 to 15 lgabbr values, and 10 divisions. That's a lot to account for in a stylesheet. Adding the extra XML document keeps this data outside the stylesheet for easy editing, and that is the approach I'm going with (and it works, BTW).
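A rough Python analogue of this lookup-table approach might look like the sketch below. The poster used an XSLT stylesheet plus a separate XML document; this is not that code. The XML layout, the element and attribute names, and the WEST division are all hypothetical, since the question doesn't show the real file. Because the IDs are picked by hand, the loader checks that no ID or combination is defined twice, which is what keeps them unique.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical ID-definition document; the real file's layout is not shown in the question.
ID_DEFINITIONS = """
<ids>
  <id value="1" sportcode="AA" lgabbr="NL" division="CENTRAL"/>
  <id value="2" sportcode="AA" lgabbr="NL" division="WEST"/>
</ids>
"""


def load_id_table(xml_text: str) -> dict:
    """Build a (sportcode, lgabbr, division) -> id table, rejecting duplicate definitions."""
    table = {}
    seen_ids = set()
    for entry in ET.fromstring(xml_text.strip()).iter("id"):
        key = (entry.get("sportcode"), entry.get("lgabbr"), entry.get("division"))
        value = int(entry.get("value"))
        if value in seen_ids or key in table:
            raise ValueError(f"duplicate definition for {key} / id {value}")
        seen_ids.add(value)
        table[key] = value
    return table


def lookup_id(table: dict, sportcode: str, lgabbr: str, division: str) -> int:
    # Raises KeyError if the combination has no assigned id.
    return table[(sportcode, lgabbr, division)]
```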

Thanks for all your suggestions.

A: 

Use SportCode as an integer starting at 0 (like an enum), use lgabbr as an integer starting at 0, and so on. So...

NOSPORT == 0;
AA == 1;

NOLGABBR == 0;
NL == 1;

NODIV == 0;
CENTRAL == 1;

Then concatenate them. For example, 000 would mean no sport, no lgabbr, and no division.
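A small Python sketch of this enum-and-concatenate idea follows. The code tables are hypothetical and hold only the values from the sample record. Each code is zero-padded to a fixed width (two digits here, assuming every table has fewer than 100 entries), so that, for example, codes 1 and 23 can't produce the same digits as 12 and 3.

```python
# Hypothetical code tables; 0 is reserved for "none", as in the answer.
SPORT_CODES = {None: 0, "AA": 1}
LEAGUE_CODES = {None: 0, "NL": 1}
DIVISION_CODES = {None: 0, "CENTRAL": 1}

FIELD_WIDTH = 2  # assumed: every table has fewer than 100 entries


def make_id(sportcode, lgabbr, division):
    """Look up each field's code and join the zero-padded codes into one integer."""
    codes = (SPORT_CODES[sportcode], LEAGUE_CODES[lgabbr], DIVISION_CODES[division])
    for code in codes:
        if code >= 10 ** FIELD_WIDTH:
            raise ValueError(f"code {code} does not fit in {FIELD_WIDTH} digits")
    return int("".join(str(code).zfill(FIELD_WIDTH) for code in codes))
```

Here `make_id("AA", "NL", "CENTRAL")` returns 10101 (the digits 01 01 01), and the all-"none" case returns 0. Dropping the leading zero doesn't hurt uniqueness, because the other two fields always occupy exactly two digits each.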

Muad'Dib
Strictly speaking, concatenating isn't right: does 12345 mean 12+34+5 or 1+23+45?
Nickolay
@Nickolay It doesn't have to be reversible, just unique.
CodeFusionMobile
A: 

If you limit the characters in the strings (for example, to upper-case Latin letters only), you can treat them as base-26 numbers. For example:

The word TEST becomes J4IJ in base 26, which is 337135 in decimal.

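Base 26 here uses the digits 0-9 and A-P. Each letter of the word is first given a value from A = 0 to Z = 25, so T (19) is written as the digit J and E (4) as the digit 4.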

With this scheme, you can concatenate your strings and convert the result to a single number.
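A minimal Python sketch of the base-26 conversion described above, assuming upper-case Latin input only (the function names are made up). One caveat: because A is worth 0, leading A's add nothing, so "AANL" and "NL" both come out as 349. Valuing the letters A = 1 through Z = 26 and using base 27 instead would avoid that, at the cost of slightly larger numbers. An 11-letter string such as AANLCENTRAL can reach just under 26^11 (about 3.7 × 10^15), so it needs a 64-bit integer.

```python
import string

# Base-26 digit symbols: 0-9 followed by A-P (16 letters), 26 symbols in total.
DIGITS = string.digits + string.ascii_uppercase[:16]


def base26_value(word: str) -> int:
    """Read an upper-case Latin word as a base-26 number, with A=0 ... Z=25."""
    value = 0
    for ch in word:
        if ch not in string.ascii_uppercase:
            raise ValueError(f"unsupported character {ch!r}")
        value = value * 26 + (ord(ch) - ord("A"))
    return value


def to_base26_digits(value: int) -> str:
    """Write a non-negative integer using the base-26 digits 0-9, A-P."""
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, 26)
        out.append(DIGITS[rem])
    return "".join(reversed(out))
```

With these, `base26_value("TEST")` gives 337135 and `to_base26_digits(337135)` gives "J4IJ", matching the example.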

dereli
That would shorten it a bit. I already force characters to upper case and limit them to valid Latin characters.
CodeFusionMobile
I accepted this answer because it is the only one that was purely dynamic and not reliant on outside definitions. I did end up defining everything in an external XML file for my project, but this answer is most true to the OP.
CodeFusionMobile
A: 

It depends on how much size you'll need for each value. Are there only a few divisions? A few sportcodes? Think about growth, too, and make sure you allow room for that.

One option is to generate a unique value for each one and then combine them. You'd number each value as an INT, so that Sportcode "AA" = 1, "AB" = 2 (assuming that's your format), number your divisions, and so on. Then, decide how many binary bits you need to express each value:

  • If a field can have up to 16 different values, you'll need 4 bits for that value.
  • If a field can have up to 256 different values, you'll need 8 bits for that value.
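A quick way to compute these bit counts, sketched in Python and assuming each field is numbered from 0 (if numbering starts at 1, as in the "AA" = 1 example, 16 values need 5 bits instead). With the rough counts from the question (about 25 sportcodes, up to 15 lgabbr values, and 10 divisions) this gives 5 + 4 + 4 = 13 bits.

```python
def bits_needed(distinct_values: int) -> int:
    """Smallest bit width that can hold the codes 0 .. distinct_values - 1."""
    if distinct_values < 1:
        raise ValueError("a field needs at least one value")
    # int.bit_length() gives the number of bits needed for the largest code;
    # a field with a single value still gets 1 bit.
    return max(1, (distinct_values - 1).bit_length())
```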

Once you know how many bits you need for each, you can bit-shift them and combine them into a single number. For example, if the sample record above ends up with SportCode = 1, Lgabbr = 5, and Division = 3, and for simplicity's sake you decide that you only need 4 bits for each column (16 possible values each), then:

Key = Sportcode + (Lgabbr << 4) + (Division << 8)

This would give you 1 + 80 + 768 = 849, which uniquely identifies that combination. To get the values back, you'd use the following:

SportCode = Key MOD 16
Lgabbr = (Key >> 4) MOD 16
Division = (Key >> 8)
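The key arithmetic above as a Python sketch, using the same 4-bit fields (the names `make_key` and `split_key` are illustrative). It uses `|` where the answer uses `+`; the two give the same result as long as each value fits in its 4 bits, which the range check enforces.

```python
BITS = 4                  # bits per field, as in the example above
MASK = (1 << BITS) - 1    # 0b1111 == 15


def make_key(sportcode: int, lgabbr: int, division: int) -> int:
    """Pack three small field codes into one integer key."""
    for value in (sportcode, lgabbr, division):
        if not 0 <= value <= MASK:
            raise ValueError(f"{value} does not fit in {BITS} bits")
    return sportcode | (lgabbr << BITS) | (division << (2 * BITS))


def split_key(key: int) -> tuple:
    """Recover (sportcode, lgabbr, division) from a packed key."""
    return key & MASK, (key >> BITS) & MASK, key >> (2 * BITS)
```

`make_key(1, 5, 3)` returns 849 and `split_key(849)` returns `(1, 5, 3)`.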
rwmnau
Note: it doesn't have to be reversible, just completely unique. I like the suggestion
CodeFusionMobile