I have some data that arrives regularly as a dump from a data source. It has a string natural key that is long (up to 60 characters) and not relevant to the end user. I am using this key in URLs, which makes them too long and user-unfriendly.
I would like to transform the string keys into integers with the following requirements:
The source dataset will change over time.
The ID should be:
- a non-negative integer
- unique, and constant even if the set of input keys changes
- preferably reversible back to the key (not a strong requirement)
The database is rebuilt from scratch every time, so I cannot remember the already assigned IDs, match the new dataset to them, and generate sequential IDs only for the added keys.
There are currently around 30000 distinct keys and the set is constantly growing.
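The "preferably reversible" requirement pulls against short IDs. A function that can be inverted without a stored lookup table has to be injective on every key that could ever appear, since future keys are not known in advance. Assuming, purely for illustration, that keys are exactly 60 characters over a 36-symbol alphabet (digits and lowercase letters), counting the possible keys gives a lower bound on the ID size:

```latex
\[
N = 36^{60}, \qquad
\log_2 N = 60 \log_2 36 \approx 310.2 \text{ bits}, \qquad
\log_{10} N = 60 \log_{10} 36 \approx 93.38
\]
```

So the largest ID, N - 1, has 94 decimal digits, compared with at most 19 digits for a non-negative signed 64-bit integer. A short ID computed from the key alone therefore cannot be guaranteed unique for every possible key, and cannot be reversed without a lookup table.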
How can I implement a function that maps string keys to integer IDs?
What I have thought about:
1. Built-in string.GetHashCode:
int Id(string key) => Math.Abs(key.GetHashCode());
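- is not guaranteed to be unique
- is not guaranteed to be constant: on .NET Core and .NET 5+, string hash codes are randomized per process, so the same key can get a different ID after each rebuild
- Math.Abs throws an OverflowException when GetHashCode returns int.MinValue
- (not reversible)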
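To put numbers on "not guaranteed to be unique", here is a small Python sketch (the idea carries over to C# unchanged). It derives a non-negative ID from a SHA-256 digest instead of the runtime's string hash, which is randomized per process both for Python's hash() on str and for string.GetHashCode on .NET Core. It then uses the birthday approximation to estimate how likely at least one collision is among 30,000 keys. The function names and the choice of SHA-256 are illustrative assumptions; any hash with a fixed, documented algorithm would serve.

```python
import hashlib
import math


def stable_id(key: str, bits: int = 31) -> int:
    """Non-negative ID taken from the top `bits` bits of SHA-256(key).

    Unlike the built-in hash(), this value does not change between
    processes or machines, so it survives a full database rebuild.
    """
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # first 64 bits of the digest
    return value >> (64 - bits)                # keep the top `bits` bits


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision among n IDs)."""
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))  # 1 - exp(-x), precise for small x


if __name__ == "__main__":
    print(stable_id("some-long-natural-key"))
    print(f"{collision_probability(30_000, 31):.3f}")  # about 0.189 for 31-bit IDs
    print(f"{collision_probability(30_000, 63):.1e}")  # about 4.9e-11 for 63-bit IDs
```

63 bits keeps the value a non-negative signed 64-bit integer (long in C#), at most 19 decimal digits in a URL. That makes a collision very unlikely but still not impossible, so a duplicate check during each rebuild remains prudent.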
1.1. "Re-hashing" the built-in GetHashCode result until a unique ID is generated, to prevent collisions:
- existing IDs may change if a colliding key is added earlier in the input dataset
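A small Python sketch of why re-hashing makes an ID depend on input order rather than on the key alone. Linear probing (try the next slot until a free one is found) stands in here for "re-hash until unique"; any deterministic re-hash sequence behaves the same way. toy_hash and the three keys are hypothetical, chosen so that "alpha" and "beta" collide.

```python
from typing import Callable, Dict, Iterable


def assign_by_rehash(keys: Iterable[str], base_hash: Callable[[str], int],
                     space: int) -> Dict[str, int]:
    """Assign IDs in input order; on a collision, probe the next free slot."""
    taken = set()
    ids = {}
    for key in keys:
        if len(taken) == space:
            raise ValueError("ID space exhausted")
        slot = base_hash(key) % space
        while slot in taken:            # collision: "re-hash" by probing
            slot = (slot + 1) % space
        taken.add(slot)
        ids[key] = slot
    return ids


# Hypothetical hash values: "alpha" and "beta" collide on slot 3.
toy_hash = {"alpha": 3, "beta": 3, "gamma": 5}.__getitem__

day1 = assign_by_rehash(["alpha", "gamma"], toy_hash, space=8)
day2 = assign_by_rehash(["beta", "alpha", "gamma"], toy_hash, space=8)
print(day1["alpha"], day2["alpha"])  # 3 4: "alpha" got a new ID
```

Sorting the keys before assigning does not fix this: a new key that sorts earlier and collides still displaces an existing one.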
2. A perfect hash function:
- I am not sure whether it can generate constant IDs when the set of inputs changes
- (not reversible)
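To illustrate the concern, here is the simplest minimal perfect hash one can write in Python: the key's rank in sorted order. It maps a fixed set of n keys onto 0..n-1 without collisions, but it is built for that one set. Practical constructions (for example CHD or BDZ) are far more compact, yet they are also computed from a static key set, and nothing guarantees that an existing key keeps its value after a rebuild with more keys. rank_mphf and the fruit keys are hypothetical.

```python
from typing import Dict, Iterable


def rank_mphf(keys: Iterable[str]) -> Dict[str, int]:
    """Order-preserving minimal perfect hash: key -> rank in sorted order.

    Collision-free and dense (0..n-1) for the set it was built from,
    but every ID is a function of the whole set, not of the key alone.
    """
    return {key: rank for rank, key in enumerate(sorted(set(keys)))}


before = rank_mphf(["cherry", "apple", "banana"])
after = rank_mphf(["cherry", "apple", "banana", "apricot"])
print(before["banana"], after["banana"])  # 1 2: adding "apricot" shifted it
```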
3. Translating the key to base 36, base 64 or some other base:
- does not shorten the long keys enough
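A quick Python check of how long a 60-character key becomes under two common reversible encodings. The sample key is hypothetical (60 ASCII characters), and base 36 is computed here by reading the key's bytes as one big integer, which is only one possible way to "translate to base 36".

```python
import base64
import string

DIGITS36 = string.digits + string.ascii_lowercase  # "0-9a-z"


def to_base36(n: int) -> str:
    """Render a non-negative integer in base 36."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS36[r])
    return "".join(reversed(out))


key = "customer-region-emea-" + "x" * 39  # hypothetical 60-char key
raw = key.encode("ascii")

b64 = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
b36 = to_base36(int.from_bytes(raw, "big"))

print(len(key), len(b64), len(b36))  # 60 80 93
# base64url round-trips back to the original key (padding restored first)
assert base64.urlsafe_b64decode(b64 + "=" * (-len(b64) % 4)) == raw
```

Both encodings come out longer than the key itself. Base conversion only shortens a key when its alphabet is smaller than the target's, for example a purely numeric key rendered in base 36.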
What are the other options?