views:

288

answers:

7

Hi,

I have an 18 Character String that I need to convert into a unique long (in Java). A sample String would be: AAA2aNAAAAAAADnAAA

My String is actually an Oracle ROWID, so it can be broken down if needs be, see: http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#CNCPT713

The long number generated, (1) Must be unique, as no two results can point to the same database row and (2) Must be reversible, so I can get the ROWID String back from the long?

Any suggestions on an algorithm to use would be welcome.

Oracle forum question on this from a few years ago : http://forums.oracle.com/forums/thread.jspa?messageID=1059740

Ro

+11  A: 

You can't, with those requirements.

18 characters of (assuming) upper and lower case letters has 5618 or about 2.93348915 × 10331 combinations. This is (way) more than the approximate 1.84467441 × 1019 combinations available among 64 bits.

UPDATE: I had the combinatorics wrong, heh. Same result though.

unwind
According to the documentation it's a base 64 encoding, using a-z, A-Z, 0-9 as well as + and /. So it's even worse :-)
Joey
If digits are allowed, then make that 18^((2*26)+10), worse again.
Liam
Yeah, however the 18-character string can be broken down into its components, so I was wondering if anything could be done because of that:AAA2aNAAAAAAADnAAA = AAA2aN - AAA - AAAADn - AAAIn addition, the guarantee of uniqueness would realistically only have to cover at most 100 million cases .... Unlikely to have a database table greater than that !
Ro
ehm, 56 options for each character, 18 characters, means there's 56^18 possible combinations, doesn't it?
wds
@wds: Thanks! Fixed, heh.
unwind
I doubt if we ever surpass the number limit of 18 digit number(1 Billion Billion, even Google has indexed pages of not more than 100Billion) in real world. So mapping from String -> Number should not be posed with any constraint.
DKSRathore
A: 

This sounds ... icky, but I don't know your context so trying not to pass judgement. 8)

Have you considered converting the characters in the string into their ASCII equivalents?

ADDENDUM: Of course required truncating out semi-superflous characters to fit, which sounds like an option you may have from comments.

Iain Collins
Yeah .... This did come up before alright .... http://forums.oracle.com/forums/thread.jspa?messageID=1059740
Ro
+3  A: 

Your String of 18 characters representing a base 64 encoding represents a total of 108 bits of information, which is almost twice that of long's 64. We have a bit of a problem here if we want to represent every possible key and have the representation be reversible.

The string can be broken down into 4 numbers easily enough. Each of those 4 numbers represents something - a block number, an offset in that block, whatever. If you manage to establish upper limits on the underlying quantities such that you know larger numbers will not occur (i.e. if you find a way to identify at least 44 of those bits that will always be 0), then you can map the rest onto a long, reversibly.

Another possibility would be to relax the requirement that the equivalent be a long. How about a BigInteger? That would make it easy.

Carl Smotricz
"How about a BigInteger?" Or two longs.
Steve Jessop
I briefly considered that, but two longs is yucky, IMO. We're working in OO languages so we can treat single values as single entities. For sufficiently small numbers, BigInteger *is* effectively two longs, but it's wrapped into a coherent package.
Carl Smotricz
Sure, it's just that we aren't going to do any maths. I'd probably define a class with two `long` fields ("tophalf" and "bottomhalf" or whatever) and methods to convert to/from strings. But really it depends why the questioner (thinks he) needs a long. If he's only got 8 bytes of storage, then neither BigInteger nor two longs is possible.
Steve Jessop
+2  A: 

I'm assuming that's a case-insensitive alpha-numeric string, and so drawn from the set [a-zA-Z0-9]*

In that case you have

26 + 26 + 10 = 62

possible values for each character.

62 < 64 = 2^6

In other words you need (at least) 6 bits to store each of the 18 characters of the key.

6 * 18 = 108 bits

to store the entire string uniquely.

108 bits  = (108 / 8) = 13.5 bytes.

Therefore as long as your data type can store at least 13.5 bytes then you can fairly simply define a mapping:

  1. Map from raw ASCII for each character to a representation using only 6 bits
  2. Concatenate all 18 reduced representations to a sinlde 14 byte value
  3. Cast this to your final data value

Obviously Java has nothing more than an 8 byte long. So if you have to use a long then it is NOT possible to uniquely map the strings, unless there is something else which reduces the space of valid input strings.

Mark Pim
It's actually a base 64 encoding so it also includes `+` and `/`.
Carl Smotricz
OK, that still allows it to fit in 6 bits per character though
Mark Pim
+4  A: 

Just create a map (dictionary / hashtable) that maps ROWID strings to an (incremented) long. If you keep two such dictionaries and wrap them up in a nice class, you will have a bidirectional lookup between the strings and the long IDs.

Pseudocode:

class BidirectionalLookup:
    dict<string, long> stringToLong
    dict<long, string> longToString
    long lastId

    addString(string): long
        newId = atomic(++lastId)
        stringToLong[string] = newId
        longToString[newId] = string
        return newId

    lookUp(string): long
        return stringToLong[string]

    lookUp(long): string
        return longToString[long]
Daren Thomas
This is what i previously implemented (after initial investigation - see Oracle Forum link).The problem is that this hashmap cache has grown to over upper limit for the size of a hashmap !Hence its being investigated again
Ro
http://forums.oracle.com/forums/thread.jspa?messageID=1059740
Ro
why not use a table in your database for this?
Daren Thomas
+1  A: 

Theoretically, you can't represent ROWID in a long (8 bytes). However, depending on the size of your databases (the whole server, not only your table), you might be able to encode it into a long.

Here is the layout of ROWID,

   OOOOOO-FFF-BBBBBB-RRR

Where O is ObjectID. F is FileNo. B is Block and R is Row Number. All of them are Base64-encoded. As you can see O & B can have 36-bits and B&R can have 18.

If your database is not huge, you can use 2 byte for each part. Basically, your ObjectId and block number will be limited to 64K. Our DBA believes our database has to be several magnitude bigger for us to get close to these limits.

I would suggest you find max of each part in your database and see if you are close. I wouldn't use long if they are anywhere near the limit.

ZZ Coder
Inserting new rows long after a table in created can result in radically different ROWIDs for the new rows, so we can't really go that route.
Ro
A: 
Ro