views:

122

answers:

3

I have values that are 64-bit unsigned ints, and I need to store them in mongodb, which has no unsigned int type. I see three main possibilities for storing them in other field types, and converting on going in and out:

Using a signed int is probably easiest and most space efficient, but has the disadvantage that they're not human readable and if someone forgets to do the conversion, some of them will work, which may obscure errors.

Raw binary is probably most difficult for inexperienced programmers to deal with, and also suffers from non-human-readability.

A string representation is the least space efficient (~40 bytes in unicode vs 8 bytes per field), but then at least all of the possible values will map properly, and for querying only a conversion to string is required instead of a more complicated conversion.

I need these values to be available from different platforms, so a single driver-specific solution isn't an option.

Any major pros and cons I've missed? Which one would you use?

A: 

I would say go with binary - it's the only solution above where getting sort orders on queries right is going to be trivial.

mdirolf
In this case, I don't actually care about sorting at all. The numbers aren't meaningful, and are randomly assigned. They are simply pointers into another db that uses them for keys.
fields
A: 

I'd just shove the numbers into strings. It's the easiest and the most compatible solution. Most common programming languages provide string to number conversation in their standard libraries. If someone else needs to read your database with a different program later they don't need to figure out your binary storage format. Another bonus is you can store numbers larger than an unsigned int64 if you need to.

epochwolf
A: 

Why would a string value have to be in unicode? You know the value will always be digits so you can use a standard varchar which means no more than 20 bytes. To be honest, it really depends on how the value will be used. Is it going to be used in lots of joins to the source which is using unsigned 64 ints? If so, there will have to be a conversion on each row. Is it only going to be used for reference or to filter for specific values (as opposed to a join to the mongodb)? If so, then a string value will perform well enough.

Another solution, if it is possible, would be to add a 64-signed int column in the mongodb that represents the signed version of the 64 unsigned int and then use the signed int in your db. In this way you can join on apples and apples and can compare the values from one system to another.

Given what you have said, I still contend that a varchar column will perform well enough and makes the value human readable.

EDIT Another solution would be to store the value in a signed 64-bit int and add a method to your item which calculates the unsigned 64-bit value so that users can verify the value.

Thomas
The string value would have to be unicode because mongodb only stores unicode strings. There are also no joins in mongodb.
fields