I have a bunch of items in my database. Each is assigned a unique ID. I want to shorten this ID and display it on the page, so that if a user needs to contact us (over the phone) regarding a particular item, they can give us the shortened ID rather than a really big number, similar to the SKU on sites like NCIX. Thus, I was thinking about encoding it in base 36. The problem with that, however, is that characters like 1, l and I all look kind of the same. So I was thinking about eliminating the look-alikes. Is this a good idea, or should I just use a really legible font?
Yes, you should eliminate sources of confusion, because if a mistake can be made, someone will make it. It is very easy to confuse 0 with O, and I with l or 1, so you should not use both members of a look-alike pair. That's easy: since you won't use three characters (I, L and O), just write the number in base 36 - 3 = 33, which uses only the digits 0-9 and the letters A-W, and then swap the three confusing letters for the unused X, Y and Z:
SKU.replace('I','X').replace('L','Y').replace('O','Z')
Conversely, when you are given such a code, you have to turn X, Y and Z back into the confusing characters before calling int(SKU, 33). Before that, though, if you are given L or I by mistake (as is to be expected), replace it with 1, and if you are given O, replace it with 0. Both steps fit in a single pass, e.g. use SKU.translate() with
str.maketrans('LIOXYZ', '110ILO')
(string.maketrans in Python 2).
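Putting the two directions together, a minimal Python 3 sketch of this scheme might look like the following. The function names (`to_base33`, `encode`, `decode`) are made up for illustration, and it assumes the ID is a non-negative integer and that codes are compared in upper case.

```python
import string

# Digit characters for bases up to 36: 0-9 followed by A-Z.
DIGITS = string.digits + string.ascii_uppercase

# I, L and O are swapped for X, Y and Z, which base 33 never produces.
TO_SKU = str.maketrans("ILO", "XYZ")
# A misread L or I becomes 1 and O becomes 0; X, Y, Z map back to I, L, O.
# translate() applies all mappings in one pass, so they do not interfere.
FROM_SKU = str.maketrans("LIOXYZ", "110ILO")


def to_base33(n):
    """Render a non-negative integer in base 33 (digits 0-9, A-W)."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 33)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def encode(n):
    """Numeric ID -> short code without I, L or O."""
    return to_base33(n).translate(TO_SKU)


def decode(sku):
    """Short code (possibly with misread characters) -> numeric ID."""
    return int(sku.upper().translate(FROM_SKU), 33)
```

For instance, `encode(18)` gives `'X'` (the base 33 digit `I` swapped out), and `decode('lO')` treats the input as `10` in base 33, i.e. 33.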
We had a similar situation in a regular app many years ago, at a company I worked for. There was an ID, base 36 (0-9a-z) that often had to be communicated over the phone. That was an application running on a Unix server and viewed on serial terminals (not relevant, just part of the story :).
Our solution was that whenever the user was on that field and pressed F2, a small window popped up showing the radio code (phonetic alphabet spelling) for the field: “a9vg5” would display “alpha niner victor golf five”, which the user would just read aloud.
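A rough idea of what that pop-up computed, as a Python sketch. This is not the original application's code; the `spell_out` name and the exact word list are assumptions, using common radio-alphabet spellings with "niner" for 9 as in the example above.

```python
# Radio-alphabet words for a-z and 0-9 (assumed spellings, for illustration).
LETTER_WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett "
    "kilo lima mike november oscar papa quebec romeo sierra tango "
    "uniform victor whiskey xray yankee zulu"
).split()
DIGIT_WORDS = "zero one two three four five six seven eight niner".split()

RADIO = {chr(ord("a") + i): word for i, word in enumerate(LETTER_WORDS)}
RADIO.update({str(i): word for i, word in enumerate(DIGIT_WORDS)})


def spell_out(code):
    """Return the words a user would read aloud for a base 36 ID."""
    # A character outside 0-9a-z raises KeyError rather than being skipped.
    return " ".join(RADIO[ch] for ch in code.lower())


print(spell_out("a9vg5"))  # alpha niner victor golf five
```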
When the application was developed, I was inclined to display the ID base 64 encoded, with capitals plus dot and slash, and to use different radio-code words for the capitals, but the designated analyst disagreed. You could look up different words on Wikipedia or be creative.
PS a clarification: although it's not clear from the way I wrote it, the analyst disagreed for a good reason, since one has to think about both sides of the communication; the user just reads, but the other side on the phone has to remember or look up that e.g. delta==d and Dalton==D.
I'm assuming the original ID is numeric. We've had good results from z-base-32 with a similar scenario. We've been using it since April 2009.
I particularly liked the encoding's goals of minimizing transcription errors, through removing confusing letters from the alphabet, and brevity, as shorter identifiers are easier to use.
The encoding orders the alphabet so that the more commonly occurring characters are those that are easier to read, write, speak and remember. Lower case is used as it's easier to read.
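As a rough illustration, here is a Python sketch that writes a numeric ID using the z-base-32 alphabet. Note the assumption: z-base-32 proper is defined over bit strings, so treating the ID as a plain integer and emitting five bits per character, as below, is a simplification and not necessarily what we deployed. The names `encode_id` and `decode_id` are hypothetical.

```python
# The z-base-32 alphabet: no 0, 2, l or v, and lower case only.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
INDEX = {ch: i for i, ch in enumerate(ZBASE32)}


def encode_id(n):
    """Write a non-negative integer with the z-base-32 alphabet."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    out = [ZBASE32[n & 31]]        # lowest five bits first
    n >>= 5
    while n:
        out.append(ZBASE32[n & 31])
        n >>= 5
    return "".join(reversed(out))  # most significant character first


def decode_id(text):
    """Inverse of encode_id; KeyError on characters outside the alphabet."""
    n = 0
    for ch in text.strip().lower():
        n = (n << 5) | INDEX[ch]
    return n
```

With this sketch, any ID below 32**4 (about a million) fits in four characters.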
I asked this similar question before we decided to use z-base-32.