I'm making a code generation script for the UN/LOCODE system. The database holds 3-character codes (letters and digits) that are unique within each country. For example, the database contains "EE TLL", EE being the country (Estonia) and TLL the unique code inside Estonia; "AR TLL" can also exist (the country code and the 3-character code are stored separately). Codes are in capital letters.

The database is fairly big and already contains a huge number of locations. The user can also enter the 3-character code him/herself (it is automatically checked against the database before submission).

Finally, neither 0 nor 1 may be used (possible confusion with O and I).

What I'm searching for is the most efficient way to pick the next available code when none is provided.

What I've come up with:

  1. I could check every code in order from AAA to 999, but each check would require a new query (slow?).

  2. I could store all of the roughly 40000 possibilities (34^3 = 39304) in an array and subtract the codes that are already used in the database... but that uses too much memory IMO (not sure what I'm talking about here actually, maybe 40000 isn't such a big number).

  3. Generate a random code, check whether it already exists, and start over if it does. That's just risk taking.

Is there some magic MySQL query/PHP script that can get me the next available code?

A: 

I would go with number 2: it is simple, and 40000 is not a big number.

To make it more efficient, you can store a number representing each 3-letter code. The conversion should be trivial because you have a total of 34 (A-Z, 2-9) letters.

phsiao
I'm not sure I understand how you convert the 34 letters/digits into numbers. But I guess the 2nd option is a one-time operation, and if you say 40000 is small then OK :)
Solenoid
Löwis gave an explanation in the second part of his answer.
phsiao
A: 

I would go for option 1 (i.e. do a sequential search), adding a table that gives the last assigned code per country (i.e. such that all codes from AAA up to that code are already assigned). When a new code is assigned through the sequential scan, that table gets updated; for user-assigned codes, it remains unmodified.

If you don't want to issue repeated queries, you can also write this scan as a stored routine.
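The sequential scan with a per-country marker could be sketched in PHP roughly as follows. This is only an illustration under assumptions: `successor()` and `nextFreeCode()` are hypothetical helper names, the set of used codes is passed in as an array instead of being queried from the database, and PHP 7.1+ is assumed for the type declarations.

```php
<?php
// Alphabet from the question: A-Z plus 2-9 (0 and 1 are excluded).
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789';

// Return the code that follows $code in alphabet order, or null after '999'.
// Assumes $code only contains characters from ALPHABET.
function successor(string $code): ?string
{
    $alphabet = ALPHABET;
    $last = strlen($alphabet) - 1;
    for ($i = strlen($code) - 1; $i >= 0; $i--) {
        $pos = strpos($alphabet, $code[$i]);
        if ($pos < $last) {
            $code[$i] = $alphabet[$pos + 1];
            return $code;
        }
        $code[$i] = $alphabet[0]; // wrap this position and carry to the left
    }
    return null; // every position wrapped: the code space is exhausted
}

// Scan forward from the per-country marker until an unused code is found.
// $used is a set (code => anything); $lastAssigned is null if no marker exists yet.
function nextFreeCode(array $used, ?string $lastAssigned): ?string
{
    $candidate = $lastAssigned === null ? 'AAA' : successor($lastAssigned);
    while ($candidate !== null && isset($used[$candidate])) {
        $candidate = successor($candidate);
    }
    return $candidate;
}

// Example: AAA and AAB were assigned by the scan, AAD was entered by a user.
$used = array_flip(['AAA', 'AAB', 'AAD']);
echo nextFreeCode($used, 'AAB'), "\n"; // prints AAC
```

In a real setup the returned code would be written back to the marker table, so the next scan for that country starts after it.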

To simplify iteration, it might be better to treat the three-letter codes as numbers (as Shawn Hsiao suggests), i.e. give a meaning to A-Z = 0..25, and 2..9 = 26..33. Then, XYZ is the number X*34^2+Y*34+Z == 23*1156+24*34+25 == 27429. This should be doable using standard MySQL functions, in particular using CONV.
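A minimal PHP sketch of this base-34 conversion, using the mapping above (A-Z = 0..25, 2..9 = 26..33). The function names `codeToNumber()` and `numberToCode()` are made up for illustration, and PHP 7+ is assumed for `intdiv()`. The conversion is done in PHP here because, as far as I know, MySQL's CONV orders its digits 0-9 and then A-Z, so its base-34 output would not match this alphabet without translating characters first.

```php
<?php
// Digit alphabet: index in this string = numeric value of the character.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789';

// Interpret a code as a base-34 number, most significant character first.
// Assumes $code only contains characters from ALPHABET.
function codeToNumber(string $code): int
{
    $alphabet = ALPHABET;
    $base = strlen($alphabet);
    $number = 0;
    foreach (str_split($code) as $char) {
        $number = $number * $base + strpos($alphabet, $char);
    }
    return $number;
}

// Inverse conversion, left-padded with 'A' (value 0) to $length characters.
function numberToCode(int $number, int $length = 3): string
{
    $alphabet = ALPHABET;
    $base = strlen($alphabet);
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code = $alphabet[$number % $base] . $code;
        $number = intdiv($number, $base);
    }
    return $code;
}

echo codeToNumber('XYZ'), "\n";  // prints 27429
echo numberToCode(27429), "\n";  // prints XYZ
```

With the codes stored or handled as numbers, "the next code" is simply the number plus one, converted back with `numberToCode()`.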

Martin v. Löwis
A: 

I went with the 2nd option. I was also able to make a script that tries to match the location name as closely as possible: for Tartu, for example, it will try T** first, then TA*, then TAR if possible; if TAR is taken it will try TAT, as T is the next letter after R in Tartu.

The code is quite extensive, I'll just post the part that takes the first possible code:

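$allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789';
$length = strlen($allowed);
$codes = array();
// build all 34^3 = 39304 possible codes, in order from AAA to 999
for ($i = 0; $i < $length; $i++)
    for ($j = 0; $j < $length; $j++)
        for ($k = 0; $k < $length; $k++)
            $codes[] = $allowed[$i] . $allowed[$j] . $allowed[$k];

// collect the codes already used in this country
// (the mysql_* functions were removed in PHP 7; use mysqli or PDO there)
$used = array();
$query = mysql_query("SELECT code FROM location WHERE country = '" . mysql_real_escape_string($country) . "'");
while ($result = mysql_fetch_array($query))
    $used[] = $result['code'];

// array_diff() preserves keys, so index 0 is missing whenever AAA is taken
$remaining = array_diff($codes, $used);

// reset() returns the first remaining element (false if none are left)
$code = reset($remaining);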
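The name-matching part is not shown in this answer. One possible reading of the description (TAR first, then TAT for Tartu) is to try 3-character subsequences of the name in their original order. The sketch below follows that reading only; `candidateCodes()` and `firstMatchingCode()` are hypothetical names, not the author's actual script, and PHP 7.1+ is assumed.

```php
<?php
// Build 3-character candidates from the name, keeping the characters in
// their original order. Characters outside A-Z and 2-9 are dropped.
function candidateCodes(string $name): array
{
    $chars = preg_replace('/[^A-Z2-9]/', '', strtoupper($name));
    $n = strlen($chars);
    $candidates = [];
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            for ($k = $j + 1; $k < $n; $k++) {
                // array keys remove duplicates while keeping first-seen order
                $candidates[$chars[$i] . $chars[$j] . $chars[$k]] = true;
            }
        }
    }
    // all-digit keys are turned into integers by PHP, so cast back to strings
    return array_map('strval', array_keys($candidates));
}

// Return the first candidate that is not in $used (a set: code => anything),
// or null so the caller can fall back to the plain "first free code" search.
function firstMatchingCode(string $name, array $used): ?string
{
    foreach (candidateCodes($name) as $code) {
        if (!isset($used[$code])) {
            return $code;
        }
    }
    return null;
}

echo firstMatchingCode('Tartu', ['TAR' => true]), "\n"; // prints TAT
```

For "Tartu" the candidates come out as TAR, TAT, TAU, TRT, TRU, TTU, ART, ARU, ATU, RTU, so names that start with the same letters as the location are preferred.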

Thanks for your opinion, this will be the key to transport codes all over the world :)

Solenoid