I'd like to store an additional column in a table as a 'sort value': a numeric representation of the title column, such that the order of those values matches the natural alphabetical sort order of the strings. That is, I can retrieve rows ordered by the sort value and they'll be in natural sort order, and when I insert a new row I can generate its numeric value and know that, relative to the others, it represents the string's position in an alphabetical ordering, accurate to the first X letters or so.
A couple of reasons for this: firstly, I would like a more natural ordering than the plain ordering offered by a DB server, one where things like "The" and "A" and punctuation are ignored at the start, and numbers are treated 'naturally'.
Secondly, this is for an index with a lot of permutations - it will save space, and perhaps time when traversing an index with many rows.
What I am after is the algorithm to translate the string to that numeric value, or, I suppose, just to a normalised string value.
I am using PHP and MySQL.
I'm afraid that "pull everything from the DB and sort in PHP using natcasesort()" is not a solution for this particular situation, as I'd like to retrieve rows (using order by and group by) in sorted order before they get to a join or limit clause. Thanks.
Edit:
Thanks for the answers so far. It's just occurred to me that the fact that my application uses UTF-8 is quite relevant. With that said, I think representing the initial part of a string in a packed/numeric form is a stretch in practice; maybe just some sort of normalised form (everything case-folded, numbers zero-padded, and as many characters as possible normalised to their root, i.e. ã to a) would be appropriate.
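To make the kind of normalised form I mean concrete, here is a rough sketch in PHP. The function name `make_sort_key`, the pad width, the length cap and the small accent table are all placeholders I made up for illustration, and it assumes the mbstring extension and a UTF-8 source file. It is only meant to show the shape of what I'm after, not a finished solution:

```php
<?php
// Hypothetical helper: builds a normalised sort key from a UTF-8 title.
// $padWidth and $maxLen are arbitrary illustrative defaults.
function make_sort_key(string $title, int $padWidth = 10, int $maxLen = 64): string
{
    // Case-fold the whole string (requires the mbstring extension).
    $key = mb_strtolower($title, 'UTF-8');

    // Reduce a few accented letters to their root letter.
    // This table is deliberately tiny; a real one would need to be far larger.
    $key = strtr($key, [
        'ã' => 'a', 'á' => 'a', 'à' => 'a',
        'é' => 'e', 'è' => 'e',
        'ö' => 'o', 'ü' => 'u', 'ñ' => 'n',
    ]);

    // Collapse every run of punctuation/whitespace into a single space.
    $key = trim(preg_replace('/[^\p{L}\p{N}]+/u', ' ', $key));

    // Ignore a leading English article.
    $key = preg_replace('/^(?:the|an|a) /', '', $key);

    // Zero-pad each run of ASCII digits so that "2" sorts before "10".
    // Runs longer than $padWidth digits are left at their own length.
    $key = preg_replace_callback(
        '/[0-9]+/',
        function (array $m) use ($padWidth): string {
            return str_pad(ltrim($m[0], '0'), $padWidth, '0', STR_PAD_LEFT);
        },
        $key
    );

    // Keep only the first $maxLen characters for the indexed column.
    return mb_substr($key, 0, $maxLen, 'UTF-8');
}
```

The idea would be to store the result in an extra indexed column when a row is inserted, and then order by that column instead of the title. With this sketch, 'Episode 2' gets a key that compares lower than the key for 'Episode 10', and 'The Matrix' sorts under 'matrix'.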