Since Unicode lacks a series of zero-width sorting characters, I need to determine equivalent characters that will allow me to force a certain order on a list that is automatically sorted by character values. Unfortunately, the list items are not in alphabetical order, nor is it acceptable to prefix them with visible characters to make the result of the sort match the wanted outcome.

What Unicode characters can be thrown in front of regular Latin alphabet text, and will not appear, but still allow me to "spike" the sort in the way I require?

(BTW this is being done with Drupal 5 with a user profile list field. Don't bother suggesting changing that to a vocabulary/category.)

+1  A: 

Personally, I prefer to use a primary/secondary sort key. It's less kludgy, and easy to implement in a typical SQL query (ORDER BY column_a, column_b). Edited to add: In PHP, you could use usort(array, comparisonFunction) with a custom comparison function to add additional sorting logic if you can't use SQL to do the trick.
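A minimal sketch of the usort approach, assuming the list values and their wanted order are known up front. The value names, the weight map, and both function names are made up for illustration; nothing here is Drupal API.

```php
<?php
// Hypothetical list values mapped to the order they should appear in.
function profile_option_weights() {
    return array(
        'Novice'       => 0,
        'Intermediate' => 1,
        'Expert'       => 2,
    );
}

// Comparison callback for usort(): order by weight first, then by string value.
function compare_by_weight($a, $b) {
    $weights = profile_option_weights();
    // Values missing from the map get the largest weight, so they sort last.
    $wa = isset($weights[$a]) ? $weights[$a] : PHP_INT_MAX;
    $wb = isset($weights[$b]) ? $weights[$b] : PHP_INT_MAX;
    if ($wa != $wb) {
        return ($wa < $wb) ? -1 : 1;
    }
    return strcmp($a, $b);
}

$options = array('Expert', 'Novice', 'Intermediate');
usort($options, 'compare_by_weight');
// $options is now array('Novice', 'Intermediate', 'Expert')
```

A named callback is used instead of a closure so the sketch also runs on older PHP 5 releases.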

However, if you only have one column to work with and that's unfixable, just prefix the values with a certain number of unlikely characters, such as underscores, for sorting, then strip them just before you display them (using regexp substitution or similar).
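Roughly, the prefix-and-strip idea could look like this. The stored values and the helper name are hypothetical, and the sketch assumes a plain byte-wise sort with labels that start with an uppercase letter or digit; a label starting with a lowercase letter sorts after "_" in ASCII, which would flip the order.

```php
<?php
// Hypothetical stored values: more leading underscores push an item later,
// as long as the first visible character sorts below "_" (A-Z, 0-9).
$stored = array('___Expert', '_Novice', '__Intermediate');

// Byte-wise string sort, standing in for the automatic sort by character value.
sort($stored, SORT_STRING);

// Remove the leading underscores just before display.
function strip_sort_prefix($value) {
    return preg_replace('/^_+/', '', $value);
}

$display = array_map('strip_sort_prefix', $stored);
// $display is now array('Novice', 'Intermediate', 'Expert')
```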

Unicode-based hacks will depend heavily on which fonts are used and which locale's collation/sorting order is in effect, and may produce undesirable side effects on clients you don't have control over (different browsers, different OSes, different client locales). Most "unprintable" characters yield the "unknown character" glyph when displayed on systems without support for them, which usually looks like an empty square. There are some zero-width characters used for languages like Arabic, but they shouldn't affect sorting except in applications with very perverse Unicode support.

JasonTrue
Does PHP's Unicode handling fall under that kind of very perverse? BTW, the SQL stuff doesn't help at all in this situation.
Chris Charabaruk
Not sure, because I've only used Shift-JIS, EUC-JP, or ISO-8859-1 in PHP. Unicode doesn't solve this any more than ASCII/ISO-8859-1 would; it's not the domain of an encoding. However, _MyVal, __MyVal, and ___MyVal will sort differently.
JasonTrue
It doesn't look like PHP's typical sort gives you much other than, in PHP6, a locale flag. However, if you implement a comparison function of your own, you could use usort and apply whatever comparison rules you like.
JasonTrue
This is on PHP5 and that's inflexible. As it is, the situation needs to be solved within the confines of Drupal, and without making any modifications to Drupal core.
Chris Charabaruk
I don't know Drupal, but usort works in PHP5. User profile fields are just database-backed fields, no? Why can't your own code muck with the presentation behavior?
JasonTrue
I'd rather do it without having to use any code, and just base it all in Views. Unfortunately it looks like that's not an option. It'd be nice if those list fields could be weighted, then this would be a non-issue. Or it'd be nice if Usernode wasn't so bizarre.
Chris Charabaruk
+2  A: 

Zero-width space (U+200B) should probably do what you want. From the Unicode spec:

Zero Width Space. The U+200B ZERO WIDTH SPACE indicates a line break opportunity, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent line break opportunities, such as Thai, Khmer, and Japanese.

Should be in most fonts you run into, but YMMV.
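A rough sketch of how repeated zero-width spaces might encode an order. This assumes the list is sorted by raw byte values, which is a big assumption: a locale-aware collation may ignore U+200B entirely. The helper name and the labels are hypothetical.

```php
<?php
// UTF-8 bytes of U+200B ZERO WIDTH SPACE, written as byte escapes so this
// also works on PHP 5, which has no "\u{...}" string syntax.
define('ZWSP', "\xE2\x80\x8B");

// Prefix a label with $rank zero-width spaces (hypothetical helper).
function with_invisible_rank($label, $rank) {
    return str_repeat(ZWSP, $rank) . $label;
}

$items = array(
    with_invisible_rank('Expert', 2),
    with_invisible_rank('Novice', 0),
    with_invisible_rank('Intermediate', 1),
);

// Byte-wise sort: the lead byte 0xE2 is greater than any ASCII letter, so each
// extra zero-width space pushes an item further down the list.
sort($items, SORT_STRING);

// Strip the invisible prefix to check the resulting order.
$visible = array();
foreach ($items as $item) {
    $visible[] = str_replace(ZWSP, '', $item);
}
// $visible is now array('Novice', 'Intermediate', 'Expert')
```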

Joe Hildebrand
I need more than just one character. After all, I'm using this as a way of sorting a sequence of strings. Non-space characters with zero width ftw.
Chris Charabaruk