This comes up a lot, and I can see it's come up on StackOverflow for XSLT, Ruby and Drupal but I don't see it specifically for SQL.

So the question is, how do you sort titles correctly when they begin with "The", "A", or "An"?

One way is simply to TRIM() those strings:

ORDER BY TRIM( 
  LEADING 'a ' FROM 
  TRIM( 
    LEADING 'an ' FROM 
    TRIM( 
      LEADING 'the ' FROM LOWER( title ) 
      ) 
    ) 
  )

which was suggested on AskMeFi a while back (does it need that LOWER() function?).

I know I've also seen some kind of Case/Switch implementation of this but it's a little hard to Google for.

Obviously there are a number of possible solutions. What would be good is SQL gurus weighing in on which have performance implications.

+3  A: 

One approach I've seen was to have two columns - one for display and the other for sorting:

description  | sort_desc
-------------------------
The the      | the, The
A test       | test, A
I, Robot     | i, Robot

I haven't done any real-world testing, but this has the benefit of being able to use an index, and it doesn't require string manipulation every time you want to order by the description. Unless your database supports materialized views (which MySQL doesn't), implementing the logic as a computed column in a view wouldn't provide any benefit because you can't index the computed column.
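To make the two-column idea concrete, here is a minimal sketch using SQLite through Python's sqlite3 module. The table name `books`, the index name, the helper `sort_key` and the English-only article list are all assumptions made for illustration, and the key is lowercased in full, which differs slightly from the sample rows above. The sort key is computed once at insert time, so the ORDER BY can read the indexed column directly.

```python
import sqlite3

# Assumed article list (English only); note the trailing space in each entry.
ARTICLES = ("the ", "an ", "a ")


def sort_key(title):
    """Build a sort key with any leading article moved to the end.

    Example: 'The the' becomes 'the, The'.
    """
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            n = len(article)
            # Keep the article's original casing, minus its trailing space.
            return f"{lowered[n:]}, {title[:n - 1]}"
    return lowered


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (description TEXT, sort_desc TEXT)")
# The index is on the precomputed key, so sorting needs no string functions.
conn.execute("CREATE INDEX idx_books_sort_desc ON books (sort_desc)")

titles = ["The the", "A test", "I, Robot", "An Apple", "Zebra"]
conn.executemany(
    "INSERT INTO books (description, sort_desc) VALUES (?, ?)",
    [(t, sort_key(t)) for t in titles],
)

ordered = [
    row[0]
    for row in conn.execute("SELECT description FROM books ORDER BY sort_desc")
]
print(ordered)
```

The trade-off is that the key has to be kept in sync whenever a description changes, either in application code as here or in a trigger.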

OMG Ponies
A: 

I can only speak for SQL Server: you use LTRIM within CASE statements. No LOWER function is needed because selections are not case sensitive by default. However, if you want to ignore articles, I would suggest you use a noise word dictionary and set up a full-text indexing catalog. I am unsure whether other SQL implementations support this.
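As a rough illustration of the CASE approach, the sketch below strips a leading article only when the title actually starts with one. It runs against SQLite through Python's sqlite3 module rather than SQL Server, and the `films` table and its sample titles are made up for the example. LOWER() is kept so the comparison does not depend on the database's default case sensitivity.

```python
import sqlite3

# ORDER BY a CASE expression: drop the article only if the title begins with it.
# SUBSTR is 1-indexed, so SUBSTR(title, 5) skips the four characters of 'The '.
QUERY = """
SELECT title
FROM films
ORDER BY LOWER(
    CASE
        WHEN LOWER(title) LIKE 'the %' THEN SUBSTR(title, 5)
        WHEN LOWER(title) LIKE 'an %'  THEN SUBSTR(title, 4)
        WHEN LOWER(title) LIKE 'a %'   THEN SUBSTR(title, 3)
        ELSE title
    END
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO films (title) VALUES (?)",
    [
        ("The Matrix",),
        ("Zodiac",),
        ("A Bug's Life",),
        ("Theatre of Blood",),  # starts with 'The' but not with 'The '
        ("An Education",),
        ("Alien",),
    ],
)

result = [row[0] for row in conn.execute(QUERY)]
print(result)
```

Because the sort expression is evaluated per row, a plain index on the title column does not help here, which is the performance concern raised in the question.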

Carnotaurus
Case sensitivity is dependent on collation. Full Text Search (FTS) is available on MySQL, Oracle, SQL Server... Dunno what PostgreSQL's is but I'm sure it has native functionality. And there are 3rd party FTS like sphinx...
OMG Ponies
"you use LTRIM within CASE statements" -- does this mean you do the equivalent of "if it starts with 'the ', trim it"? I was wondering if that would slow the process down, as opposed to a blanket TRIM() which might be failing most of the time.
AmbroseChapel
LTRIM gets rid of leading spaces
Carnotaurus