tags:

views:

230

answers:

6

I'm trying to sort a list of titles, but currently there's a giant block of titles which start with 'The '. I'd like the 'The ' to be ignored, and the sort to work off the second word. Is that possible in SQL, or do I have to do custom work on the front end?

For example, current sorting:

  • Airplane
  • Children of Men
  • Full Metal Jacket
  • Pulp Fiction
  • The Fountain
  • The Great Escape
  • The Queen
  • Zardoz

Would be better sorted:

  • Airplane
  • Children of Men
  • The Fountain
  • Full Metal Jacket
  • The Great Escape
  • Pulp Fiction
  • The Queen
  • Zardoz

Almost as if the records were stored as 'Fountain, The', and the like. But I don't want to store them that way if I can, which is of course the crux of the problem.

+2  A: 

I think you could do something like

ORDER BY REPLACE(TITLE, 'The ', '')

although this would replace any occurrence of 'The ' with '', not just the first 'The ', although I don't think this would affect very much.

MusiGenesis
Well I don't like my idea either, but at least explain whether it's not the best solution or whether it wouldn't even run.
MusiGenesis
+11  A: 

Best is to have a computed column to do this, so that you can index the computed column and order by that. Otherwise, the sort will be a lot of work.

So then you can have your computed column as:

CASE WHEN title LIKE 'The %' THEN stuff(title,1,4,'') + ', The' ELSE title END

Edit: If STUFF isn't available in MySQL, then use RIGHT or SUBSTRING to remove the leading 4 characters. But still try to use a computed column if possible, so that indexing can be better. The same logic should be applicable to rip out "A " and "An ".

Rob

Rob Farley
you may also want to make it all lowercase in the sort column, to get case-insensitive sorts (and searches)
Thilo
Yes, if you have a case-sensitive situation. But I imagine that as movie titles, the case is probably controlled in the input, so that you don't need to have the performance hit associated with finding records based on their lowercase equivalent.
Rob Farley
+5  A: 

Something like:

ORDER BY IF(LEFT(title,2)="A ",SUBSTRING(title FROM 3),IF(LEFT(title,3)="An ",SUBSTRING(title FROM 4),IF(LEFT(title,4)="The ",SUBSTRING(title FROM 5),title)))

But given the overhead of doing this more than a few times, you're really better off storing the title sort value in another column...

great_llama
+1  A: 

The best way to handle this would be to have a column that contains the value you want to use specifically for ordering output. Then you'd just have to use:

  SELECT t.title
    FROM MOVIES t
ORDER BY t.order_title

There are going to be various rules about what should and should not be used to order titles.

Based on your example, an alternative would be to use something like:

   SELECT t.title
    FROM MOVIES t
ORDER BY SUBSTR(t.title, INSTR(t.title, 'The '))

You could use a CASE statement to contain the various rules.

OMG Ponies
+1  A: 

You can certainly arrange dynamically strip off 'The', though you'll soon find that you have to deal with 'A' and 'An' (except for the special case of titles like "A is for Alibi"). When "foreign" films enter the mix, you'll need to cope with "El" and "La" (except for that pesky edge case, "LA Story"). Then mix in some German films, and you'll need to cope with 'Der' and 'Die' (except for that pesky set of 'Die Hard' edge cases). See the pattern? You're headed down a path that keeps getting longer and more pitted with special cases.

The way forward on this that avoids an ever-growing set of special cases is to store the title as you want it display and store the title as you want it sorted.

Dave W. Smith
Very good point, though fortunately in my use case perfection is not necessary, so those edge cases can be ignored, at least for now. +1 for mentioning the problem though.
dimo414
A: 

All very helpful. Now what if you want to omit leading "The" as well as "A" in the sort order?

bahmanmahdavi
For a follow-up question best start a new question, not post it as an answer. You can of course always link bak to this for reference, if you want to. (The "Ask Question" button is in the top right of the page)
sth