I have a list of movies grouped by first letter. Naturally, about 80% of the movies under "T" begin with "The". Movies such as "The Dark Knight" should appear in the "D" list, and preferably in the "T" list as well. Is there any way I can do that?

I use the following code in the WHERE clause to display movies that start with a certain letter, ignoring "the". It also has the convenient side effect of making a movie such as "The Dark Knight" appear under both "D" and "T".

WHERE movie_title REGEXP CONCAT('^(the )?', '$letter')

I would like to achieve the same result when I echo out all the movies that are in the database.

+16  A: 

If you are going to be performing this query frequently, you will want to create a separate field in the table with the 'sorted' name. Using regular expressions or other operations on the column makes it impossible for MySQL to take advantage of an index.

So, the simplest and most efficient solution is to add a movie_title_short field, which contains movie_title without the "The" or "A". Be sure to add an index to the movie_title_short field too!
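
A minimal sketch of that idea, written in Python with the standard sqlite3 module so it runs without a server. SQLite stands in for MySQL here, and the sample rows and the index name idx_movie_title_short are made up for illustration. The UPDATE strips a leading "The ", "An " or "A " once, so later reads only sort and filter on the stored column:

```python
import sqlite3

# Assumption: SQLite is used as a stand-in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_title TEXT NOT NULL, movie_title_short TEXT)"
)
# Illustrative sample rows, not data from the thread.
conn.executemany(
    "INSERT INTO movies (movie_title) VALUES (?)",
    [("The Dark Knight",), ("Alien",), ("A Beautiful Mind",), ("Thelma & Louise",)],
)

# Fill the sort column once. The trailing space in each pattern keeps titles
# such as "Thelma & Louise" or "Alien" from being truncated.
conn.execute(
    """
    UPDATE movies SET movie_title_short = CASE
        WHEN movie_title LIKE 'The %' THEN SUBSTR(movie_title, 5)
        WHEN movie_title LIKE 'An %'  THEN SUBSTR(movie_title, 4)
        WHEN movie_title LIKE 'A %'   THEN SUBSTR(movie_title, 3)
        ELSE movie_title
    END
    """
)
# Hypothetical index name; the point is that the stored column can be indexed.
conn.execute("CREATE INDEX idx_movie_title_short ON movies (movie_title_short)")

# Full listing, ordered by the stored sort key.
sorted_titles = [
    row[0]
    for row in conn.execute(
        "SELECT movie_title FROM movies ORDER BY movie_title_short"
    )
]

# Per-letter listing: a plain prefix match on the stored column.
d_titles = [
    row[0]
    for row in conn.execute(
        "SELECT movie_title FROM movies WHERE movie_title_short LIKE ? "
        "ORDER BY movie_title_short",
        ("D%",),
    )
]

print(sorted_titles)
print(d_titles)
```

The column has to be kept in sync whenever a title is inserted or changed, which is the price paid for not recomputing it in every query.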

carl
Yeah, I thought about that, but no, it's not going to be executed a lot.
Yegor
Storage space is cheaper than CPU time. Just use a title_order field since you'll no doubt want to order by the same criteria too.
cletus
Also, adding a separate field will not solve the problem of displaying the same movie in both the "T" list and the other letter's list.
Yegor
It will solve the problem of a movie appearing twice. A query like this should do it: SELECT * FROM movies WHERE movie_title = '$escapedInput' OR movie_title_short = '$escapedInput' ORDER BY movie_title_short;
carl
But appearing twice is not a problem, it's a desired state which we need to achieve.
GSerg
Ah, I understand. Yes, then doing a union should achieve the optimal result.
carl
"Union"? Why not just an "OR" statement?
ashawley
"The" or "A" or "An", by the way, not just "The" or "A".
AmbroseChapel
+1  A: 
select right(movie_title, char_length(movie_title)-4) as movie_title
from movies 
where left(movie_title,4) = 'the '
union
select movie_title
from movies
GSerg
+1  A: 

You can use the MySQL REPLACE function in the SELECT clause...

 select replace(movie_title,'The ','') from ... order by replace(movie_title,'The ','')
jle
No, he wants "The" movies to appear twice when dumped all together - with and without 'the'.
GSerg
It will if using the same WHERE-clause. Can you -1 comments? :)
ashawley
+4  A: 

As Carl said, I'd build this into its own indexable field to avoid having to compute it each time. I'd recommend doing it in a slightly different way to avoid redundancy though.

movies (id, name, namePrefix)

eg:

| Dark Knight        | The |
| Affair To Remember | An  |
| Beautiful Mind     | A   |

This way you can display these movies in two different ways, "name, namePrefix" or "namePrefix name", and sort them accordingly.
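
The two display forms can also be combined so that a prefixed movie is listed twice, once under each letter, which is what the question asks for. The following is only a sketch under assumptions: it uses Python's sqlite3 module in place of MySQL, the "Titanic" row is added to show a title without a prefix, and the `||` concatenation is SQLite syntax (MySQL would normally use CONCAT instead):

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT NOT NULL, namePrefix TEXT)"
)
conn.executemany(
    "INSERT INTO movies (name, namePrefix) VALUES (?, ?)",
    [
        ("Dark Knight", "The"),
        ("Affair To Remember", "An"),
        ("Beautiful Mind", "A"),
        ("Titanic", None),  # illustrative row with no prefix
    ],
)

# First branch: prefixed movies filed under the main word ("Dark Knight, The").
# Second branch: every movie filed under its full title ("The Dark Knight").
# UNION ALL keeps both copies; ORDER BY applies to the combined result.
listing_sql = """
    SELECT name AS sort_key,
           name || ', ' || namePrefix AS display
    FROM movies
    WHERE namePrefix IS NOT NULL
    UNION ALL
    SELECT COALESCE(namePrefix || ' ', '') || name,
           COALESCE(namePrefix || ' ', '') || name
    FROM movies
    ORDER BY sort_key
"""

displays = [row[1] for row in conn.execute(listing_sql)]
for title in displays:
    print(title)
```

With this data, "Dark Knight, The" is filed under "D" and "The Dark Knight" under "T", while "Titanic" appears only once.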

nickf
Does anyone know if there is a canonical list of words to lop off titles when sorting, such as "The", "An" and "A"? Also, what are the rules for capitalisation: should it be "An Affair To Remember" or "An Affair to Remember"?
Evan
A: 

Use this:

SELECT * FROM movies ORDER BY TRIM(LEADING 'the ' FROM LOWER(`movie_title`));
Abe
A: 

I just had that problem myself. The solution is:

SELECT * FROM movies WHERE title REGEXP '^d' AND title NOT REGEXP '^the ' OR title REGEXP '^the d'

This will give you only results that start with "The D" or "D".