I'm storing imdb.com links for each movie that's listed in the DB, and I check for duplicates before a new movie is inserted. The problem is, some links are http://imdb.com/whatever while others are http://www.imdb.com/whatever

What would be the best way to force www. into every link that's submitted? I realize I should be storing the URL without http:// or http://www., which would avoid this problem altogether, but it's too late to make that decision now.

A: 

You could use a regular expression to force the www. prefix onto the URL, but keep in mind that not all host names start with www.
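
A minimal sketch of what that could look like, restricted to the imdb.com host so that other host names are left untouched. The function name force_www_imdb is made up for illustration and is not from the question:

```php
<?php
// Hypothetical helper: add "www." only to links whose host is bare imdb.com.
function force_www_imdb($url)
{
    // Anchored at the start of the string and case-insensitive.
    // The lookahead makes sure the host ends right after "imdb.com".
    return preg_replace('#^(https?://)imdb\.com(?=/|$)#i', '${1}www.imdb.com', $url);
}

echo force_www_imdb('http://imdb.com/title/tt1049413/'), "\n";
echo force_www_imdb('http://www.imdb.com/title/tt1049413/'), "\n";
```

Links that already carry www. (or that point at a different host) do not match the pattern and come back unchanged.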

eed3si9n
A: 

As you're storing the link, can't you check whether it starts with http://imdb and replace that with http://www.imdb?
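
Roughly like this, as a plain prefix check without regular expressions. The function name add_www_if_missing is hypothetical, and the sketch assumes submitted links use http:// as in the question:

```php
<?php
// Hypothetical helper: rewrite the bare-host prefix, leave everything else alone.
function add_www_if_missing($url)
{
    $bare = 'http://imdb.com/';
    // strpos() === 0 means the URL starts with the bare-host prefix.
    if (strpos($url, $bare) === 0) {
        return 'http://www.imdb.com/' . substr($url, strlen($bare));
    }
    return $url;
}
```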

Bernard Chen
+7  A: 

Why don't you just store IMDB's movie id rather than the entire URL? If you just store the ID then you can build the URL programmatically.

For instance, for the URL http://www.imdb.com/title/tt1049413/ you could just store tt1049413. This is a better design in my opinion, because if IMDB ever changes its URL format you only have to change the part of your app that builds the URL, rather than fixing every row that holds an outdated one.
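
A rough sketch of that idea, assuming title links always contain /title/tt followed by digits as in the example above. Both function names are invented for illustration:

```php
<?php
// Hypothetical helper: pull the title ID out of a submitted link.
function imdb_id_from_url($url)
{
    // Accepts links with or without "www."; captures e.g. "tt1049413".
    if (preg_match('#^https?://(?:www\.)?imdb\.com/title/(tt\d+)#i', $url, $m)) {
        return strtolower($m[1]);
    }
    return null; // not a recognizable title link
}

// Hypothetical helper: rebuild the link from a stored ID.
function imdb_url_from_id($id)
{
    return 'http://www.imdb.com/title/' . $id . '/';
}
```

Both the www. and non-www. forms of a link reduce to the same ID, so the duplicate check becomes a simple comparison on that column.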

Hardwareguy
+5  A: 

Use MySQL to fix the existing ones:

UPDATE table SET URL=REPLACE(URL,'http://imdb.com','http://www.imdb.com') WHERE URL LIKE 'http://imdb.com/%';

Then use PHP to fix inbound URLs beforehand:

$url = str_replace('http://imdb.com','http://www.imdb.com',$url);

But the best method is to store imdb.com's movie ID in your database instead:

http://www.imdb.com/title/tt0088846/

Store "tt0088846" instead, or even better, 88846 as your Primary Key, and use a constant:

$imdb_url = "http://www.imdb.com/title/tt%07d/";
// With a numeric key, pad back to 7 digits: 88846 becomes tt0088846.
$url = sprintf($imdb_url, $movie_id);

That way it's much faster and easier to detect duplicates. Note that IMDB has other kinds of pages besides titles, and these use a different ID prefix (nm for actors, etc.), so keep that in mind when designing your database.

razzed
A: 

To answer your question, forcing non-www. links on submission would be a better option in my opinion, plus I'd update the database using razzed's solution.

$url = str_replace('http://www.', 'http://', $url);

Still, I would store only the IMDB ID.

ciops