I have several thousand text strings where the IMDb ID occurs in fairly random positions, but it's always in the following format: tt0234215 (tt + some numbers).

What would be the best way to strip it out in PHP?

+7  A: 

Probably by using a regular expression:

preg_match_all("/tt\\d{7}/", $string, $ids);
// $ids will be an array containing the matches

This will search the string for all instances of "tt" followed by seven digits and return them as an array (any digits beyond the seventh are left out of the match). (preg_match_all in the PHP docs)
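
Since the question asks about stripping the IDs out rather than only finding them, here is a minimal sketch that does both. The sample string is made up for illustration, and the \b word boundaries are an added assumption (they skip "tt" inside a longer word and also skip IDs with more than seven digits, unlike the pattern above); drop them if that is not what you want.

```php
<?php
// Hypothetical sample input; the titles and IDs are only for illustration.
$string = "Some Movie tt0234215 (2001), Another Film tt0111161";

// Assumed variant of the pattern above: \b anchors the ID at word boundaries.
$pattern = '/\btt\d{7}\b/';

// Collect every ID; $matches[0] holds the full-pattern matches.
preg_match_all($pattern, $string, $matches);
$ids = $matches[0];

// Remove the IDs, then collapse the leftover runs of whitespace.
$stripped = preg_replace($pattern, '', $string);
$stripped = trim(preg_replace('/\s{2,}/', ' ', $stripped));
```

With this input, $ids holds the two IDs and $stripped keeps the rest of the text with single spaces between words.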

Ben Blank
Change to $ids = array();
hopeseekr
@hopeseekr — D'oh! Too much Python lately. :-)
Ben Blank
Wouldn't this fail with the movie "3 Ninjas"? That is to say, it could potentially take too much information away.
Woot4Moo
You don't really need to declare $ids, and it's gotta be #tt\\d+#, but otherwise it works.
Yegor
@Woot4Moo Good point...he could change it to: "tt\\d{7}", I believe.
treeface
@Yegor et al.: The declaration was just by way of example. I've updated the code to be more correct and less confusing.
Ben Blank