Hi,

I'm a newbie with regular expressions and I need some help :)

I have this:

$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';
$pattern = '/<img src="http:\/\/mi.url.com/iconos/oks/(.*)" alt="(.*)"\>/i';

preg_match_all($pattern, $url, $matches);

print_r($matches);

And I get this error:

Warning: preg_match_all() [function.preg-match-all]: Unknown modifier 'c'

I want to select that 'milan.gif'

How can I do that?

Thanks!

+1  A: 

The problem is that you haven't escaped all the forward slashes in the URL (you have escaped the ones in the http:// part, but not the ones in the URL path).

Therefore, the regex engine takes the first unescaped slash it comes across (the one after .com) as the end of the regex, and it treats everything after that slash as 'modifier' codes.

The next character ('i') is a valid modifier (as you know, since you're actually using it in your example), so that passes the test. However, the character after that ('c') is not, so it throws an error, which is what you're seeing.

To fix it, simply escape the slashes. So your example would look like this:

$pattern = '/<img src="http:\/\/mi.url.com\/iconos\/oks\/(.*)" alt="(.*)"\\>/i';
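For reference, here is a minimal runnable sketch of the escaped pattern applied to the string from the question. The only changes from the line above are my own assumptions: the dots are escaped too, and the closing > is left unescaped since it has no special meaning in PCRE.

```php
<?php
// Sample input taken from the question.
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

// Every "/" inside the pattern is escaped, so only the final "/" ends the regex
// and "i" is the only modifier.
$pattern = '/<img src="http:\/\/mi\.url\.com\/iconos\/oks\/(.*)" alt="(.*)">/i';

preg_match_all($pattern, $url, $matches);

// $matches[1] collects the first capture group of every match,
// $matches[2] the second one.
echo $matches[1][0], "\n"; // milan.gif
echo $matches[2][0], "\n"; // Milan
```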

Hope that helps.

Note, as someone has already said, it's generally not advisable to use regex to match HTML, since HTML can be too complex to match accurately. It's generally preferable to use a DOM parser. In your example, the regex could fail if the alt attribute or the end of the image URL contains unexpected characters, or if the quoting in the HTML code isn't as you expect.
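As a rough sketch of the DOM parser route, assuming PHP's bundled DOM extension is available: the snippet below loads the HTML with DOMDocument, reads the src attribute of each img element, and keeps the last path segment. The variable names ($html, $files) are just for illustration.

```php
<?php
// Sample input taken from the question.
$html = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

// Collect libxml warnings internally instead of printing them;
// real-world HTML is often not well formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$files = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    // parse_url() extracts the path part of the URL;
    // basename() keeps only its last segment.
    $path = parse_url($src, PHP_URL_PATH);
    if (is_string($path)) {
        $files[] = basename($path);
    }
}

print_r($files);
```

This does not care about attribute order or quoting style, which is where the regex version tends to break.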

Spudley
+5  A: 

If you’re using / as delimiter, you need to escape every occurrence of that character inside the regular expression. You didn’t:

/<img src="http:\/\/mi.url.com/iconos/oks/(.*)" alt="(.*)"\>/i
                              ^

Here the marked / is treated as the end delimiter of the regular expression, and everything after it is treated as modifiers. i is a valid modifier but c isn’t (see your error message).

So:

/<img src="http:\/\/mi\.url\.com\/iconos\/oks\/(.*)" alt="(.*)"\>/i
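If you would rather not escape by hand, preg_quote() can do it for the literal part of the pattern. This is only a sketch: $prefix is a hypothetical variable name, and the [^"]* groups are my substitution for (.*) so that a capture cannot run past the closing quote.

```php
<?php
// Sample input taken from the question.
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

// Literal URL prefix to match (hypothetical variable name).
$prefix = 'http://mi.url.com/iconos/oks/';

// preg_quote() escapes regex metacharacters such as "." and, because "/"
// is passed as the second argument, the delimiter as well.
$pattern = '/<img src="' . preg_quote($prefix, '/') . '([^"]*)" alt="([^"]*)">/i';

preg_match_all($pattern, $url, $matches);

echo $matches[1][0], "\n"; // milan.gif
```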

But as Pekka already noted in the comments, you shouldn’t try to use regular expressions on a non-regular language like HTML. Use an HTML parser instead. Take a look at Best methods to parse HTML.

Gumbo
Nice one. An alternative would be to use another delimiter, such as `#` ...
Marius Schulz
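A quick sketch of the other-delimiter idea, using the string from the question. With # as the delimiter, / is an ordinary character inside the pattern and needs no escaping (a literal # would then need escaping instead).

```php
<?php
// Sample input taken from the question.
$url = '<img src="http://mi.url.com/iconos/oks/milan.gif" alt="Milan">';

// "#" opens and closes the pattern, so the slashes in the URL stay unescaped.
$pattern = '#<img src="http://mi\.url\.com/iconos/oks/(.*)" alt="(.*)">#i';

preg_match_all($pattern, $url, $matches);

echo $matches[1][0], "\n"; // milan.gif
```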
A: 

Thank you all :)

Klian