I have the following string:

<?php
$string = '<meta name="Keywords" lang="fr" content="ecole commerce,
 apres bac, ecole management, ecole de management, écoles de commerce,
 école de management, classement ecole de commerce, ecole commerce paris,
 ecole superieure de commerce, concours ecole commerce, hec, esc, prepa,
 forum ecole commerce, avis ecole commerce" /><meta name="description"
 content="Tout pour s\'informer et échanger sur les écoles de commerce
 et de management, les concours, les classements, la prépa... Des
 témoignages et un forum pour faire le meilleur choix" /><meta
 name="robots" content="all" />';
?>

and I am trying to extract only the content of the "description" meta tag from it with this regular expression:

 <?php
 echo preg_replace('/(?:.*)name\="description" content\="(.*)"(?:.*)/i',
                                                                  '$1', $string);
 ?>

but what I get is:

Tout pour s'informer et échanger sur les écoles de commerce et de management,
 les concours, les classements, la prépa... Des témoignages et un forum
 pour faire le meilleur choix" /><meta name="robots" content="all

So, why the extra " /><meta name="robots" content="all ?!

PS: There are no line breaks in the actual code; I just added them for readability.

+2  A: 

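You should add the U (ungreedy) modifier to your regexp. As written, the greedy (.*) runs up to the last " in your string, which is why you get part of the next tag. Note that U makes every quantifier lazy, including the trailing (?:.*), so anchor the pattern with $ to make it still consume the rest of the string:

preg_replace('/(?:.*)name\="description" content\="(.*)"(?:.*)$/iU', '$1', $string);

Note that you could also replace it with something like this:

preg_replace('/(?:.*)name\="description" content\="([^"]*)"(?:.*)/i', '$1', $string);

[^"] means "anything that is not a double quote", so the capture can never run past the closing quote and no modifier is needed. The last (?:.*) is useless for matching; it is only there so that preg_replace strips the text after the closing quote.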

I also prefer to use preg_match with a third argument when you want to extract something rather than replace it. I would do what you want like this:

$var = array();
preg_match('/name\="description" content\="([^"]*)"/iU', $string, $var);

$var[1] contains your string if the regexp found a match.
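
To make the "if the regexp found a match" part explicit, here is a minimal sketch that checks the return value of preg_match before reading $var[1]. The sample string is a shortened, single-line stand-in for the one in the question, and the $description variable is a name introduced only for this illustration.

```php
<?php
// Shortened single-line sample input (an assumption, not the full string from the question).
$string = '<meta name="Keywords" content="hec, esc" />'
        . '<meta name="description" content="Tout pour s\'informer" />'
        . '<meta name="robots" content="all" />';

$var = array();

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match('/name="description" content="([^"]*)"/i', $string, $var) === 1) {
    $description = $var[1];   // first capture group
} else {
    $description = null;      // no description meta tag found
}

echo $description, "\n";
```

Without the check, reading $var[1] when nothing matched would raise an undefined-index notice or warning, depending on the PHP version.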

Vincent Savard
PS: The first `(?:.*)` is useless too ;)
nikic
+1  A: 

Don't use a greedy quantifier for this; a lazy one (.*?) will work:

<?php echo preg_replace('/(?:.*)name\="description" content\="(.*?)"(?:.*)/i', '$1', $string); ?>
valodzka
+1  A: 

An idiom I use to avoid greedy regexes is a negated character class that excludes the closing delimiter (that is, [^"] if the value is supposed to be enclosed in double quotes). It is more reliable in edge cases:

  /content="([^"]*)"/i
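
One thing to keep in mind: on its own, this short pattern is not tied to the "description" tag, so it matches every content attribute. A small sketch with preg_match_all, using a shortened single-line sample string that is assumed for illustration only:

```php
<?php
// Shortened single-line sample input (an assumption, not the full string from the question).
$string = '<meta name="Keywords" content="hec, esc" />'
        . '<meta name="description" content="Tout pour s\'informer" />'
        . '<meta name="robots" content="all" />';

// [^"]* cannot cross a double quote, so each capture stops at its own closing quote.
preg_match_all('/content="([^"]*)"/i', $string, $matches);

// $matches[1] holds the first capture group of every match, in order.
print_r($matches[1]);
```

To get only the description, keep the name="description" prefix in front of content= as in the other answers.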
mario
And also potentially more efficient.
bobince
A: 

/(?:.*)name\="description" content\="-->(.*)<--this is what matches the extra stuff that you don't want/did not expect to match.

/(?:.*)name\="description" content\="(.*)-->"<--this is what matches the quote after the word 'all'

You want the regex to stop matching sooner rather than later, hence the need to put it into an ungreedy mode of operation (as other posters have said).
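
A quick way to see the difference is to run the greedy and the lazy capture side by side. This is only a sketch on a shortened single-line sample string (assumed for illustration); $greedy and $lazy are names made up for this example.

```php
<?php
// Shortened single-line sample input (an assumption, not the full string from the question).
$string = '<meta name="description" content="Tout pour s\'informer" />'
        . '<meta name="robots" content="all" />';

// Greedy: (.*) grabs as much as possible, then backs off to the LAST double quote.
preg_match('/name="description" content="(.*)"/i', $string, $greedy);

// Lazy: (.*?) grabs as little as possible and stops at the FIRST double quote.
preg_match('/name="description" content="(.*?)"/i', $string, $lazy);

echo $greedy[1], "\n";
echo $lazy[1], "\n";
```

The greedy capture runs on into the robots tag and ends just after the word 'all', while the lazy capture ends at the closing quote of the description.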

Gabriel