
Suppose we have this HTML content and want to extract Content1, Content2, ... with a regular expression:

<li>Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4</li>

If I use this line:

preg_match_all('/<li>(.*)<\/li>/', $text, $result);

I get an array with a single match containing:

Content1</li>
<li>Content2</li>
<li>Content3</li>
<li>Content4
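One detail is worth checking here: in PCRE, "." does not match a newline unless the s modifier is set. So the single long match above only appears when the items are on one line or the pattern uses /s; with one <li> per line and no /s, even the greedy pattern returns four matches. A small sketch to check this, assuming the list is stored with "\n" between items (the variable names are only for illustration):

```php
<?php
// The list from the question, one <li> per line.
$multiLine = "<li>Content1</li>\n<li>Content2</li>\n<li>Content3</li>\n<li>Content4</li>";

// Without the s modifier, "." does not match "\n", so the greedy (.*)
// cannot run past the end of a line: one match per line.
preg_match_all('/<li>(.*)<\/li>/', $multiLine, $perLine);

// With the s modifier, "." also matches "\n", so the greedy (.*) runs
// to the last </li> in the whole string: a single match.
preg_match_all('/<li>(.*)<\/li>/s', $multiLine, $whole);

// Index 1 holds the captured group for every match.
print_r($perLine[1]);
print_r($whole[1]);
```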

But with this code:

preg_match_all('/<li>(.*?)<\/li>/', $text, $result);

I get an array with 4 matches: Content1, Content2, ...

Why doesn't (.*) work here, since it means "match any character zero or more times"?

+19  A: 

* matches in a greedy fashion, *? matches in a non-greedy fashion.

What this means is that .* will match as many characters as possible, including all intermediate </li><li> pairs, stopping only at the last occurrence of </li>. On the other hand, .*? will match as few characters as possible, stopping at the first occurrence of </li>.
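A minimal side-by-side sketch of the two quantifiers, assuming the items sit on a single line so that "." never has to cross a newline:

```php
<?php
// All four items on one line.
$text = '<li>Content1</li><li>Content2</li><li>Content3</li><li>Content4</li>';

// Greedy: .* takes as much as possible and gives back only enough to let
// <\/li> match at the LAST </li>, so there is one match spanning every item.
preg_match_all('/<li>(.*)<\/li>/', $text, $greedy);

// Lazy: .*? takes as little as possible and stops at the FIRST </li>;
// the search then resumes after it and finds each item separately.
preg_match_all('/<li>(.*?)<\/li>/', $text, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```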

Thomas
+6  A: 

Because .* is greedy: it eats up as much as it can (i.e. up to the last </li>) while still allowing the pattern to match. .*? on the other hand is non-greedy and eats up as little as possible (stopping at the first </li>).
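The phrase "while still allowing the pattern to match" refers to backtracking. A rough sketch, using a made-up string with text after the last </li>: the greedy .* first runs to the end of the string, then gives characters back until <\/li> can match.

```php
<?php
// Two items followed by trailing text (illustrative input).
$text = '<li>a</li><li>b</li> trailing text';

// .* first consumes everything to the end of the string, then backtracks
// until <\/li> matches; that happens at the last </li>, so the trailing
// text is excluded but the inner "</li><li>" is kept in the capture.
preg_match('/<li>(.*)<\/li>/', $text, $greedy);

// .*? starts with zero characters and extends one at a time until
// <\/li> matches, which first happens at the first </li>.
preg_match('/<li>(.*?)<\/li>/', $text, $lazy);

echo $greedy[1], PHP_EOL;
echo $lazy[1], PHP_EOL;
```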

kemp
+3  A: 

See this article's section about the greediness of regular expressions.
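Besides the ? suffix, PHP's PCRE functions accept the U pattern modifier, which inverts greediness for the whole pattern: plain quantifiers become lazy and a trailing ? makes them greedy again. A short sketch comparing the forms (the input string is illustrative); the same ? suffix also applies to other quantifiers such as +:

```php
<?php
$text = '<li>Content1</li><li>Content2</li>';

// With U, a plain .* behaves lazily...
preg_match_all('/<li>(.*)<\/li>/U', $text, $ungreedy);

// ...and .*? becomes greedy again.
preg_match_all('/<li>(.*?)<\/li>/U', $text, $greedyAgain);

// Without U, +? is the lazy form of + (one or more, as few as possible).
preg_match_all('/<li>(.+?)<\/li>/', $text, $lazyPlus);

print_r($ungreedy[1]);
print_r($greedyAgain[1]);
print_r($lazyPlus[1]);
```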

Anders Abel
Thanks, very useful link.
EBAGHAKI