views:

80

answers:

4

Content of 1.txt:

Image" href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"><img src="im

Code that does not work:

<?php
$pattern = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$result = file_get_contents("1.txt");
preg_match($pattern,$result,$match);

echo "<h3>Preg_match Pattern test:</h3><br><br><pre>";
print_r($match);
echo "</pre>";
?>

I expect this result:

Array
(
    [0] => images/product_images/original_images/9961_1.jpg
    [1] => images/product_images/original_images/
    [2] => 9961_1
    [3] => .jpg
)

But I get something like this:

Array
(
    [0] => images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"> 
    [1] => images/product_images/original_images/
    [2] => 9961_1.jpg" rel="disable-zoom:false; disable-expand: false"> 
)

I'm tired of trying a million combinations of this regexp, and I don't know what's wrong. Please help, and thanks a lot!

+4  A: 

Make it ungreedy:

$pattern = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';
Wrikken
Can you explain this "?" after the "*"? Thanks.
Ax
@Ax: It makes it ungreedy :). It tells the * to stop matching at the first occurrence of the following pattern (\.jpg) instead of the last occurrence (the default, greedy behavior). By default, even when the * encounters the following pattern, it keeps going until the last occurrence it can find. The ? changes this behavior.
Powertieke
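To make the difference concrete, here is a minimal sketch comparing the greedy and ungreedy patterns. The sample line is an assumption: it is modelled on the question's 1.txt, with a second, made-up `other.jpg` added so that the greedy match has a later `.jpg` to run to.

```php
<?php
// Hypothetical sample line; "other.jpg" is invented for illustration only.
$line = 'href="images/product_images/original_images/9961_1.jpg" rel="x"><img src="other.jpg"';

// Greedy: (.*) runs to the LAST ".jpg" on the line.
$greedyPattern = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
// Ungreedy: (.*?) stops at the FIRST ".jpg" it can reach.
$lazyPattern   = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';

preg_match($greedyPattern, $line, $greedy);
preg_match($lazyPattern, $line, $lazy);

echo "greedy: ", $greedy[2], "\n"; // 9961_1.jpg" rel="x"><img src="other
echo "lazy:   ", $lazy[2], "\n";   // 9961_1
```

If a whole pattern should be ungreedy, PCRE's `U` modifier flips the default instead, but the explicit `?` is usually easier to read.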
+2  A: 

Remember that regular expression quantifiers are greedy by default. Your second capture (.*) says to match any character except a newline (unless the s, or dotall, modifier is set) as many times as possible. So it is probably capturing the rest of the line.

You can make it ungreedy as suggested by Wrikken. But I like to ensure I am capturing exactly what I want. In your case, it looks like the value of the href attribute. So what I really want is at least one character that is not a quote, followed by the .jpg extension:

$pattern = '/(images\/product_images\/original_images\/)([^'"]+)(\.jpg)/i';
Jason McCreary
I know. I've tried making it ungreedy too, but it still doesn't work, and I don't know why. Also, your pattern causes a PHP fatal error.
Ax
Sorry, escape the single quote.
Jason McCreary
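For reference, a sketch of that pattern with the single quote escaped, so the PHP string literal parses. The sample line is an assumption based on the question's 1.txt, not the actual file.

```php
<?php
// \' keeps the single quote inside the single-quoted PHP string;
// the regex engine then sees the character class [^'"].
$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';

// Hypothetical sample line modelled on the question's input.
$line = 'href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false">';

if (preg_match($pattern, $line, $match)) {
    // $match[2] is the file name without the extension: 9961_1
    print_r($match);
}
```

Because the character class cannot cross the closing quote of the attribute, the match stays inside the href value even though `+` is greedy.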
A: 

Here's the basic regex:

href="((.*/)(.*?)(\.jpg))"
StackOverflowNewbie
A: 

Do not parse HTML with regex.

Do not parse HTML with regex.

Do not parse HTML with regex.

Randal Schwartz
Do **not** parse HTML with regex.
sberry2A
Do **not** confuse searching an arbitrary text file for a certain file path with parsing an HTML file.
Powertieke
First, I can't select the needed attribute with DOM, and second, I receive the HTML over cURL.
Ax
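For comparison, a rough sketch of the DOM route, assuming PHP's DOM extension is available. HTML fetched over cURL is just a string, so it can be passed to `DOMDocument::loadHTML`. The `$html` fragment below is hypothetical, standing in for the cURL response, and `$names` is an illustrative variable name.

```php
<?php
// Hypothetical fragment; in practice this would be the string returned by cURL.
$html = '<a title="Image" href="images/product_images/original_images/9961_1.jpg" '
      . 'rel="disable-zoom:false; disable-expand: false">x</a>';

// Collect libxml warnings quietly; real-world markup is often malformed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$prefix = 'images/product_images/original_images/';
$names = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Keep only .jpg links under the expected directory.
    if (stripos($href, $prefix) === 0
        && strtolower(pathinfo($href, PATHINFO_EXTENSION)) === 'jpg') {
        $names[] = pathinfo($href, PATHINFO_FILENAME);
    }
}

print_r($names);
```

Whether this beats a single `preg_match` depends on the input: for a one-off search of a known path in a text file, the regex above may be all that is needed.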