Hi!

I'm trying to write a simple PHP script that finds the src attribute of every image in an HTML text and then, after some conditional changes, replaces each src it found with some other text.

Something like this:

@preg_match_all('/<img\s src="([a-zA-Z0-9\.;:\/\?&=_|\r|\n]{1,})"/isxmU', $body, $images);

Now I have all the srcs in the $images variable, so I do:

foreach ($images as $img) {
    ..my changes here..
}

And now... how can I put the changed srcs back into the $body variable?

many thanks in advance,

+6  A: 

You should look into preg_replace_callback(), which will allow you to postprocess each match however you like, using a callback function. (You would use it instead of your preg_match_all(), not in addition to it.)
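As a rough illustration of the idea, here is a minimal sketch that replaces every image src with an incremented number. The sample `$body`, the `image-N` replacement text, and the simplified pattern are all assumptions made for this example, not the asker's actual data or regex. The pattern only handles double-quoted src attributes and would also match attributes that merely end in `src` after a hyphen, such as `data-src`.

```php
<?php
// Hypothetical sample input; $body stands in for the HTML text from the question.
$body = '<p><img src="a.png" alt="A"> and <img class="x" src="http://example.com/b.jpg"></p>';

$counter = 0; // incremented once per matched image

$body = preg_replace_callback(
    // Group 1: everything from "<img" up to and including the opening quote of src.
    // Group 2: the old src value. Group 3: the closing quote.
    '/(<img\b[^>]*?\bsrc=")([^"]*)(")/i',
    function (array $match) use (&$counter) {
        $counter++;
        // Any conditional logic on the old value ($match[2]) would go here.
        // Whatever is returned replaces the whole match, so the surrounding
        // parts of the tag are put back around the new src.
        return $match[1] . 'image-' . $counter . $match[3];
    },
    $body
);

echo $body, "\n";
```

Because the callback is invoked once per match, in document order, each occurrence is replaced at its own position, even when two images share the same original src.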

chaos
Many thanks chaos, I'm reading the documentation for preg_replace_callback() right now, but I'm unable to get it working. I'm using this code to test it: function ReplaceImage($match){ $match[1] = "REPLACED"; return $match[1]; } $body = preg_replace_callback('/<img\s src="([a-zA-Z0-9\.;:\/\? [...] I want to replace all image sources with the text "REPLACED", but it doesn't work.
fidoboy
Well, `preg_replace_callback()` isn't broken. It looks to me like your regex is; why are you requiring a whitespace element *followed by* a space after the `img`?
chaos
A: 

Isn't preg_replace what you want? With the e modifier the replacement text is eval'd, so you can call a function that does to the text being replaced the same thing you would have done in your foreach loop.

EDIT: preg_replace_callback is cleaner than using the e modifier with preg_replace. I didn't think of that while writing my answer, so chaos's answer is better.

p4bl0
A: 

I think the easiest answer you're looking for is to do a str_replace.

foreach ($images[1] as $original_string) {
    ..my changes here, producing $modified_string..
    $body = str_replace($original_string, $modified_string, $body);
}
Ryan
Easy, but not the best option, e.g. what if two original_strings are the same, but should be replaced by something different (here used to be image no xyz)? Or if one replacement string is the same as one to be replaced somewhere down on the page?
Residuum
You are right Residuum, that's the problem... I need to replace exactly the same string in the same position. I think that preg_replace_callback is the best option, but I'm unable to get it working... Can anyone post a simple sample that uses it to replace all img srcs with an incremented number?
fidoboy
+3  A: 

Use an HTML DOM parser instead; it is much easier to use and maintain: http://simplehtmldom.sourceforge.net/
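To give a feel for the parser-based approach, here is a minimal sketch. Note that it uses PHP's bundled DOM extension (`DOMDocument`) rather than the Simple HTML DOM library linked above, whose API is not shown here. The sample fragment and the `image-N` replacement text are assumptions made for this example.

```php
<?php
// Hypothetical fragment; the <p> is deliberately left unclosed to mimic sloppy markup.
$body = '<div><p>Intro<img src="a.png" alt="A"><img src="b.jpg"></div>';

// libxml reports malformed markup as warnings; collect them internally instead of printing.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($body);
libxml_clear_errors();

$counter = 0;
foreach ($doc->getElementsByTagName('img') as $img) {
    $counter++;
    // Conditional logic on $img->getAttribute('src') would go here.
    $img->setAttribute('src', 'image-' . $counter);
}

// saveHTML() serializes the whole document, so a fragment comes back
// wrapped in the doctype, <html> and <body> that loadHTML() added.
echo $doc->saveHTML();
```

The trade-off is that the output is re-serialized HTML rather than the original text with only the srcs swapped, so whitespace and wrapper elements may differ from the input.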

Stephen lacy
will this work on HTML fragments or malformed HTML?
Darren Newton
It says it supports invalid HTML. I haven't used it, so I don't know how well, but I imagine that would be a pretty basic requirement of any HTML parser. Let us know how you get on. In case what you are looking for is an XSS filtering tool, check out http://htmlpurifier.org/
Stephen lacy
A: 

A non-validating parser may be even better if you need to work with badly formed HTML.

http://pear.php.net/package/XML%5FHTMLSax3

jetxee
A: 

I asked a question yesterday about a good interface for modifying and traversing HTML files. You may be interested in this:

jQuery port to PHP

This may be a good alternative if you are already familiar with jQuery's API.

theotherlight