Hello everyone!

Here is my situation, and the solution I've come up with. I have built an application that includes TinyMCE so users can create HTML content for publishing. Users can include images in their markup and drag or resize them, which changes the final width/height attributes on the img tag. That part works well: users can place and size images however they like. The big problem is that I may now be sending a much larger image to the client, only to have the browser scale it down to the requested width/height. That wastes bandwidth and load time.

So my solution is to pre-process my users' markup, scanning all of the img tags and parsing out the height, width and src attributes, then setting each img's src attribute to a phpThumb request with the parsed height and width passed in the thumbnail URL. This creates a reduced-size image (saving bandwidth at the expense of CPU and cache storage). What do you think about this solution? I've seen other posts where people used mod_rewrite to do something similar, but I want to change the content when the page is served rather than manipulate the image requests as they are received. Any thoughts about this design?
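As a rough illustration of the URL-building step described above, here is a minimal sketch. The helper name `thumb_url` and the install path `/phpThumb/phpThumb.php` are assumptions for the example, not part of the original setup; `src`, `w` and `h` are the phpThumb parameters for the source image and the target width and height.

```php
<?php
// Hypothetical helper: builds a phpThumb request URL for one image.
// '/phpThumb/phpThumb.php' is an assumed install path; adjust to your setup.
function thumb_url(string $src, int $width, int $height, string $endpoint = '/phpThumb/phpThumb.php'): string
{
    // http_build_query() URL-encodes each value, so paths containing
    // slashes, spaces or ampersands stay safe inside the query string.
    $query = http_build_query(['src' => $src, 'w' => $width, 'h' => $height]);
    return $endpoint . '?' . $query;
}

echo thumb_url('/uploads/photo.jpg', 320, 240), "\n";
```

When the resulting URL is written back into markup, the `&` separators still need HTML-escaping (`&amp;`); a DOM serializer does that automatically, whereas string concatenation would need `htmlspecialchars()`.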

I need some help with the fine details, as my regex skills need work; I'm very short on time and promise to pay off my technical knowledge debt soon. To make the regexes easier, I can guarantee some things: only img tags that need this processing will have existing width="" and height="" attributes (double-quoted and lower case, though matching case-insensitively would be safer in case TinyMCE changes its output).

So: one regex to match only the necessary img tags, and maybe another three regexes to extract the src, the width, and the height?

Thanks everyone.

+3  A: 

I think using regexes for this is a bad idea; you'd be better off parsing it with something like PHP Simple HTML DOM Parser. Then you can do something like:

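// Load HTML from a string (requires simple_html_dom.php to be included)
$html = new simple_html_dom();
$html->load($your_posted_content);

// Find all images and print each src
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}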
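The loop above only prints each src. To go one step further and actually rewrite the images, a sketch along the following lines might work. Note that it swaps in PHP's built-in DOMDocument instead of Simple HTML DOM so that it runs without extra files; the function name `rewrite_images` and the `/phpThumb/phpThumb.php` path are assumptions for illustration.

```php
<?php
// Rewrites every <img> that has src, width and height so that its src
// points at a thumbnail request. $endpoint is an assumed phpThumb path.
function rewrite_images(string $html, string $endpoint = '/phpThumb/phpThumb.php'): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a full page with a UTF-8 meta tag so libxml
    // reads it as UTF-8; @ silences warnings about sloppy user markup.
    @$doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $w = (int) $img->getAttribute('width');
        $h = (int) $img->getAttribute('height');
        if ($src === '' || $w <= 0 || $h <= 0) {
            continue; // leave images without usable dimensions alone
        }
        $query = http_build_query(['src' => $src, 'w' => $w, 'h' => $h]);
        $img->setAttribute('src', $endpoint . '?' . $query);
    }

    // Serialize only the children of the wrapper div added above.
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewrite_images('<p><img src="/uploads/a.jpg" width="100" height="50"> <img src="b.png"></p>'), "\n";
```

Images that lack a width or height are passed through untouched, which matches the guarantee in the question that only resized images carry both attributes.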
Richard Harrison
I've implemented my solution using the Simple HTML DOM Parser you suggested. It works like a charm :)
CryptoMonkey
Excellent news. It's also a handy technique for many similar tasks.
Richard Harrison
A: 

Generally speaking, regex is not good for HTML parsing. But in your case you may be able to get away with it if you're limiting the scope to be very narrow (i.e. only searching for the width="..." and height="..." attributes, or something like that).

A better solution might be to transfer the content from TinyMCE asynchronously, behind the scenes, process it server-side with a proper HTML/XML parser, and then update the content of the editor once that's done.

Miky Dinescu
And let's not forget http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 - if only because that particular horse cannot possibly be beaten dead *enough*. ;) [Disclaimer: Link is humorous only, don't expect a grand new insight or anything.]
pinkgothic
and yes.. there's that post too.. :)
Miky Dinescu
That was very funny :)
CryptoMonkey
+1  A: 

Try this:

(?i)<img(?>\s+(?>src="([^"]*)"|width="([^"]*)"|height="([^"]*)"|\w+="[^"]*"))+

That will match any image tag, and if the src, width, and height attributes are present, their values will be stored in groups 1, 2, and 3 respectively. But it doesn't require any of those attributes to be there, so you'll want to verify that all three groups contain values before processing.
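A possible way to apply the pattern in PHP, including the check that all three groups matched, is sketched below. The sample markup and variable names are made up for the example; the pattern is the one above, with the inline `(?i)` moved to the trailing `i` flag.

```php
<?php
// The pattern from this answer, wrapped in PCRE delimiters.
$pattern = '/<img(?>\s+(?>src="([^"]*)"|width="([^"]*)"|height="([^"]*)"|\w+="[^"]*"))+/i';

// Sample input (made up): the second image has no width/height.
$markup = '<p><img src="a.jpg" width="100" height="50" alt="A"> <img src="b.png" alt="B"></p>';

preg_match_all($pattern, $markup, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // preg_match_all() drops trailing unmatched groups, so pad them to ''.
    [$tag, $src, $width, $height] = array_pad($m, 4, '');
    // Only keep tags where src, width and height were all present.
    if ($src !== '' && $width !== '' && $height !== '') {
        $found[] = ['src' => $src, 'width' => (int) $width, 'height' => (int) $height];
    }
}

print_r($found);
```

Here only the first image ends up in `$found`, because the second one has no width or height attributes.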

Alan Moore