In need of a regex master here!

<img src="\img.gif" style="float:left; border:0" />
<img src="\img.gif" style="border:0; float:right" />

Given the above HTML, I need a regex pattern that will match "float:right" or "float:left" but only on an img tag.

Thanks in advance!

+3  A: 
/<img\s[^>]*style\s*=\s*"[^"]*\bfloat\s*:\s*(left|right)[^"]*"/i

Have to advise you, though: in my experience, no matter what regex you write, someone will be able to come up with valid HTML that breaks it. If you really want to do this in a general, reliable way, you need to parse the HTML, not throw regexes at it.
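As a rough illustration of the "parse the HTML" route, here is a minimal sketch using Python's standard-library `html.parser`. The class and function names (`ImgFloatFinder`, `find_img_floats`) are made up for this example, and splitting the style value on `;` is a simplifying assumption that ignores semicolons inside things like `url(...)` values.

```python
from html.parser import HTMLParser


class ImgFloatFinder(HTMLParser):
    """Collects the float direction of every <img> whose inline style sets one."""

    def __init__(self):
        super().__init__()
        self.floats = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        style = dict(attrs).get("style") or ""
        # Naive split of the inline style into "property: value" declarations.
        for declaration in style.split(";"):
            prop, _, value = declaration.partition(":")
            value = value.strip().lower()
            if prop.strip().lower() == "float" and value in ("left", "right"):
                self.floats.append(value)


def find_img_floats(html):
    """Return the float directions found on img tags, in document order."""
    finder = ImgFloatFinder()
    finder.feed(html)
    finder.close()
    return finder.floats


if __name__ == "__main__":
    sample = (
        '<img src="\\img.gif" style="float:left; border:0" />'
        '<img src="\\img.gif" style="border:0; float:right" />'
    )
    print(find_img_floats(sample))
```

Because the parser hands over attribute values already unquoted, single quotes, missing quotes, or a decoy `style=` inside another attribute do not need special handling here.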

chaos
You're the man.
Kappers
Don't know why someone would do this, but `<img ... alt='style="float:left"' style="float:right"...>`
Daniel LeCheminant
Or use single quotes for the style attribute. Or use *no* quotes for the style attribute. Or embed entire HTML tags in attributes that validly support it as values, prior to the style attribute. Like I said, if you want anything approaching reliability, you have to parse.
chaos
No worries. In this case I generate the HTML, so if it's invalid, it's my fault. :) Thanks for the advice, though!
Kappers
It amazes me that people consistently try to use regexes to parse HTML. I saw a similar question earlier today. I even tried to answer it - and got it wrong, along with several other posters. We didn't account for nested tags. Use a parser. Use a DOM. It will work reliably, where regexes won't. (Regular expressions are GREAT for some things - but not for parsing HTML or XML.)
TrueWill
@TrueWill - Using Telligent Community 5, I've got an advanced text editor that allows users to left- or right-align images. It does this by adding float to the img. I need the image to have no margin on the side it is floated toward. I modified the regex to capture the entire image tag and the float value, then match/replace with <div class="image-$2">$1</div> after the content in the editor is saved. What options do I have other than running this regex before the content is committed to the db?
Kappers
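The wrap-and-replace step described in the comment above might look roughly like this in Python. The modified regex was never posted, so the pattern below is an assumption: it is the accepted answer's pattern extended so that group 1 is the whole img tag and group 2 is the float direction. `IMG_FLOAT_TAG` and `wrap_floated_images` are hypothetical names.

```python
import re

# Assumed variant of the accepted pattern: group 1 = whole <img> tag,
# group 2 = float direction. Only double-quoted style attributes are handled.
IMG_FLOAT_TAG = re.compile(
    r'(<img\s[^>]*style\s*=\s*"[^"]*\bfloat\s*:\s*(left|right)[^"]*"[^>]*>)',
    re.IGNORECASE,
)


def _wrap(match):
    # Equivalent of the replacement template <div class="image-$2">$1</div>,
    # with the direction lowercased so the class name stays predictable.
    direction = match.group(2).lower()
    return '<div class="image-{}">{}</div>'.format(direction, match.group(1))


def wrap_floated_images(html):
    """Wrap every floated <img> in a div that records the float direction."""
    return IMG_FLOAT_TAG.sub(_wrap, html)


if __name__ == "__main__":
    print(wrap_floated_images('<img src="a.gif" style="float:left; border:0" />'))
```

Images without a float in their style attribute pass through unchanged, so running the function on already-clean content is harmless. Running it twice on the same content would wrap floated images twice, though.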
+2  A: 

You really shouldn't use regex to parse HTML or XML; it's impossible to design a foolproof regex that will handle all corner cases. Instead, I would suggest finding an HTML-parsing library for your language of choice.

That said, here's a possible solution using regex.

<img\s[^>]*?style\s*=\s*".*?(?<="|;)\s*(float:.*?)(?=;|").*?"

The float declaration (for example, "float:left") will be captured in the only capturing group there, which should be number 1.

The regex basically matches the start of an img tag, followed by any type of character that isn't a close bracket any number of times, followed by the style attribute. Within the style attribute's value, the float: can be anywhere within the attribute, but it should only match the actual float style (i.e. it's preceded by the start of the attribute or a semicolon and followed by a semicolon or the end of the attribute).
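To try the pattern against the two tags from the question, here is a small Python sketch. It assumes the lookbehind is meant to be `(?<="|;)` and that optional whitespace is allowed between the semicolon and `float:`; both are readings of the intent described above, not something the answer confirms. The helper name `float_declaration` is made up for the example.

```python
import re

# Assumed form of the pattern: a lookbehind for a quote or semicolon,
# optional whitespace, then the float declaration in group 1.
IMG_FLOAT = re.compile(
    r'<img\s[^>]*?style\s*=\s*".*?(?<="|;)\s*(float:.*?)(?=;|").*?"'
)


def float_declaration(tag):
    """Return the captured float declaration, or None if there is no match."""
    match = IMG_FLOAT.search(tag)
    return match.group(1) if match else None


if __name__ == "__main__":
    tags = [
        '<img src="\\img.gif" style="float:left; border:0" />',
        '<img src="\\img.gif" style="border:0; float:right" />',
    ]
    for tag in tags:
        print(float_declaration(tag))
```

Note that `.*?` can run past the closing quote of the style attribute, so a later attribute whose value starts with `float:` could still produce a false match.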

Sean Nyman
A: 

Test this C# code:

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string[] test = {
                "<img src=\"\\img.gif\" style=\"float:left; border:0\" />",
                "<img src=\"\\img.gif\" style=\"border:0; float:right\" />"
            };
            // Group 1 captures the float value ("left" or "right").
            Regex regex = new Regex(@"<img[^>]*?style\s*=.*?float:(\w+).*?/>", RegexOptions.Compiled);
            foreach (string s in test)
            {
                Match match = regex.Match(s);
                if (match.Success)
                {
                    Console.WriteLine(match.Groups[1].Value);
                }
            }
        }
    }
mykhaylo
A: 

I agree with Sean Nyman, it's best not to use a regex (at least not for anything permanent). For something ad-hoc and a bit more durable, you might try:

/<img\s(?:\s*\w+\s*=\s*(?:'[^']*'|"[^"]*"))*?\s*\bstyle\s*=\s*(?:"[^"]*?\bfloat\s*:\s*(\w+)|'[^']*?\bfloat\s*:\s*(\w+))/i
brianary
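A quick way to exercise that last pattern is sketched below in Python, assuming the final non-capturing group is closed before the `/i` flag. The float value lands in group 1 for a double-quoted style attribute and in group 2 for a single-quoted one, so the hypothetical helper `float_value` checks both.

```python
import re

# The pattern from the answer, with the last (?: ... ) group closed.
# It skips over well-formed quoted attributes before reaching style.
IMG_FLOAT = re.compile(
    r"""<img\s(?:\s*\w+\s*=\s*(?:'[^']*'|"[^"]*"))*?\s*\bstyle\s*=\s*"""
    r"""(?:"[^"]*?\bfloat\s*:\s*(\w+)|'[^']*?\bfloat\s*:\s*(\w+))""",
    re.IGNORECASE,
)


def float_value(tag):
    """Return the float value from an img tag's style, or None."""
    match = IMG_FLOAT.search(tag)
    if match is None:
        return None
    # Group 1 is set for double-quoted styles, group 2 for single-quoted ones.
    return match.group(1) or match.group(2)


if __name__ == "__main__":
    print(float_value('<img src="\\img.gif" style="float:left; border:0" />'))
    print(float_value("<img alt='style=\"float:left\"' style='border:0; float:right'>"))
```

Because earlier attributes are consumed as whole quoted values, the decoy `style=` inside the `alt` attribute from the comments on the accepted answer is skipped. Unquoted attribute values and attribute names containing hyphens are still not handled.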