Hi there,

I need a regular expression (for use with preg_replace) to find all <font> tags that have a style="..." attribute.

The problem is that I need to match ONLY the <font> tags whose style attribute contains the value

"height: 0;overflow: hidden;width: 0; position: absolute;"...

Another problem: the style attribute may appear at different positions within the tag. For example:

<font  color="white"  style="height: 0;overflow: hidden;width: 0; position: absolute; font-family:courier; font-size:10px" >

or

<font  style="height: 0;overflow: hidden;width: 0; position: absolute; font-family:tahoma; font-size:14px" color="red"   >

EDIT: solved it with:

#</?font [^>]*\bheight: 0;overflow: hidden;width: 0; position: absolute;[^>]* >(.+</font[^>]*>|)#is

(It finds the tag with that style and everything it contains.)

That regular expression seems to work in preg_replace()!

+3  A: 

You can use the following XPath expression to get all <font> tags whose style attribute is exactly that value (assuming your HTML document is well formed):

//font[@style='height: 0;overflow: hidden;width: 0; position: absolute;']

In PHP there are many ways to run XPath expressions on documents, for example this one.
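As a rough sketch of the XPath route using PHP's built-in DOM extension (DOMDocument and DOMXPath): the sample markup and the variable names below are made up for illustration, and contains() is used instead of '=' on the assumption that the real style values carry extra declarations after the one being searched for, as in the question's examples.

```php
<?php
// Hypothetical input modelled on the question's first example.
$html = '<p>keep <font color="white" style="height: 0;overflow: hidden;width: 0; position: absolute; font-family:courier; font-size:10px">hidden</font> this</p>'
      . '<p><font color="red">visible</font></p>';

$needle = 'height: 0;overflow: hidden;width: 0; position: absolute;';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// contains() matches the needle anywhere inside the style value.
$nodes = $xpath->query("//font[contains(@style, '$needle')]");

// Snapshot the node list before mutating the document.
foreach (iterator_to_array($nodes) as $font) {
    $font->parentNode->removeChild($font);   // drops the tag and everything inside it
}

// Serialize only the <body> that loadHTML() wrapped around the fragment.
$result = $doc->saveHTML($doc->getElementsByTagName('body')->item(0));
echo $result, "\n";
```

Because the element is removed as a node, nested <font> tags and the matching closing tag are handled by the parser rather than by the pattern.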

If your HTML isn't well formed, you can use an HTML parser such as this one that I just found. It supports JQuery-like selectors, so you'd find your element using this expression:

font[style*='height: 0;overflow: hidden;width: 0; position: absolute;']

I must warn you against using Jens' solution, as trying to parse HTML with regular expressions is a journey into the dark abyss of pure malevolent madness. HTML is a nested, recursive structure. By their very nature, regular expressions cannot deal with that kind of recursion. While you might be able to create an expression that looks like it works, there will almost certainly be valid cases that slip through, or cases that match when they should not. I implore you to use an actual DOM-based parser.

Welbog
I don't know how it compares to simplehtmldom, but PHP has HTML stuff built-in: http://php.net/manual/en/book.dom.php
Peter Boughton
+1  A: 

If you are sure that your HTML is nice enough to be accessible by regex (i.e. no comments, nothing malformed, the style CSS does not contain comments, ...) and you only want to match the opening tag (nesting is a no-no with regex), you can try

<font [^>]*\bstyle="([^"]*)"[^>]*>

This regex matches every font tag with a style attribute and captures the value of that attribute in its only capturing group.

Edit: Maybe I misunderstood the question. If you need the style attribute to be the value you specified, use

<font [^>]*\bstyle="height: 0;overflow: hidden;width: 0; position: absolute;"[^>]*>
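A quick check of that exact-value pattern against the tags from the question, as a sketch only. The $prefix variant (with [^"]* before the closing quote) is an assumption added here for comparison, not part of the answer, and the variable names are illustrative.

```php
<?php
// Opening tags copied from the question, plus one <font> without a style attribute.
$tags = [
    '<font  color="white"  style="height: 0;overflow: hidden;width: 0; position: absolute; font-family:courier; font-size:10px" >',
    '<font  style="height: 0;overflow: hidden;width: 0; position: absolute; font-family:tahoma; font-size:14px" color="red"   >',
    '<font color="red">',
];

$needle = 'height: 0;overflow: hidden;width: 0; position: absolute;';

// Exact-value pattern: the closing quote must follow the needle immediately.
$exact = '#<font [^>]*\bstyle="' . preg_quote($needle, '#') . '"[^>]*>#i';
// Assumed variant: [^"]* lets further declarations follow the needle.
$prefix = '#<font [^>]*\bstyle="' . preg_quote($needle, '#') . '[^"]*"[^>]*>#i';

$exactHits = [];
$prefixHits = [];
foreach ($tags as $tag) {
    $exactHits[]  = preg_match($exact, $tag);    // 1 on match, 0 otherwise
    $prefixHits[] = preg_match($prefix, $tag);
}

echo implode(',', $exactHits), "\n";    // 0,0,0
echo implode(',', $prefixHits), "\n";   // 1,1,0
```

The exact-value pattern misses both sample tags because their style values continue with font-family and font-size, so whether it fits depends on whether the real attribute ever carries more than the four declarations.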
Jens
Your solution seems to work, thanks! I know that I shouldn't use regexes to process HTML pages, but I need to remove those <font> tags and everything they contain, so there is no need to handle nesting.
Il_pasqui
Be aware that my regex does not find the corresponding closing tag, and that is not a task you can easily solve with a regex (font tags may be nested). So you are likely to leave invalid HTML behind.
Jens
Take a look at my edit... it seems to work now.
Il_pasqui
@Il_pasqui: I'd be careful and use a parser here. Your version could delete everything between the first opening font tag (with this style) and the last closing font tag.
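A small sketch of that failure mode, using the pattern from the question's edit on a made-up document with two font tags. The lazy variant ($lazyPattern, with .+? in place of .+) is an assumption added for comparison; it stops at the first closing tag and would still break on nested font tags.

```php
<?php
// Pattern copied from the asker's edit: greedy .+ under the s (dotall) modifier.
$pattern = '#</?font [^>]*\bheight: 0;overflow: hidden;width: 0; position: absolute;[^>]* >(.+</font[^>]*>|)#is';
// Assumed lazy variant for comparison.
$lazyPattern = '#</?font [^>]*\bheight: 0;overflow: hidden;width: 0; position: absolute;[^>]* >(.+?</font[^>]*>|)#is';

// Hypothetical page: one hidden font tag, then unrelated content with its own font tag.
$html = '<font style="height: 0;overflow: hidden;width: 0; position: absolute; font-size:10px" >spam</font>'
      . '<p>real content</p>'
      . '<font color="red">also real</font>';

$greedy = preg_replace($pattern, '', $html);
$lazy   = preg_replace($lazyPattern, '', $html);

var_dump($greedy);  // empty string: the match ran on to the LAST </font>
var_dump($lazy);    // '<p>real content</p><font color="red">also real</font>'
```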
Jens