So let's say that I have:

<any_html_element>maybe some whitespaces <br/>Some text</any_html_element>

And I want to remove the first <br/> after <any_html_element>.

How can I do that?

Best Regards

+8  A: 

Start by using an HTML parser, not a regex, to identify the block of code you want to manipulate.

Once you've isolated the actual code, you can then do a replace to remove the <br/>.
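For what it's worth, here is a minimal sketch of that approach using PHP's bundled DOM extension (DOMDocument). The helper name removeFirstBr and the sample markup are made up for illustration, and the sketch assumes the fragment is plain ASCII, since loadHTML does not default to UTF-8.

```php
<?php
// Hypothetical helper: removes the first <br> found inside the first
// element with the given tag name, then returns the fragment's HTML.
function removeFirstBr(string $html, string $tagName): string
{
    // Collect parser warnings internally instead of emitting them
    // (unknown or sloppy tags would otherwise trigger notices).
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Locate the first matching element, then its first <br> descendant.
    $element = $doc->getElementsByTagName($tagName)->item(0);
    if ($element !== null) {
        $br = $element->getElementsByTagName('br')->item(0);
        if ($br !== null) {
            $br->parentNode->removeChild($br);
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = removeFirstBr('<p><br/>Some text<br/>more text</p>', 'p');
```

Only the first <br/> inside the first <p> is dropped; the parser may normalize the rest of the markup (for example, writing <br> instead of <br/>).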


Here are a couple of PHP HTML parser links to investigate:

Peter Boughton
How about a little actual code showing how Uffo could better solve his problem with one of your recommended HTML parsers than with a regex? (It may very well be the case that Uffo should use an HTML parser. That depends on whatever else he's trying to do with his HTML.)
Jan Goyvaerts
An example would be better coming from an active PHP developer - I rarely use it these days, and those links just came from Google - I added this answer to be more helpful than just the comment I left on the question, but expecting that a PHP developer would come along with something more thorough.
Peter Boughton
+2  A: 

Search for this regex:

(<any_html_element>.*?)<br/>

and replace with:

$1

Turn on single-line mode if there may be line breaks between the two tags. In PHP you can do that with the /s pattern modifier.
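In PHP this could look roughly like the sketch below. It assumes the element is a <div> (a stand-in for any_html_element), uses ~ as the pattern delimiter so the slash in <br/> needs no escaping, and passes a limit of 1 to preg_replace so that only the first match is replaced.

```php
<?php
// Sample input: whitespace and a line break sit between the opening tag
// and the first <br/>. The <div> is just a stand-in element.
$html = "<div>  \n<br/>Some text<br/>more</div>";

// /s lets the dot match newlines; the 4th argument (1) limits the
// replacement to the first match, so later <br/> tags are untouched.
$result = preg_replace('~(<div>.*?)<br/>~s', '$1', $html, 1);

// $result is now "<div>  \nSome text<br/>more</div>"
```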

If by any_html_element you meant that any element should be allowed, use this regex:

(<\w[^<>]*>.*?)<br/>

The replacement text remains the same.
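A sketch of the any-element variant wrapped in a small function follows. The name stripFirstBr is hypothetical, and two details are assumptions rather than part of the answer: the tag pattern uses [^<>]* so that attribute-less single-letter tags such as <p> also match, and the break pattern accepts <br>, <br/> and <br />.

```php
<?php
// Hypothetical helper: strips the first <br> that follows the first
// opening tag in the string.
function stripFirstBr(string $html): string
{
    // <\w[^<>]*>  any opening tag, with or without attributes
    // .*?         as little as possible, across newlines thanks to /s
    // <br\s*/?>   <br>, <br/> or <br /> (assumption: the question only shows <br/>)
    $pattern = '~(<\w[^<>]*>.*?)<br\s*/?>~s';

    // Limit 1 stops after the first match; fall back to the input
    // if preg_replace reports an error by returning null.
    return preg_replace($pattern, '$1', $html, 1) ?? $html;
}

$a = stripFirstBr('<span class="x">  <br/>Some text<br/>tail</span>');
$b = stripFirstBr('<p><br />x</p>');
```

Note that the pattern knows nothing about nesting: it starts at the first opening tag it finds and removes the first break after it, even if that break belongs to a later element.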

While it's true that you can't parse HTML with just one regex, Uffo isn't trying to parse HTML. He just wants to delete one tag. A regex will do that just fine.

Jan Goyvaerts