
Hello!

How can I get all the unclosed tags in a given string, preferably in the order they should be closed?

Note: assume there are no errors in the HTML and that it was simply cut off after X characters. No, it's not a case of bad HTML or overlapping tags, etc. Also there will be no ending

Example: <p><span>Lorem</span><b>ipsum ---return---> </b></p>
-OR-
<ul><li>1</li><li>2 ---return---> </li></ul>

So that if the string is concatenated with the function output it will re-create a valid HTML.

I'm not sure whether a RegExp would do the trick here. Basically, I want to get every tag between < and > that does not have a matching </ > closing tag.
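A regex alone cannot pair tags, but a regex that finds tags combined with a stack can. The Python code below is a minimal sketch of that approach. It assumes what the question states (well-formed HTML, only truncated) and also that the cut never falls inside a tag, that no attribute value contains a literal ">", and that there are no comments or script/style bodies containing tag-like text. The names `VOID_ELEMENTS`, `TAG_RE`, and `closing_tags` are hypothetical.

```python
import re

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Matches "<name ...>", "<name .../>" or "</name>".
# Assumption: no ">" appears inside attribute values.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(/?)>")


def closing_tags(fragment: str) -> str:
    """Return the closing tags that balance a truncated HTML fragment."""
    stack = []
    for match in TAG_RE.finditer(fragment):
        is_end, name, self_closing = match.group(1), match.group(2).lower(), match.group(3)
        if is_end:
            # Well-formed input: an end tag closes the innermost open element.
            if stack and stack[-1] == name:
                stack.pop()
        elif not self_closing and name not in VOID_ELEMENTS:
            stack.append(name)
    # The innermost (most recently opened) element must be closed first.
    return "".join(f"</{name}>" for name in reversed(stack))
```

For the two examples in the question, `closing_tags("<p><span>Lorem</span><b>ipsum")` gives `</b></p>` and `closing_tags("<ul><li>1</li><li>2")` gives `</li></ul>`, so appending the result to the fragment rebuilds valid HTML.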

Thank you.

+3  A: 

This is not an easy task. You might want to look at Tidy:

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.

http://php.net/manual/en/book.tidy.php
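Tidy is a PHP extension and is not shown here. As a stand-in illustrating the same idea (let a real HTML tokenizer find the tags instead of a hand-written regex), here is a sketch using Python's standard-library `html.parser.HTMLParser`. The class `UnclosedTagTracker` and the function `closing_tags` are hypothetical names. As with any such sketch, it assumes the input is well-formed apart from the truncation.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class UnclosedTagTracker(HTMLParser):
    """Keeps a stack of elements that have been opened but not yet closed."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names.
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>"-style tags: nothing is left open.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def closing_tags(fragment: str) -> str:
    tracker = UnclosedTagTracker()
    # close() is deliberately not called, so a trailing partial tag is left unprocessed.
    tracker.feed(fragment)
    return "".join(f"</{tag}>" for tag in reversed(tracker.stack))
```

Because the parser handles quoting, comments and entities itself, this version is less fragile than the regex one when attributes contain unusual characters.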

NullUserException