ansaurus

Question

Regular expression HTML tags in an array

Answer 1

A:

You might want to look into Stream Wrappers or SimpleXML to process the HTML. Also, it would help to know a bit more about what you are trying to achieve. Why do you want to split the XHTML? To me, it sounds like the approach doesn't really fit the use case.

Edit: After reading your comments, I don't think this is something you should try to solve at the markup level. It's all about presentation. Check these articles about multi-column layouts at quirksmode, alistapart and cvwdesign.

Gordon 2009-11-13 16:13:42

It's a function to split a string of user-generated XHTML into chunks. The reason I'm looking for opening and closing tags is that I don't want to split a chunk in the middle of an open XHTML tag, because that would obviously break the markup. The $words array is generated using $words = preg_split('/\s/', $stringfromdb, -1, PREG_SPLIT_NO_EMPTY);
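To illustrate why the whitespace split needs tag awareness, here is a minimal sketch of that preg_split call run against a hypothetical sample string (the sample markup is an assumption, not the real database content). A tag that carries an attribute ends up spread over several array elements.

```php
<?php
// Hypothetical sample input; $stringfromdb stands in for the stored XHTML.
$stringfromdb = '<p>Some <a href="/about">link text</a> here.</p>';

// The split described above: break on every whitespace character
// and drop empty pieces.
$words = preg_split('/\s/', $stringfromdb, -1, PREG_SPLIT_NO_EMPTY);

// The opening <a ...> tag is cut in two ('<a' and 'href="/about">link'),
// so cutting a chunk between those elements would produce broken markup.
print_r($words);
```

The sample yields five elements, and a chunk boundary after the second one would fall inside the opening a tag.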

Wil 2009-11-13 16:22:10

But why do you want to split the XHTML into chunks at all?

Gordon 2009-11-13 16:32:32

To break it into columns basically. The function will find convenient points in the XHTML to close a previous column div and open a new one. The column lengths are determined either by percentage length or specific word count.

Wil 2009-11-13 17:25:48

Answer 2

A:

Exploding the XHTML into a large array like this means you've made life harder for yourself, because the hierarchy has disappeared. I think you should rethink this approach.

Maybe use a regex first to extract whole tags into an array for splitting?

Update

Here is an example capturing pattern that you could apply repeatedly to a document to work from the outside in (possibly).

<([^<>]+)>.*?</\1>

See the examples from the manual about escaping this pattern correctly. More information on capturing HTML tags with regexes.
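As a rough sketch of how such a pattern might be used from PHP, the snippet below runs a variant of it through preg_match_all. The variant is an assumption on my part: it captures only the tag name (so tags with attributes can still be matched by the backreference) and adds a second group for the contents. The sample markup is hypothetical. Using ~ as the delimiter avoids having to escape the / in the closing tag.

```php
<?php
// Hypothetical sample of user-generated XHTML without a root node.
$xhtml = '<p>First paragraph.</p>'
       . '<ul><li>One</li><li>Two</li></ul>'
       . '<blockquote>Quoted.</blockquote>';

// Group 1: tag name only. Group 2: everything up to the first matching
// closing tag. The 's' modifier lets '.' match newlines as well.
$pattern = '~<(\w+)\b[^<>]*>(.*?)</\1>~s';

preg_match_all($pattern, $xhtml, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] is the whole element, $m[1] the tag name, $m[2] the contents.
    printf("%s: %s\n", $m[1], $m[2]);
}
```

This only finds the outermost elements, and it stops at the first closing tag with the same name, so an element nested inside another element of the same name (a ul inside a ul, say) will be cut short.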

Greg K 2009-11-13 16:18:19

That would work because I could happily just keep adding the resulting tags and tag contents into chunks until they were full. Perhaps then I just need an expression to find XHTML tags and return a) the tag and b) the contents, then I can go from there. Any advice for that?

Wil 2009-11-13 16:27:21

Answer 3

A:

Since an XHTML document has a root node, splitting anywhere inside will at least split that root node.

If your input consists of individual XHTML nodes without a root node, a regular expression is still the wrong way to achieve what you want to do, because XHTML is not a regular language.

The proper tool is an XHTML or XML parser. If you can't find one that works without assuming the whole document sits inside a single root node, you can write one yourself. That's not too hard, since XML is designed to be easy to parse.
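One way to use an existing parser on rootless input is to wrap the fragment in a temporary root element before parsing. The sketch below does that with PHP's bundled DOMDocument and groups the top-level nodes into columns by word count. The function names splitIntoColumns and countWords, the root wrapper element and the column class are all hypothetical, and the code assumes the fragment is well-formed XHTML with no undefined entities such as &nbsp;.

```php
<?php
// Hypothetical helper: counts words in all text nodes below (or at) $node.
function countWords(DOMNode $node): int
{
    $total = 0;
    $xpath = new DOMXPath($node->ownerDocument);
    foreach ($xpath->query('descendant-or-self::text()', $node) as $text) {
        // str_word_count() counts runs of letters (plus ' and -) as words.
        $total += str_word_count($text->nodeValue);
    }
    return $total;
}

// Hypothetical helper: splits rootless XHTML into column-sized chunks,
// cutting only between top-level elements so no tag is left open.
function splitIntoColumns(string $fragment, int $wordsPerColumn): array
{
    $doc = new DOMDocument();
    // Temporary root so the fragment parses as a single XML document.
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        throw new InvalidArgumentException('Fragment is not well-formed.');
    }

    $columns = [];
    $current = '';
    $count = 0;
    foreach ($doc->documentElement->childNodes as $node) {
        $words = countWords($node);
        // Close the column if this node would push it past the limit.
        if ($count > 0 && $count + $words > $wordsPerColumn) {
            $columns[] = $current;
            $current = '';
            $count = 0;
        }
        $current .= $doc->saveXML($node);
        $count += $words;
    }
    if ($current !== '') {
        $columns[] = $current;
    }
    return $columns;
}

$html = '<p>one two three</p><ul><li>four</li><li>five</li></ul><p>six seven</p>';
foreach (splitIntoColumns($html, 4) as $column) {
    echo '<div class="column">', $column, '</div>', "\n";
}
```

Because each chunk is serialized from whole nodes, every column contains balanced tags. A single top-level element longer than the limit still ends up in one column; splitting inside it would need a recursive version of the same idea.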

Svante 2009-11-13 16:50:05

It isn't an entire document, simply a body of formatted, user-generated content. There will only really be tags such as p, ul and blockquote in there.

Wil 2009-11-13 17:22:58
