I've split a large body of XHTML into an array of individual words, and I now need to iterate through the array and split it at regular intervals. That part isn't a problem, but I want to make sure I don't split it in the middle of an XHTML element (i.e. between an opening tag and its closing tag). The array looks like this:
[41] => <p>
[42] => materials
[43] => and
[44] => dosage
[45] => forms:</p>
[46] => <ul>
[47] => <li>
[48] => Drug
[49] => substance:
[50] => small
[51] => and
[52] => biomolecule</li>
[53] => <li>
[54] => Excipients</li>
[55] => <li>
[56] => Solid
[57] => oral
[58] => dosages</li>
So if I split the array at key 50, I would be cutting the unordered list in two, which is no good.
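One way to detect that case is to track nesting depth: count opening tags minus closing tags to get the depth after each word, and only split after a word where the depth is back to 0. This is a minimal sketch, assuming well-formed XHTML, that the fragment starts at top level, and that each tag sits entirely inside one word (a tag with attributes, such as `<a href="x">`, split on spaces would break it). The helper names `nesting_depths` and `last_safe_split` are hypothetical, and the regexes are simple approximations rather than a real XHTML parser.

```php
<?php
// Simple approximations of opening and closing tags (not a full XHTML grammar).
// The opening pattern skips self-closing tags such as <br/>.
const OPEN_TAG  = '/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*(?<!\/)>/';
const CLOSE_TAG = '/<\/([a-zA-Z][a-zA-Z0-9]*)\s*>/';

// Hypothetical helper: nesting depth *after* each word, keyed like $words.
function nesting_depths(array $words): array
{
    $depths = array();
    $depth = 0;
    foreach ($words as $key => $word) {
        $depth += preg_match_all(OPEN_TAG, $word);  // tags opened in this word
        $depth -= preg_match_all(CLOSE_TAG, $word); // tags closed in this word
        $depths[$key] = $depth;
    }
    return $depths;
}

// Hypothetical helper: the last key at or before $wanted after which the depth
// is 0, i.e. the latest split that does not cut through an element.
// Returns null if no such key exists.
function last_safe_split(array $depths, int $wanted): ?int
{
    $safe = null;
    foreach ($depths as $key => $depth) {
        if ($key > $wanted) {
            break;
        }
        if ($depth === 0) {
            $safe = $key;
        }
    }
    return $safe;
}

// Sample words from the question (keys 41-58)
$words = array_combine(range(41, 58), array(
    '<p>', 'materials', 'and', 'dosage', 'forms:</p>',
    '<ul>', '<li>', 'Drug', 'substance:', 'small', 'and', 'biomolecule</li>',
    '<li>', 'Excipients</li>', '<li>', 'Solid', 'oral', 'dosages</li>',
));

$depths = nesting_depths($words);
echo $depths[50], "\n";                   // 2: inside <ul> and <li>
echo last_safe_split($depths, 50), "\n";  // 45: right after "forms:</p>"
```

With this fragment, the depth after key 50 is 2, so the sketch backs off to key 45: the first chunk would end at `forms:</p>` and the next would start at `<ul>`.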
I'd like to iterate through the array and find the start and end positions of every element, bearing in mind that an unordered list could be nested inside several others.
Here's what I've got so far (granted, it's a little messy):
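Nested elements are usually handled with a stack rather than a regex alone. As a sketch, assuming well-formed XHTML with every tag inside a single word: push each opening tag, and when a closing tag arrives, pop the innermost open element if its name matches. `find_elements` is a hypothetical name; its result uses the same `tag`/`opened`/`closed` fields as the attempt below.

```php
<?php
// Sketch: pair opening and closing tags with a stack so nesting is handled.
// Assumes well-formed XHTML and that each tag sits inside a single word.
// find_elements() is a hypothetical helper name.
function find_elements(array $words): array
{
    $elements = array(); // each entry: tag name, key where opened, key where closed (or false)
    $stack = array();    // indices into $elements that are still open, innermost last

    foreach ($words as $key => $word) {
        // Every tag in the word, in order; group 1 is "/" for a closing tag, group 2 the name
        preg_match_all('/<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/', $word, $found, PREG_SET_ORDER);
        foreach ($found as $m) {
            if (substr($m[0], -2) === '/>') {
                continue; // self-closing tag such as <br />: nothing to pair
            }
            if ($m[1] === '') {
                // Opening tag: record it and push its index onto the stack
                $elements[] = array('tag' => $m[2], 'opened' => $key, 'closed' => false);
                $stack[] = count($elements) - 1;
            } elseif ($stack && $elements[end($stack)]['tag'] === $m[2]) {
                // Closing tag for the innermost open element: pop it and record the key
                $elements[array_pop($stack)]['closed'] = $key;
            }
            // Any other closing tag means malformed markup; this sketch ignores it
        }
    }
    return $elements;
}

// Sample words from the question (keys 41-58)
$words = array_combine(range(41, 58), array(
    '<p>', 'materials', 'and', 'dosage', 'forms:</p>',
    '<ul>', '<li>', 'Drug', 'substance:', 'small', 'and', 'biomolecule</li>',
    '<li>', 'Excipients</li>', '<li>', 'Solid', 'oral', 'dosages</li>',
));

$elements = find_elements($words);
foreach ($elements as $e) {
    echo $e['tag'], ': ', $e['opened'], ' -> ',
        $e['closed'] === false ? 'still open' : $e['closed'], "\n";
}
```

On the sample words this reports `p` from 41 to 45, the three `li` elements at 47 to 52, 53 to 54 and 55 to 58, and `ul` opened at 46 but still open, because the fragment stops before `</ul>`. The stack removes the need for `array_reverse`, and `preg_match_all` also copes with several tags in one word, such as `</li></ul>`.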
// Find all XHTML tags
$pattern_to_find_opening_tag = "?????";
$pattern_to_find_closing_tag = "?????";

$tags = array();
$i = 0;
foreach ($words as $key => $word)
{
    // If we find an opening tag, add it to the array
    if (preg_match($pattern_to_find_opening_tag, $word, $matches))
    {
        // 'opened' and 'closed' hold the tag's position (key) in the $words array.
        // Assumes the pattern captures the bare tag name (e.g. "ul") in group 1,
        // so it can be compared with the name from the closing tag.
        $tags[$i]['tag'] = $matches[1];
        $tags[$i]['opened'] = $key;
        $tags[$i]['closed'] = false;
        $i++;
    }
    // If we find a closing tag, find its opening position
    elseif (preg_match($pattern_to_find_closing_tag, $word, $matches))
    {
        // Start from the most recently opened tag; passing true keeps the
        // original keys, so $tag_key still indexes into $tags
        $top_down_tags = array_reverse($tags, true);
        foreach ($top_down_tags as $tag_key => $tag)
        {
            // Close the innermost still-open tag with the same name, then stop
            if ($tag['tag'] == $matches[1] && $tag['closed'] === false)
            {
                $tags[$tag_key]['closed'] = $key;
                break;
            }
        }
    }
}
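For the two `?????` placeholders, one possibility is the pair of patterns below. They are an assumption, not a complete XHTML grammar: they ignore comments, CDATA and `>` inside attribute values. Group 1 captures the bare tag name, so `ul` from `<ul>` can be compared with `ul` from `</ul>`, and the opening pattern skips self-closing tags. `<br/>` does not appear in the data above; it is included only to show that case.

```php
<?php
// Possible patterns for the placeholders (an assumption, not the only option).
// Group 1 captures the tag name; the lookbehind (?<!\/) rejects self-closing tags.
$pattern_to_find_opening_tag = '/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*(?<!\/)>/';
$pattern_to_find_closing_tag = '/<\/([a-zA-Z][a-zA-Z0-9]*)\s*>/';

foreach (array('<ul>', 'biomolecule</li>', 'forms:</p>', 'Drug', '<br/>') as $word) {
    if (preg_match($pattern_to_find_opening_tag, $word, $matches)) {
        echo "$word: opens {$matches[1]}\n";
    } elseif (preg_match($pattern_to_find_closing_tag, $word, $matches)) {
        echo "$word: closes {$matches[1]}\n";
    } else {
        echo "$word: no tag\n";
    }
}
```

`preg_match` stops at the first tag in a word, so a word like `</li></ul>` would only report `li`; `preg_match_all` returns every tag if that can occur in your data.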
Chances are I'm way off the mark, being fairly unaccustomed to regex, so I'd appreciate any help whatsoever! Thanks, guys & gals.