ansaurus

Question

PHP split content when a HTML element is found

Answer 1

A:

Something like this would basically work:

preg_split('/(<strong>|<b>)/', $html1, 3, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Given your test string of:

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';

you'd end up with

Array (
    [0] => <strong>
    [1] => My content</strong>This is my content.
    [2] => <b>
    [3] => Some more bold</b>content
)

Now, if your sample string did NOT start with strong/b:

$html2 = 'like the first, but <strong>My content</strong>This is my content.<b>Some more bold</b>content, has some initial non-tag content';

you'd end up with

Array (
    [0] => like the first, but 
    [1] => <strong>
    [2] => My content</strong>This is my content.
    [3] => <b>
    [4] => Some more bold</b>content, has some initial non-tag content
)

A simple test of whether element #0 is a tag or text then tells you where your "second tag and onwards" text starts (element #3 or element #4).
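A minimal sketch of that test. It assumes "second tag and onwards" means everything from the second `<strong>` or `<b>` opening tag to the end of the string. The function name `splitAtSecondTag` is hypothetical.

```php
<?php

// Hypothetical helper: returns [text before the second tag, text from the second tag onwards]
function splitAtSecondTag(string $html): array
{
    $parts = preg_split('/(<strong>|<b>)/', $html, 3, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // If element #0 is a tag, the second tag is element #2; if it is text, element #3
    $secondTagIndex = in_array($parts[0] ?? '', ['<strong>', '<b>'], true) ? 2 : 3;

    // Fewer than two opening tags: nothing to split off
    if (!isset($parts[$secondTagIndex])) {
        return [$html, ''];
    }

    // PREG_SPLIT_NO_EMPTY only drops empty strings, so rejoining loses nothing
    $head = implode('', array_slice($parts, 0, $secondTagIndex));
    $tail = implode('', array_slice($parts, $secondTagIndex));

    return [$head, $tail];
}

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';
[$head, $tail] = splitAtSecondTag($html1);
// $head is '<strong>My content</strong>This is my content.'
// $tail is '<b>Some more bold</b>content'
```

Because the pieces are rejoined with `implode()`, `$head . $tail` always equals the input, whichever element turns out to hold the second tag.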

Marc B 2010-05-27 12:02:59

Would it be possible to place the splits in their own `div`?

sea_1987 2010-05-27 12:16:51
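One way to put each split piece into its own `div`, sketched under two assumptions: each opening tag should be rejoined with the text that follows it, and the pieces can be wrapped as-is without any escaping. The name `wrapPiecesInDivs` is hypothetical.

```php
<?php

// Hypothetical helper: wraps each "<tag>text" piece of $html in its own <div>
function wrapPiecesInDivs(string $html): string
{
    // No limit (-1), so every <strong>/<b> opening tag produces a split
    $parts = preg_split('/(<strong>|<b>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $pieces = [];
    $pendingTag = '';
    foreach ($parts as $part) {
        if ($part === '<strong>' || $part === '<b>') {
            // Hold the captured tag until the text that follows it arrives
            $pendingTag .= $part;
        } else {
            $pieces[] = $pendingTag . $part;
            $pendingTag = '';
        }
    }
    if ($pendingTag !== '') {
        // The string ended with a bare opening tag
        $pieces[] = $pendingTag;
    }

    $out = '';
    foreach ($pieces as $piece) {
        $out .= '<div>' . $piece . '</div>';
    }
    return $out;
}

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';
echo wrapPiecesInDivs($html1), "\n";
```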

Answer 2

A:

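It is possible with lookaround assertions in regular expressions. E.g., the positive lookbehind (?<=a)b matches the b (and only the b) in cab, but does not match bed or debt.

In your case, the positive lookahead (?=<strong>|<b>) should do the trick. It matches the empty position just before each <strong> or <b> opening tag, so a preg_split() call using it keeps each tag at the start of the piece that follows it, and PREG_SPLIT_DELIM_CAPTURE is not needed. Set PREG_SPLIT_NO_EMPTY to drop the empty first piece when the string starts with one of the tags.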
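A minimal sketch of a lookaround split with preg_split(). It uses the positive lookahead `(?=<strong>|<b>)`, so the split happens just before each opening tag and, because the match is zero-width, no characters are removed from the string:

```php
<?php

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';

// Split at the empty position just before each <strong> or <b> opening tag.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the match at offset 0.
$pieces = preg_split('/(?=<strong>|<b>)/', $html1, -1, PREG_SPLIT_NO_EMPTY);

// $pieces[0] is '<strong>My content</strong>This is my content.'
// $pieces[1] is '<b>Some more bold</b>content'
print_r($pieces);
```

Closing tags such as `</b>` are not split on, because the lookahead requires `<` to be followed directly by `strong>` or `b>`.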

jschulenklopper 2010-05-27 12:04:53

Answer 3

A:

If you really need to split the string, the regular expression approach might work, though parsing HTML with regular expressions is fragile in many ways.
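The fragility is easy to demonstrate. This sketch (the sample strings are invented for illustration) runs the exact-match pattern `/(<strong>|<b>)/` against bold tags that are valid HTML but carry an attribute or are written in uppercase:

```php
<?php

$pattern = '/(<strong>|<b>)/';

// Each sample contains a bold opening tag that is valid HTML
$samples = [
    'plain'     => 'text <b>bold</b> text',
    'attribute' => 'text <b class="x">bold</b> text',
    'uppercase' => 'text <B>bold</B> text',
];

foreach ($samples as $label => $html) {
    // preg_match() returns 1 on a match and 0 when there is none
    printf("%-10s %s\n", $label, preg_match($pattern, $html) === 1 ? 'matched' : 'missed');
}
// prints: plain matched, attribute missed, uppercase missed (one per line)
```

Each case can be patched (e.g. the `i` modifier for case), but every patch makes the pattern harder to read, which is the argument for using a parser.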

If you just want the second node that is either a strong or a b element, using a DOM is much easier. Not only is the code very clear, but all the parsing is taken care of for you.

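<?php

$testHtml = '<p><strong>My content</strong><br>
This is my content. <strong>Some more bold</strong> content, that would split into another variable.</p>
<p><b>This should not be found</b></p>';

// Collect libxml parse warnings internally instead of printing them
libxml_use_internal_errors(true);

$htmlDocument = new DOMDocument;

if ($htmlDocument->loadHTML($testHtml) === false) {
  // The markup could not be parsed at all
  die('Could not parse HTML');
}

$xPath = new DOMXPath($htmlDocument);
// A union query returns the matching nodes in document order
$boldNodes = $xPath->query('//strong | //b');

$secondNodeIndex = 1; // item() is zero-based, so 1 is the second match

$secondNode = $boldNodes->item($secondNodeIndex);

if ($secondNode !== null) {
  var_dump($secondNode->nodeValue); // string(14) "Some more bold"
} else {
  // Fewer than two strong/b elements were found
}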

erisco 2010-05-27 12:25:07
