First off, NEVER USE REGEX TO PARSE HTML.
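But to solve your problem, look at the flags for preg_split(): PREG_SPLIT_DELIM_CAPTURE keeps the parenthesized delimiters in the result, and PREG_SPLIT_NO_EMPTY drops empty strings.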
$words = preg_split(
":(</?word>):is",
$html,
-1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
Now it'll split the string, keep the tags, and give you this:
array(7) {
[0]=>
string(6) "<word>"
[1]=>
string(4) "test"
[2]=>
string(7) "</word>"
[3]=>
string(2) ", "
[4]=>
string(6) "<word>"
[5]=>
string(5) "test2"
[6]=>
string(7) "</word>"
}
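If it helps to see what each flag contributes, here is a small comparison on the same sample string. The `$html` value is assumed from the output above, and the variable names are only for illustration:

```php
<?php
// Assumed sample input, matching the output shown above
$html = '<word>test</word>, <word>test2</word>';
$pattern = ':(</?word>):is';

// No flags: the tags are thrown away and empty pieces are kept
$plain = preg_split($pattern, $html);
// array('', 'test', ', ', 'test2', '')

// PREG_SPLIT_NO_EMPTY alone: empty pieces are dropped, tags are still lost
$noEmpty = preg_split($pattern, $html, -1, PREG_SPLIT_NO_EMPTY);
// array('test', ', ', 'test2')

// Both flags: the captured tags come back as their own elements
$both = preg_split(
    $pattern,
    $html,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
// array('<word>', 'test', '</word>', ', ', '<word>', 'test2', '</word>')
```

The last call is the one used above, and it produces the seven-element array shown.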
Still no good, since each tag ends up as its own element. But what we can do is loop over the array, attaching each <word> to the element that follows it and each </word> to the element before it:
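$buffer = '';
$newWords = array();
foreach ($words as $word) {
    if (strcasecmp($word, '<word>') === 0) {
        // Opening tag: hold it until the next piece arrives
        $buffer .= $word;
    } elseif (strcasecmp($word, '</word>') === 0) {
        if ($buffer !== '' || empty($newWords)) {
            // Empty element or stray closing tag: nothing to append to
            $newWords[] = $buffer . $word;
        } else {
            // Closing tag: append it to the last element
            end($newWords);
            $newWords[key($newWords)] .= $word;
        }
        $buffer = '';
    } else {
        // Text: prefix it with any pending opening tag
        $newWords[] = $buffer . $word;
        $buffer = '';
    }
}
// Keep a trailing opening tag that was never followed by text
if ($buffer !== '') {
    $newWords[] = $buffer;
}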
Which would give you:
array(3) {
[0]=>
string(17) "<word>test</word>"
[1]=>
string(2) ", "
[2]=>
string(18) "<word>test2</word>"
}
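And to back up the warning at the top: if your input is well-formed, a parser can do the same job without any regex. This is only a sketch, and it assumes the fragment is valid XML, so it gets wrapped in a made-up <root> element (that name is my own, not something from your markup) and loaded with DOMDocument:

```php
<?php
// Assumed sample input, same as in the regex version
$html = '<word>test</word>, <word>test2</word>';

// Assumption: the fragment is well-formed XML. The <root> wrapper is
// hypothetical and only exists to give the parser a single root element.
$doc = new DOMDocument();
$doc->loadXML('<root>' . $html . '</root>');

$parts = array();
foreach ($doc->documentElement->childNodes as $node) {
    // Serialize each direct child: elements keep their tags,
    // text nodes come back as plain text
    $parts[] = $doc->saveXML($node);
}

var_dump($parts);
```

For the sample string this should give the same three elements as the loop above. Note that XML is case-sensitive, so unlike the regex with the i flag, <WORD> and <word> are different elements here, and malformed input will make loadXML() fail instead of being split on a best-effort basis.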