Hello,
I'm trying to write a simple function to close missing HTML tags using PHP preg_replace.
I thought it would be relatively straight-forward, but for some reason it hasn't been.
What I'm basically trying to do is close a missing tag in the following row:
<tr>
<th class="ProfileIndent0">
<p>Global pharmaceuticals</p>
<td>197.2</td>
<td>94</td>
</tr>
The approach I've been taking is to use a negative look behind to find opening td tags that are not preceded by opened th and properly closed th tags.
For example :
$text = preg_replace('!<th(\s\S*){0,1}?>(.*)((?<!<\/th>)[\s]*<td>)!U','<th$1>$2</th>',$text);
I've written the regular expression pattern countless different ways to no avail. The problem has been that I cannot seem to match on solely the one open td with the missing /th preceeding it - but rather it seems to match on several of the open td tags.
Here's the complete input text:
<CO_TEXT text_type_id="6">
<TEXT_DATA><![CDATA[<table class="ProfileChart"> <tr> <th class="TableHead" colspan="21">2008 Sales</th> </tr>
<tr> <th class="ProfileIndent0"></th> <th class="ProfileHead">$ mil.</th> <th class="ProfileHead">% of total</th> </tr>
<tr> <th class="ProfileIndent0"> <p>Global pharmaceuticals</p> <td>197.2</td> <td>94</td> </tr>
<tr> <th class="ProfileIndent0">Impax pharmaceuticals</th> <td>12.9</td> <td>6</td> </tr>
<tr> <th class="ProfileTotal">Total</th> <td class="ProfileDataTotal">210.1</td> <td class="ProfileDataTotal">100</td> </tr> </table><h3>Selected Generic Products</h3><ul class="prodoplist"><li>Anagrelide hydrochloride (generic Agrylin, thrombocytosis)</li><li>Bupropion hydr ochloride (generic Wellbutrin SR, depression)</li><li>Colestipol hydrochloride (generic Colestid, high cholesterol)</li><li>Dantrolene sodium (generic Dantrium, spasticity)</li><li>Metformin Hcl (generic Glucophage XR, diabetes)</li><li>Nadolol/Bendroflumethiazide (generic Corzide, hypertension)</li
><li>Oxybutynin chloride (generic Ditropan XL, urinary incontinence, with Teva)</li><li>Oxycodone hydrochloride (generic OxyContin controlled release, pain)</li><li>Pilocarpine hydrochlorine (generic Salagen, dry mouth caused by radiation therapy)</li></ul>]]></TEXT_DATA> </CO_TEXT>
Is there something going on with negative look behinds in PHP that I'm not aware of, or have I just not hit on the right matching pattern?
Any help would be much appreciated.
Thanks, John