A: 

Yeah, that looks like a mess. Have you looked into or tried something like HTML Purifier?

There are a few others, but HTML Purifier is the only one I have ever used, so that is the one I would suggest looking into (if that is what you are asking for).

Brad F Jacobs
Maybe, anything to help clean this up, lol. I'm looking into it now.
Bob Cavezza
A: 

You can use tidy to repair your HTML. But the markup looks very bad, so you should start by fixing the script that produces the HTML.

On a Windows machine you might have to add or uncomment the following line in your php.ini to be able to use it:

extension=php_tidy.dll

A very basic example from the documentation:

$html = '<p>test</I>';

$tidy = tidy_parse_string($html);
$tidy->cleanRepair();

echo $tidy;

This will output the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
  <head>
    <title></title>
  </head>
  <body>
    <p>test</p>
  </body>
</html>
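
If you only want the repaired fragment back rather than a full document, something along these lines might work. This is a minimal sketch, not taken from the documentation: `repairHtmlFragment()` is a hypothetical helper name, and the choice of the `show-body-only` and `wrap` options is an assumption about what you need. It also assumes the tidy extension is enabled.

```php
<?php
// Hypothetical helper (not part of tidy): repair a fragment and return only the body content.
function repairHtmlFragment(string $html): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not enabled.');
    }

    $config = [
        'show-body-only' => true, // skip the doctype, <html> and <head> wrapper
        'wrap'           => 0,    // do not re-wrap long lines
    ];

    // tidy_repair_string() parses and repairs in one call and returns a string.
    $repaired = tidy_repair_string($html, $config, 'utf8');
    if ($repaired === false) {
        throw new RuntimeException('tidy could not repair the input.');
    }

    return trim($repaired);
}

// Usage, guarded so the script does not fail where tidy is missing.
if (extension_loaded('tidy')) {
    echo repairHtmlFragment('<p>test</I>'), PHP_EOL;
}
```

Tidy can only guess at the intended structure, so it is still worth checking the result against the original markup.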
Kau-Boy
Can you send me some resources or pages with sample implementations of tidy with php?
Bob Cavezza
My tidy converts this `style` mess to: `<li style="list-style-position:" list-style-image:="" margin-bottom:="">`. There are limits even to tidy :(
Wrikken
I take out some of the quotation marks prior to inserting into the database - perhaps if I perform this before changing the original code it may help a bit.
Bob Cavezza
@Bob: erm, indeed..... why would you remove the quotes prior to inserting it into the database? That only leads to (this kind of) trouble....
Wrikken
A bad case of using scotch tape as a band aid because I was too lazy to go to the store
Bob Cavezza
A: 

It is not that badly formed. Just call quoted_printable_decode() on it first.

edit: well, it solves a few problems, but it is still malformed as *********. Whatever possessed them not to quote whole lists of style declarations?

edit2: Ah, Bob removed the quotes all on his own. I assume that leaving the quotes in place and then running quoted_printable_decode() would solve it.
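
To illustrate the decoding step, here is a small sketch. Both input strings are made-up examples, not the original email. It shows that a quoted attribute survives `quoted_printable_decode()`, while an unquoted `style=background` is misread because `=ba` looks like a quoted-printable escape for the byte 0xBA.

```php
<?php
// Quoted-printable encodes "=" as "=3D", so a properly quoted attribute decodes cleanly.
$quoted = '<div style=3D"background: #99ca3c;">';
$decodedQuoted = quoted_printable_decode($quoted);
echo $decodedQuoted, PHP_EOL; // <div style="background: #99ca3c;">

// Hypothetical input with the quotes and "3D" already stripped:
// "=ba" is two hex digits after "=", so it is decoded as the single byte 0xBA.
$stripped = '<div style=background: #99ca3c;>';
$decodedStripped = quoted_printable_decode($stripped);

// '<div style' is 10 characters long, so offset 10 holds the decoded byte.
echo bin2hex($decodedStripped[10]), PHP_EOL; // ba
```

So the decode should run on the untouched source, before any quotes are removed.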

Wrikken
This returns some interesting results - is there another function I should also use on it to supplement? It does most of the job but here's one example <div styleşckground: #99ca3c;>
Bob Cavezza
@Bob: it will return utf8 afaik, you may need to use `utf8_decode()` or the `iconv()` library if that's not the character set you're working in.
Wrikken
Perfect - I'm going to store these in a different way (had issues storing the files earlier) and then use this code - hopefully this will solve my issues - thanks!
Bob Cavezza
@Wrikken - I was able to fix this based on the errors before, but it's still not passing inspection through the php document - any ideas? - note - new email reformed above (with your help, thanks again!)
Bob Cavezza