I used xdiff to highlight changes in the text of an HTML page. The basic workflow was:
- escape all html entities
- split html tags onto their own lines (append \n after the closing >)
- split the resulting text on whitespace (eliminating duplicate whitespace)
- rejoin the results of the previous split with \n as the separator, so now all tags and words are on separate lines
- do the diff with xdiff_string_diff()
- patch up the diff output to highlight the additions/deletions with the appropriate tags
Not particularly efficient, and very top-heavy on extra wrapping tags if you've got a long sequence of additions/deletions, but it did the job.
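
For anyone who wants to try the idea without the xdiff extension, here is a rough sketch of the same workflow in Python. It is not my original code: difflib.SequenceMatcher stands in for xdiff_string_diff(), the function names tokenize and highlight_diff are made up for illustration, and the choice of <ins>/<del> as the "appropriate tags" is an assumption. It works on the token list directly instead of rejoining with \n, since difflib compares sequences rather than newline-separated strings.

```python
import difflib
import html


def tokenize(page: str) -> list[str]:
    """Turn an HTML string into a list of tokens (tag fragments and words)."""
    # Escape &, < and > so the markup is shown as text rather than rendered.
    escaped = html.escape(page, quote=False)
    # Append a newline after every closing '>' (now '&gt;').
    escaped = escaped.replace("&gt;", "&gt;\n")
    # split() with no argument splits on any whitespace run,
    # which also eliminates duplicate whitespace.
    return escaped.split()


def highlight_diff(old: str, new: str) -> str:
    """Return the token stream, one token per line, with changes wrapped.

    Assumption: deletions are wrapped in <del>, additions in <ins>.
    """
    old_tokens = tokenize(old)
    new_tokens = tokenize(new)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_tokens[i1:i2])
        else:
            # 'delete', 'insert' and 'replace' all reduce to:
            # wrap removed tokens, then wrap added tokens.
            out.extend(f"<del>{t}</del>" for t in old_tokens[i1:i2])
            out.extend(f"<ins>{t}</ins>" for t in new_tokens[j1:j2])
    return "\n".join(out)


if __name__ == "__main__":
    print(highlight_diff("<p>Hello world</p>", "<p>Hello brave world</p>"))
```

Each changed token gets its own wrapper here, which reproduces the "top-heavy" output mentioned above; merging adjacent tokens of the same kind into one wrapper would cut that down.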