I am writing support software and I figured for highlighting stuff it would be great to have HTML support.

Looking at Outlook's "HTML", I want to curl up into the fetal position and cry!

Is there a PHP class that cleans up HTML e-mails so that only basic HTML remains? I don't want to display the e-mails in a frame, because I want to work with the data and analyse it. I also don't want to support stupid things like changing the font: since it's a webapp, I want my webapp to decide what the font is, and not some hippie who sends the support team e-mails in Comic Sans and yellow. I want to support bold, italic, underlined, stretched-out text and lists (http://dl.getdropbox.com/u/5910/Jing/2009-02-23_2100.png).

I also don't quite know the difference between rich text and HTML. I always thought rich text only allowed the features I want, but I seem to be able to do everything in rich text that I can do in HTML.

I should add that I am using the Zend Framework because of the fabulous Zend_Mail.

A: 

I'm pretty sure you'll have to write your own class. There is no class like that in the PHP documentation that I've seen.

Earlz
I was hoping for a class someone has already written, since otherwise I would have to look at all the shitty things each of the major clients does: Outlook, Apple Mail, Windows Live Mail, gmx, gmail...
Thomaschaaf
+1  A: 

Pulling out the HTML from an Outlook mail may seem scary at first, but it's only HTML tags - just a whole lot of them!

So if you just locate a "<" and then find the next ">", you have a tag. If it is not one you want to keep (such as "</strong>"), just throw it away and repeat. Simple as that.

(I have done exactly this in a spelling and grammar checker which not only pulls out plain text from Outlook and checks it - it can then push all the user's changes back into the HTML without destroying any tags. The latter was not easy, though! ;-)
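
The tag-by-tag scan described above could look roughly like the following. This is only a minimal sketch: the function name filter_tags() and the list of tags to keep are made up for illustration, and the code assumes that no attribute value contains a ">" character. Kept tags are re-emitted without their attributes, so fonts and colours disappear along the way.

```php
<?php
// Hypothetical helper: walk the markup from "<" to ">" and keep only
// whitelisted tags, re-emitting them without any attributes.
function filter_tags(string $html, array $keep): string
{
    $out = '';
    $pos = 0;
    $len = strlen($html);
    while ($pos < $len) {
        $lt = strpos($html, '<', $pos);
        if ($lt === false) {
            // No more tags: copy the remaining text and stop.
            $out .= substr($html, $pos);
            break;
        }
        // Copy the text that precedes the tag.
        $out .= substr($html, $pos, $lt - $pos);
        $gt = strpos($html, '>', $lt);
        if ($gt === false) {
            // Unterminated tag: drop the rest of the input.
            break;
        }
        // Tag body without the angle brackets, e.g. "/b" or "p class=MsoNormal".
        $tag = substr($html, $lt + 1, $gt - $lt - 1);
        $isTag = preg_match('~^(/?)\s*([a-zA-Z][a-zA-Z0-9]*)(?=[\s/]|$)~', $tag, $m);
        if ($isTag && in_array(strtolower($m[2]), $keep, true)) {
            $out .= '<' . $m[1] . strtolower($m[2]) . '>';
        }
        // Anything else (unwanted tags, comments, namespaced tags) is thrown away.
        $pos = $gt + 1;
    }
    return $out;
}

$mail = '<p class=MsoNormal><b style="color:red">Hi</b> <font face="Comic Sans MS">there</font></p>';
echo filter_tags($mail, ['b', 'i', 'u', 'ul', 'ol', 'li']), "\n";
```

A scan like this only removes the tags themselves. The contents of style blocks and of Outlook's conditional comments would still come through as text and need separate handling.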

danbystrom
A: 

Or you could use the plain-text variant attached to the e-mail. If there is no plain-text variant, you could use a stripped version of the HTML. I think these steps would give you a nice result:

  1. Remove newlines
  2. Turn </p> and <br/> into newlines
  3. Strip all HTML tags
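
As a rough sketch, the three steps could be written like this. The function name html_to_text() is made up for illustration. Two details are assumptions beyond the list: source newlines are replaced by a space instead of being deleted, so that words on adjacent lines do not run together, and stray spaces around the new line breaks are tidied up at the end. HTML entities such as &amp;nbsp; are left untouched here.

```php
<?php
// Hypothetical helper following the three steps listed above.
function html_to_text(string $html): string
{
    // 1. Remove the newlines of the HTML source (replaced by a space here).
    $text = str_replace(["\r\n", "\r", "\n"], ' ', $html);
    // 2. Turn </p> and <br>, <br/> or <br /> into real newlines.
    $text = preg_replace('~</p\s*>|<br\s*/?>~i', "\n", $text);
    // 3. Strip all remaining HTML tags.
    $text = strip_tags($text);
    // Tidy-up (not part of the list): drop spaces around each line break.
    $text = preg_replace('~[ \t]*\n[ \t]*~', "\n", $text);
    return trim($text);
}

$mail = "<p>Hello\n<b>world</b></p>\n<p>Second<br/>line</p>";
echo html_to_text($mail), "\n";
```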
bouke
That would not give me the ability to use bold, italic, underlined, stretched-out text and lists.
Thomaschaaf
+1  A: 

You can pipe it through htmltidy and then filter it further with something like HtmlPurifier, but of course you may strip out something that is essential to understanding the contents. That's the problem with a visual format like HTML.

troelskn
+1  A: 

You can use PHP's strip_tags() function and its optional "allowable_tags" parameter. This will allow you to strip out all the tags that are not <em>, <b>, <strong>, <u>, etc.
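
A minimal sketch of that call, assuming a made-up sample message and a tag list chosen to match the formatting asked for in the question:

```php
<?php
// Sample markup, invented for illustration.
$html = '<p class="MsoNormal"><b>Urgent:</b> <font color="yellow">printer <i>still</i> broken</font></p>';

// Tags to keep; everything else is removed, but its text content stays.
$allowed = '<b><strong><i><em><u><ul><ol><li>';

$clean = strip_tags($html, $allowed);
echo $clean, "\n";

// Attributes on allowed tags are NOT removed by strip_tags().
$risky = strip_tags('<b onclick="alert(1)">x</b>', '<b>');
echo $risky, "\n";
```

The second call shows the main limitation: strip_tags() only decides which tags survive and leaves the attributes of allowed tags untouched, so inline styles and event handlers on a kept tag pass straight through.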

About RTF vs. HTML: my understanding is that when Outlook and Exchange communicate with non-RTF-compliant systems, they convert RTF to HTML. I'm not sure this is always true, or how consistent that conversion is, but it might explain why messages sent as RTF appear to be HTML.

acrosman
Not so much for my lists :( http://dl.getdropbox.com/u/5910/Jing/2009-02-23_2100.png FU outlook! (sorry)
Thomaschaaf
strip_tags is inadequate if you don't trust the user (which may or may not be the case here). Use HtmlPurifier instead.
troelskn
HTMLPurifier looks delicious! Just have to get it to work with the Zend Framework :)
Thomaschaaf
@Thomaschaaf, this is seriously ugly. If you're only worried about ul's, you might be able to replace them all first, since the code is at least consistent. ol's are probably also ugly but consistent, but you're probably back to writing your own library.
acrosman