Hi All,

In short, we already have a system in place that generates the HTML content for emails. It's not perfect, but it works.

From this, we need to derive a plaintext alternative for the email. My first thought was to write a RegEx to strip the <*> tags from the message, but then I realised this would be no good because we need to keep some of the formatting information (paragraphs, line breaks, images etc).

NOTE: I am OK with actually sending the mail, setting up alternative views etc. This is only about getting plaintext from HTML.

So, I am pondering some ideas. I will post one as an answer to see what you guys think, but thought I would open it up to the floor. :)

If you need any more clarification then please shout.

Many thanks,

Rob

A: 

My Idea

Create a page based on the HTML content and traverse the control tree. You can then pick the text from the controls and handle different controls as required (e.g. use ALT text for images, "_" for HR etc).
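
To make the idea concrete, here is a minimal sketch of the same "walk the tree, handle each element as required" approach. It is written in Python with the standard-library HTMLParser rather than the ASP.NET control tree, so treat it as an illustration of the technique and not as the actual implementation. The class name, the set of block tags and the 40-character rule for HR are all assumptions.

```python
import re
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Keeps only the formatting that plain text can express."""

    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "tr"}  # assumed subset
    SKIP_TAGS = {"script", "style", "title"}  # content we never want in the text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "hr":
            # "_" for HR, as suggested above; the width is arbitrary
            self.parts.append("\n" + "_" * 40 + "\n")
        elif tag == "img":
            # use the ALT text for images, if there is any
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append("[" + alt + "]")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in self.BLOCK_TAGS:
            # a blank line after each paragraph-like element
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self._skip:
            # collapse runs of whitespace the way a browser would
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # at most one blank line
    return text.strip()
```

Calling html_to_text('<p>Hello <b>world</b>!</p>') gives "Hello world!". Tables, nested lists and links would each need their own handling.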

Rob Cooper
A: 

You could ensure the HTML mail is in XHTML format so you can parse it easily using the standard XML tools, then create your own DOM serialiser that outputs plain text. It'd still be a lot of work to cover general XHTML, but for a limited subset you plan to use in e-mail it could work.
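
As a rough sketch of what such a serialiser could look like, here is a Python version using the standard xml.etree.ElementTree parser. It assumes the mail is well-formed XHTML and only uses a handful of elements; the function names and the tag subset are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li"}  # assumed limited subset


def _squash(text):
    # ElementTree gives None for missing text; collapse whitespace otherwise
    return re.sub(r"\s+", " ", text) if text else ""


def _walk(elem, out):
    tag = elem.tag.rsplit("}", 1)[-1]  # drop the XHTML namespace prefix
    if tag == "br":
        out.append("\n")
    elif tag == "hr":
        out.append("\n" + "_" * 40 + "\n")
    elif tag == "img":
        out.append(elem.get("alt", ""))
    else:
        out.append(_squash(elem.text))
        for child in elem:
            _walk(child, out)
            # text that follows the child, still inside this element
            out.append(_squash(child.tail))
        if tag in BLOCK_TAGS:
            out.append("\n\n")


def xhtml_to_text(xhtml):
    out = []
    _walk(ET.fromstring(xhtml), out)
    text = "".join(out)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

One catch with this route: a plain XML parser rejects HTML-only entities such as &nbsp; unless they are declared, so the content has to stick to numeric entities or the five XML ones.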

Alternatively, if you don't mind shelling out to another program, you could just use the -dump switch to the lynx web browser.
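
A hedged sketch of the shelling-out route, in Python. It assumes lynx is installed and on the PATH; the wrapper function is hypothetical. The extra flags (-stdin to read the page from standard input, -force_html to treat it as HTML, -nolist to leave out the list of links) are my additions on top of -dump.

```python
import shutil
import subprocess

# -dump is the switch mentioned above; the other flags are assumptions
LYNX_FLAGS = ["-dump", "-stdin", "-force_html", "-nolist"]


def html_to_text_with_lynx(html, executable="lynx"):
    """Pipe HTML into lynx and return its plain-text rendering."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(executable + " was not found on the PATH")
    result = subprocess.run(
        [executable] + LYNX_FLAGS,
        input=html,
        capture_output=True,
        text=True,
        check=True,  # raise if lynx exits with an error
    )
    return result.stdout
```

The exact layout of the output (indentation, line wrapping) is decided by lynx, so check it against a few real mails before relying on it.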

bobince
A: 

My Solution

OK, so here it is! I thought up a solution to my problem and it works like a charm!

Now, here are some of the goals I wanted to set out:

  • All the content for the emails should remain in the ASPX pages (as the HTML content currently does).
  • I didn't want the client code to do anything more than call SendMail("PageX.aspx").
  • I didn't want to write too much code.
  • I wanted to keep the code as semantically correct as possible (no REALLY crazy-ass hacks!).

The Process

So, this is what I ended up doing:

  • Go to the master page for the email messages. Create an ASP.NET MultiView Control. This control would have two views - HTML and PlainText.
  • Within each view, I added content placeholders for the actual content.
  • I then grabbed all the existing ASPX code (such as header and footer) and stuck it in the HTML View. All of it, DocType and everything. This does cause VS to whinge a little bit. Ignore it.
  • I then of course added new content to the PlainText view to best replicate the HTML view in a PlainText environment.
  • I then added some code to the master page's Page_Load that checks for the QueryString parameter "type", which can be either "html" or "text". It falls back to "text" if none is present. Depending on the value, it switches the view.
  • I then go to the content pages and add new placeholders for the PlainText equivalents and add text as required.
  • To make my life easier, I then overloaded my SendMail method to get the response for the required page twice, passing "type=html" and "type=text", and to create AlternateViews as appropriate.
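
The two code-level steps above (pick the view from the "type" parameter, then fetch both renderings and attach them as alternatives) can be sketched roughly as follows. This is a Python analogue and not the ASP.NET code: render_page is a hypothetical callable standing in for requesting the page with "type=html" or "type=text", and EmailMessage plays the role of the mail message with its AlternateViews. Treating an unrecognised value the same as a missing one is my assumption.

```python
from email.message import EmailMessage

VALID_VIEWS = ("html", "text")


def select_view(query):
    """Pick the view from the query-string parameters, defaulting to text."""
    requested = query.get("type", "").lower()
    # assumption: an unrecognised value is treated like a missing one
    return requested if requested in VALID_VIEWS else "text"


def build_message(render_page, page, sender, recipient, subject):
    """Build one mail holding both renderings of the same page.

    render_page(page, view) is a hypothetical callable that returns the
    page body for the given view.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # the plain-text body goes first, the HTML version is the alternative
    msg.set_content(render_page(page, "text"))
    msg.add_alternative(render_page(page, "html"), subtype="html")
    return msg
```

The client code still only names the page; everything about the two views stays behind that one call.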

In Summary

So, in short:

  • The Views separate the actual "views" of the content (HTML and Text).
  • A master page auto switches the view based on a QueryString.
  • Content pages are responsible for how their views look.

Job done!

If any of this is unclear then please shout. I would like to write a blog post on this in more detail at some point.

Rob Cooper