tags:

views:

351

answers:

6
+13  Q: 

Inlining CSS in C#

I need to inline css from a stylesheet in c#.

Like how this works.

http://www.mailchimp.com/labs/inlinecss.php

The css is simple, just classes, no fancy selectors.

I was contemplating using a regex (?<rule>(?<selector>[^{}]+){(?<style>[^{}]+)})+ to strip the rules from the css, and then attempting to do simple string replaces where the classes are called, but some of the html elements already have a style tag, so I'd have to account for that as well.

Is there a simpler approach? Or something already written in c#?

UPDATE - Sep 16, 2010

I've been able to come up with a simple CSS inliner provided your html is also valid xml. It uses a regex to get all the styles in your <style /> element. Then converts the css selectors to xpath expressions, and adds the style inline to the matching elements, before any pre-existing inline style.

Note, that the CssToXpath is not fully implemented, there are some things it just can't do... yet.

CssInliner.cs

using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

namespace CssInliner
{
    public class CssInliner
    {
        private static Regex _matchStyles = new Regex("\\s*(?<rule>(?<selector>[^{}]+){(?<style>[^{}]+)})",
                                                RegexOptions.IgnoreCase
                                                | RegexOptions.CultureInvariant
                                                | RegexOptions.IgnorePatternWhitespace
                                                | RegexOptions.Compiled
                                            );

        public List<Match> Styles { get; private set; }
        public string InlinedXhtml { get; private set; }

        private XElement XhtmlDocument { get; set; }

        public CssInliner(string xhtml)
        {
            XhtmlDocument = ParseXhtml(xhtml);
            Styles = GetStyleMatches();

            foreach (var style in Styles)
            {
                if (!style.Success)
                    return;

                var cssSelector = style.Groups["selector"].Value.Trim();
                var xpathSelector = CssToXpath.Transform(cssSelector);
                var cssStyle = style.Groups["style"].Value.Trim();

                foreach (var element in XhtmlDocument.XPathSelectElements(xpathSelector))
                {
                    var inlineStyle = element.Attribute("style");

                    var newInlineStyle = cssStyle + ";";
                    if (inlineStyle != null && !string.IsNullOrEmpty(inlineStyle.Value))
                    {
                        newInlineStyle += inlineStyle.Value;
                    }

                    element.SetAttributeValue("style", newInlineStyle.Trim().NormalizeCharacter(';').NormalizeSpace());
                }
            }

            XhtmlDocument.Descendants("style").Remove();
            InlinedXhtml = XhtmlDocument.ToString();
        }

        private List<Match> GetStyleMatches()
        {
            var styles = new List<Match>();

            var styleElements = XhtmlDocument.Descendants("style");
            foreach (var styleElement in styleElements)
            {
                var matches = _matchStyles.Matches(styleElement.Value);

                foreach (Match match in matches)
                {
                    styles.Add(match);
                }
            }

            return styles;
        }

        private static XElement ParseXhtml(string xhtml)
        {
            return XElement.Parse(xhtml);
        }
    }
}

CssToXpath.cs

using System.Text.RegularExpressions;

namespace CssInliner
{
    public static class CssToXpath
    {
        public static string Transform(string css)
        {
            #region Translation Rules
            // References:  http://ejohn.org/blog/xpath-css-selectors/
            //              http://code.google.com/p/css2xpath/source/browse/trunk/src/css2xpath.js
            var regexReplaces = new[] {
                                          // add @ for attribs
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([^\]~\$\*\^\|\!]+)(=[^\]]+)?\]", RegexOptions.Multiline),
                                              Replace = @"[@$1$2]"
                                          },
                                          //  multiple queries
                                          new RegexReplace {
                                              Regex = new Regex(@"\s*,\s*", RegexOptions.Multiline),
                                              Replace = @"|"
                                          },
                                          // , + ~ >
                                          new RegexReplace {
                                              Regex = new Regex(@"\s*(\+|~|>)\s*", RegexOptions.Multiline),
                                              Replace = @"$1"
                                          },
                                          //* ~ + >
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*])~([a-zA-Z0-9_\-\*])", RegexOptions.Multiline),
                                              Replace = @"$1/following-sibling::$2"
                                          },
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*])\+([a-zA-Z0-9_\-\*])", RegexOptions.Multiline),
                                              Replace = @"$1/following-sibling::*[1]/self::$2"
                                          },
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*])>([a-zA-Z0-9_\-\*])", RegexOptions.Multiline),
                                              Replace = @"$1/$2"
                                          },
                                          // all unescaped stuff escaped
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([^=]+)=([^'|""][^\]]*)\]", RegexOptions.Multiline),
                                              Replace = @"[$1='$2']"
                                          },
                                          // all descendant or self to //
                                          new RegexReplace {
                                              Regex = new Regex(@"(^|[^a-zA-Z0-9_\-\*])(#|\.)([a-zA-Z0-9_\-]+)", RegexOptions.Multiline),
                                              Replace = @"$1*$2$3"
                                          },
                                          new RegexReplace {
                                              Regex = new Regex(@"([\>\+\|\~\,\s])([a-zA-Z\*]+)", RegexOptions.Multiline),
                                              Replace = @"$1//$2"
                                          },
                                          new RegexReplace {
                                              Regex = new Regex(@"\s+\/\/", RegexOptions.Multiline),
                                              Replace = @"//"
                                          },
                                          // :first-child
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*]+):first-child", RegexOptions.Multiline),
                                              Replace = @"*[1]/self::$1"
                                          },
                                          // :last-child
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*]+):last-child", RegexOptions.Multiline),
                                              Replace = @"$1[not(following-sibling::*)]"
                                          },
                                          // :only-child
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*]+):only-child", RegexOptions.Multiline),
                                              Replace = @"*[last()=1]/self::$1"
                                          },
                                          // :empty
                                          new RegexReplace {
                                              Regex = new Regex(@"([a-zA-Z0-9_\-\*]+):empty", RegexOptions.Multiline),
                                              Replace = @"$1[not(*) and not(normalize-space())]"
                                          },
                                          // |= attrib
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([a-zA-Z0-9_\-]+)\|=([^\]]+)\]", RegexOptions.Multiline),
                                              Replace = @"[@$1=$2 or starts-with(@$1,concat($2,'-'))]"
                                          },
                                          // *= attrib
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([a-zA-Z0-9_\-]+)\*=([^\]]+)\]", RegexOptions.Multiline),
                                              Replace = @"[contains(@$1,$2)]"
                                          },
                                          // ~= attrib
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([a-zA-Z0-9_\-]+)~=([^\]]+)\]", RegexOptions.Multiline),
                                              Replace = @"[contains(concat(' ',normalize-space(@$1),' '),concat(' ',$2,' '))]"
                                          },
                                          // ^= attrib
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([a-zA-Z0-9_\-]+)\^=([^\]]+)\]", RegexOptions.Multiline),
                                              Replace = @"[starts-with(@$1,$2)]"
                                          },
                                          // != attrib
                                          new RegexReplace {
                                              Regex = new Regex(@"\[([a-zA-Z0-9_\-]+)\!=([^\]]+)\]", RegexOptions.Multiline),
                                              Replace = @"[not(@$1) or @$1!=$2]"
                                          },
                                          // ids
                                          new RegexReplace {
                                              Regex = new Regex(@"#([a-zA-Z0-9_\-]+)", RegexOptions.Multiline),
                                              Replace = @"[@id='$1']"
                                          },
                                          // classes
                                          new RegexReplace {
                                              Regex = new Regex(@"\.([a-zA-Z0-9_\-]+)", RegexOptions.Multiline),
                                              Replace = @"[contains(concat(' ',normalize-space(@class),' '),' $1 ')]"
                                          },
                                          // normalize multiple filters
                                          new RegexReplace {
                                              Regex = new Regex(@"\]\[([^\]]+)", RegexOptions.Multiline),
                                              Replace = @" and ($1)"
                                          },

                                      };
            #endregion

            foreach (var regexReplace in regexReplaces)
            {
                css = regexReplace.Regex.Replace(css, regexReplace.Replace);
            }

            return "//" + css;
        }
    }

    struct RegexReplace
    {
        public Regex Regex;
        public string Replace;
    }
}

And some tests

    [TestMethod]
    public void TestCssToXpathRules()
    {
        var translations = new Dictionary<string, string>
                               {
                                   { "*", "//*" }, 
                                   { "p", "//p" }, 
                                   { "p > *", "//p/*" }, 
                                   { "#foo", "//*[@id='foo']" }, 
                                   { "*[title]", "//*[@title]" }, 
                                   { ".bar", "//*[contains(concat(' ',normalize-space(@class),' '),' bar ')]" }, 
                                   { "div#test .note span:first-child", "//div[@id='test']//*[contains(concat(' ',normalize-space(@class),' '),' note ')]//*[1]/self::span" }
                               };

        foreach (var translation in translations)
        {
            var expected = translation.Value;
            var result = CssInliner.CssToXpath.Transform(translation.Key);

            Assert.AreEqual(expected, result);
        }
    }

    [TestMethod]
    public void HtmlWithMultiLineClassStyleReturnsInline()
    {
        #region var html = ...
        var html = XElement.Parse(@"<html>
                                        <head>
                                            <title>Hello, World Page!</title>
                                            <style>
                                                .redClass { 
                                                    background: red; 
                                                    color: purple; 
                                                }
                                            </style>
                                        </head>
                                        <body>
                                            <div class=""redClass"">Hello, World!</div>
                                        </body>
                                    </html>").ToString();
        #endregion

        #region const string expected ...
        var expected = XElement.Parse(@"<html>
                                            <head>
                                                <title>Hello, World Page!</title>
                                            </head>
                                            <body>
                                                <div class=""redClass"" style=""background: red; color: purple;"">Hello, World!</div>
                                            </body>
                                        </html>").ToString();
        #endregion

        var result = new CssInliner.CssInliner(html);

        Assert.AreEqual(expected, result.InlinedXhtml);
    }

There are more tests, but, they import html files for the input and expected output and I'm not posting all that!

But I should post the Normalize extension methods!

private static readonly Regex NormalizeSpaceRegex = new Regex(@"\s{2,}", RegexOptions.None);
public static string NormalizeSpace(this string data)
{
    return NormalizeSpaceRegex.Replace(data, @" ");
}

public static string NormalizeCharacter(this string data, char character)
{
    var normalizeCharacterRegex = new Regex(character + "{2,}", RegexOptions.None);
    return normalizeCharacterRegex.Replace(data, character.ToString());
}
+3  A: 

Excellent question.

I have no idea if there is a .NET solution, but I found a Ruby program called Premailer that claims to inline CSS. If you want to use it you have a couple options:

  1. Rewrite Premailer in C# (or any .NET language you are familiar with)
  2. Use IronRuby to run Ruby in .NET
Greg
@Greg, found that too, I'd rewrite it if anything.
Chad
+1  A: 

Here is an idea, why dont you make a post call to http://www.mailchimp.com/labs/inlinecss.php using c#. from analysis using firebug it looks like the post call needs 2 params html and strip which takes values (on/off) the result is in a param called text.

here is a sample on how to make a post call using c#

Vinay B R
@Vinay B R, this app is to critical to rely on another site being online, fast, and not changing.
Chad
+1  A: 

Chad, do you necessarily have to add the CSS inline? Or could you maybe be better off by adding a <style> block to your <head>? This will in essence replace the need for a reference to a CSS file as well plus maintain the rule that the actual inline rules override the ones set in the header/referenced css file.

(sorry, forgot to add the quotes for code)

Waltes
@Waltes, Yes, the css is already in a `<style/>` element, but the BB email client doesn't appear to support it. It does however support inline styles.
Chad
In this particular scenario (a flat css structure without selectors (Except for maybe single . and #)) I would then advice to make a nested dictionary. The first with the element/class/id as key and a dictionary of the css-element as key and the css-value as value. After that I would make a run through your html... Wait... I'll add a new answer for it, to explain my idea better.
Waltes
+1  A: 

I would recommend a dictonary like this:

private Dictionary<string, Dictionary<string, string>> cssDictionary = new Dictionary<string, Dictionary<string, string>();

I would parse the css to fill this cssDictionary.

(Adding 'style-type', 'style-property', 'value'. In example:

Dictionary<string,string> bodyStyleDictionary = new Dictionary<string, string();
    bodyStyleDictionary.Add("background", "#000000");
    cssDictionary.Add("body", bodyStyleDictionary);

After that I would preferably convert the HTML to an XmlDocument.

You can recursively run through the documents nodes by it's children and also look up it's parents (This would even enable you being able to use selectors).

On each element you check for the element type, the id and the class. You then browse through the cssDictionary to add any styles for this element to the style attribute (Granted, you might want to place them in order of occurrence if they have overlapping properties (And add the existing inline styles the last).

When you're done, you emit the xmlDocument as a string and remove the first line (<?xml version="1.0"?>) This should leave you with a valid html document with inline css.

Sure, it might half look like a hack, but in the end I think it's a pretty solid solution that ensures stability and quite does what you seem to be looking for.

Waltes
@Waltes, I actually have something similar written that's working pretty good. It's far from flawless, but it's working pretty good. I'll try and package it up and post it here.
Chad
+4  A: 

Since you're already 90% of the way there with your current implementation, why don't you use your existing framework but replace the XML parsing with an HTML parser instead? One of the more popular ones out there is the HTML Agility Pack. It supports XPath queries and even has a LINQ interface similar to the standard .NET interface provided for XML so it should be a fairly straightforward replacement.

Richard Cook
@Richard Cook, if I run into any issues with the current implementation I'll look into that. Right now, I know my "html" is valid xml, so xml parsing is fine, and it's working in production.
Chad
Also, the HTML Agility Pack's parser doesn't deal well even with valid HTML such as `<ul><li>x<li>y<li>z</ul>` - it's just not robust enough to deal with real-world HTML.
Eamon Nerbonne
Check out this thread (http://stackoverflow.com/questions/100358/looking-for-c-html-parser) for discussion of other HTML parser as well as the HTML Agility Pack.
Richard Cook
+3  A: 

I'd recommend using an actual CSS parser rather than Regexes. You don't need to parse the full language since you're interested mostly in reproduction, but in any case such parsers are available (and for .NET too). For example, take a look at antlr's list of grammars, specifically a CSS 2.1 grammar or a CSS3 grammar. You can possibly strip large parts of both grammars if you don't mind sub-optimal results wherein inline styles may include duplicate definitions, but to do this well, you'll need some idea of internal CSS logic to be able to resolve shorthand attributes.

In the long run, however, this will certainly be a lot less work than a neverending series of adhoc regex fixes.

Eamon Nerbonne
@Eamon Nerbonne, I'll likely do this if it becomes an issue.
Chad