I'm trying to implement parsing of CSS in JavaScript so that:

a {
  color: red;
}

is parsed into the object:

{
  'a': {
    'color': 'red'
  }
}

First off, is there a JavaScript / jQuery library I can use?

My implementation is pretty basic, so I'm sure it is not fool-proof by any means. For example, it works fine for basic CSS, but for a property of the type:

background: url(data:image/png;base64, ....);

It fails because I am using split(';') to separate property:value pairs. Here, ; occurs in the value, so it splits at that point too.

Is there an alternate way to do this?

Here is the code:

parseCSS: function(css) {
    var rules = {};
    css = this.removeComments(css);
    var blocks = css.split('}');
    blocks.pop();
    var len = blocks.length;
    for (var i = 0; i < len; i++)
    {
        var pair = blocks[i].split('{');
        rules[$.trim(pair[0])] = this.parseCSSBlock(pair[1]);
    }
    return rules;
},

parseCSSBlock: function(css) { 
    var rule = {};
    var declarations = css.split(';');
    declarations.pop();
    var len = declarations.length;
    for (var i = 0; i < len; i++)
    {
        var loc = declarations[i].indexOf(':');
        var property = $.trim(declarations[i].substring(0, loc));
        var value = $.trim(declarations[i].substring(loc + 1));

        if (property != "" && value != "")
            rule[property] = value;
    }
    return rule;
},

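removeComments: function(css) {
    // Non-greedy match, so each comment is removed on its own; a greedy
    // pattern would delete everything from the first /* to the last */.
    return css.replace(/\/\*[\s\S]*?\*\//g, "");
}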

Thanks!

+1  A: 

To write the most foolproof parser, follow the exact rules for tokenization and the CSS grammar as defined in the spec. Note that you don't have to implement the spec to the letter. You can start with a small subset covering the CSS you are most likely to encounter, and then expand from there. Even better, skip the entire process and go with @Matthew's solution, unless this is a learning exercise.
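
As a small first step in that direction, here is a minimal sketch of a scanner that splits a declaration block on ';' only when the ';' sits outside quotes and parentheses, which is what breaks on the data: URI in the question. The name splitDeclarations is made up for illustration, and the code assumes comments have already been stripped. It is nowhere near a full tokenizer.

```javascript
// Hypothetical helper (not from any library): split a declaration block
// on ';' characters that are outside quotes and outside ( ... ).
function splitDeclarations(block) {
    var parts = [];
    var current = '';
    var depth = 0;     // nesting level of parentheses
    var quote = null;  // the quote character we are inside, or null

    for (var i = 0; i < block.length; i++) {
        var ch = block.charAt(i);

        if (quote) {
            if (ch === '\\' && i + 1 < block.length) {
                // keep the backslash and the escaped character together
                current += ch + block.charAt(++i);
                continue;
            }
            if (ch === quote) quote = null;
        } else if (ch === '"' || ch === "'") {
            quote = ch;
        } else if (ch === '(') {
            depth++;
        } else if (ch === ')') {
            if (depth > 0) depth--;
        } else if (ch === ';' && depth === 0) {
            // a real declaration separator
            parts.push(current);
            current = '';
            continue;
        }
        current += ch;
    }

    // keep a last declaration that has no trailing ';'
    if (current.replace(/\s+/g, '') !== '') parts.push(current);
    return parts;
}

var sample = 'color: red; background: url(data:image/png;base64,AAAA);';
console.log(splitDeclarations(sample).length); // 2
```

In the question's parseCSSBlock this could stand in for css.split(';'). Because the helper never returns a trailing empty piece, the declarations.pop() call would then have to go.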

There are various lexical scanners and parser generators available for JavaScript, and the entire grammar is available on the W3C's website. Why redo that work when you can feed the grammar to a parser generator and have it produce the parser in JavaScript?

  1. Jison
  2. Peg.js
  3. Cruiser.Parse
  4. McLexer
  5. JS/CC

The production rules for CSS, as given in the grammar appendix of the CSS 2.1 spec, are reproduced below.

stylesheet
  : [ CHARSET_SYM STRING ';' ]?
    [S|CDO|CDC]* [ import [ CDO S* | CDC S* ]* ]*
    [ [ ruleset | media | page ] [ CDO S* | CDC S* ]* ]*
  ;
import
  : IMPORT_SYM S*
    [STRING|URI] S* media_list? ';' S*
  ;
media
  : MEDIA_SYM S* media_list LBRACE S* ruleset* '}' S*
  ;
media_list
  : medium [ COMMA S* medium]*
  ;
medium
  : IDENT S*
  ;
page
  : PAGE_SYM S* pseudo_page?
    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
  ;
pseudo_page
  : ':' IDENT S*
  ;
operator
  : '/' S* | ',' S*
  ;
combinator
  : '+' S*
  | '>' S*
  ;
unary_operator
  : '-' | '+'
  ;
property
  : IDENT S*
  ;
ruleset
  : selector [ ',' S* selector ]*
    '{' S* declaration? [ ';' S* declaration? ]* '}' S*
  ;
selector
  : simple_selector [ combinator selector | S+ [ combinator? selector ]? ]?
  ;
simple_selector
  : element_name [ HASH | class | attrib | pseudo ]*
  | [ HASH | class | attrib | pseudo ]+
  ;
class
  : '.' IDENT
  ;
element_name
  : IDENT | '*'
  ;
attrib
  : '[' S* IDENT S* [ [ '=' | INCLUDES | DASHMATCH ] S*
    [ IDENT | STRING ] S* ]? ']'
  ;
pseudo
  : ':' [ IDENT | FUNCTION S* [IDENT S*]? ')' ]
  ;
declaration
  : property ':' S* expr prio?
  ;
prio
  : IMPORTANT_SYM S*
  ;
expr
  : term [ operator? term ]*
  ;
term
  : unary_operator?
    [ NUMBER S* | PERCENTAGE S* | LENGTH S* | EMS S* | EXS S* | ANGLE S* |
      TIME S* | FREQ S* ]
  | STRING S* | IDENT S* | URI S* | hexcolor | function
  ;
function
  : FUNCTION S* expr ')' S*
  ;
/*
 * There is a constraint on the color that it must
 * have either 3 or 6 hex-digits (i.e., [0-9a-fA-F])
 * after the "#"; e.g., "#000" is OK, but "#abcd" is not.
 */
hexcolor
  : HASH S*
  ;
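
If you would rather write it by hand than use a generator, the ruleset and declaration productions above map fairly directly onto a single-pass parser. The following is only a rough sketch under several assumptions: comments are already removed, there are no at-rules (@media, @import, @page), and values are kept as raw strings instead of being parsed as expr. The names parseRulesets and readUntil are made up for this example.

```javascript
// Sketch of:  ruleset     : selector ... '{' declaration? [ ';' declaration? ]* '}'
//             declaration : property ':' expr prio?
// Assumes no comments and no at-rules; values are kept as raw text.
function parseRulesets(css) {
    var pos = 0;

    // Read up to (not including) the first character from `stops` that is
    // outside quotes and parentheses. Leaves `pos` on that character.
    function readUntil(stops) {
        var start = pos, depth = 0, quote = null;
        for (; pos < css.length; pos++) {
            var ch = css.charAt(pos);
            if (quote) {
                if (ch === '\\') pos++;              // skip the escaped character
                else if (ch === quote) quote = null;
            } else if (ch === '"' || ch === "'") {
                quote = ch;
            } else if (ch === '(') {
                depth++;
            } else if (ch === ')' && depth > 0) {
                depth--;
            } else if (depth === 0 && stops.indexOf(ch) !== -1) {
                break;
            }
        }
        return css.substring(start, pos).replace(/^\s+|\s+$/g, '');
    }

    var rules = {};
    while (pos < css.length) {
        var selector = readUntil('{');
        if (pos >= css.length) break;   // no '{' left, only trailing text
        pos++;                          // consume '{'

        var block = {};
        while (pos < css.length && css.charAt(pos) !== '}') {
            var property = readUntil(':;}');
            if (css.charAt(pos) === ':') {
                pos++;                  // consume ':'
                var value = readUntil(';}');
                if (property !== '' && value !== '') block[property] = value;
            }
            if (css.charAt(pos) === ';') pos++;   // consume ';'
        }
        pos++;                          // consume '}'
        rules[selector] = block;
    }
    return rules;
}

var parsed = parseRulesets('a { color: red; } b { background: url(data:image/png;base64,AA) }');
console.log(parsed.a.color);      // red
console.log(parsed.b.background); // url(data:image/png;base64,AA)
```

This produces the { selector: { property: value } } shape from the question. Rules that repeat a selector overwrite each other, so a real implementation would probably want an array of rules instead.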
Anurag
+6  A: 

There is a CSS parser written in JavaScript called JSCSSP.

Matthew Manela
I did take a look at it earlier, but didn't want to use it as it is so **heavy**. It does a lot of stuff I don't need.
ankit
@ankit: Then what are you asking for? If you want to parse *correctly* (meaning you can handle any arbitrary CSS), then you're going to end up with a "heavy" library. Otherwise, you can stick with your lightweight implementation knowing that it's easy to break.
josh3736
@josh3736 Looks like your comment was the push I needed. I was worried about performance issues, but it turns out it works pretty well!
ankit