I'm new to Treetop and attempting to write a CSS/HSS parser. HSS augments the basic functionality of CSS with nested styles, variables and a kind of mixin functionality.

I'm pretty close: the parser can handle plain CSS, but it fails when I try to implement a style nested within another style. For example:

#rule #one {
  #two {
    color: red;
  }
  color: blue;
}

I've taken two shots at it, one that handles whitespace and one that doesn't, and I can't quite get either to work. The Treetop documentation is a little sparse, and I really feel like I'm missing something fundamental. Hopefully someone can set me straight.

Attempt A (handles whitespace):

grammar Stylesheet

  rule stylesheet
    space* style*
  end

  rule style
    selectors space* '{' space* properties? space* '}' space*
  end

  rule properties
    property space* (';' space* property)* ';'?
  end

  rule property
    property_name space* [:] space* property_value
  end

  rule property_name
    [^:;}]+
  end

  rule property_value
    [^:;}]+
  end

  rule space
    [\t ]
  end

  rule selectors
    selector space* ([,] space* selector)*
  end

  rule selector
    element (space+ ![{] element)*
  end

  rule element
    class / id
  end

  rule id
    [#] [a-zA-Z-]+
  end

  rule class
    [.] [a-zA-Z-]+
  end

end

Attempt B (does not handle whitespace, allows nested styles):

grammar Stylesheet

  rule stylesheet
   style*
  end

  rule style
    selectors closure
  end

  rule closure
    '{' ( style / property )* '}'
  end

  rule property
    property_name ':' property_value ';'
  end

  rule property_name
    [^:}]+
    <PropertyNode>
  end

  rule property_value
    [^;]+
    <PropertyNode>
  end

  rule selectors
    selector ( !closure ',' selector )*
    <SelectorNode>
  end

  rule selector
    element ( space+ !closure element )*
    <SelectorNode>
  end

  rule element
    class / id
  end

  rule id
    ('#' [a-zA-Z]+)
  end

  rule class
    ('.' [a-zA-Z]+)
  end

  rule space
    [\t ]
  end

end

Harness Code:

require 'rubygems'
require 'treetop'

class PropertyNode < Treetop::Runtime::SyntaxNode
  def value
    "property:(#{text_value})"
  end
end

class SelectorNode < Treetop::Runtime::SyntaxNode
  def value
    "--> #{text_value}"
  end
end

Treetop.load('css')

parser = StylesheetParser.new
parser.consume_all_input = false

string = <<EOS
#hello-there .my-friend {
  font-family:Verdana;
  font-size:12px;
}
.my-friend, #is-cool {
  font: 12px Verdana;
  #he .likes-jam, #very-much {asaads:there;}
  hello: there;
}
EOS

root_node = parser.parse(string)

def print_node(node, output = [])
  output << node.value if node.respond_to?(:value)
  node.elements.each {|element| print_node(element, output)} if node.elements
  output
end

puts print_node(root_node).join("\n") if root_node

#puts parser.methods.sort.join(',')
puts parser.input
puts string[0...parser.failure_index] + '<--'
puts parser.failure_reason
puts parser.terminal_failures
A:

I assume you're running into left recursion problems? If so, keep in mind that Treetop produces recursive descent parsers, and as such, you can't really use left recursion in your grammar, that is, a rule that refers to itself as its own first element. (This is one of the main reasons I still prefer ocamlyacc/ocamllex over Treetop, despite its very sexy appearance.) This means you need to convert left-recursive forms to right recursion.

Since you undoubtedly own the Dragon Book (right?), I'll direct you to sections 4.3.3, 4.3.4, and 4.4.1, which cover the issue. As is typical, the material is hard to understand, but parsers didn't get their reputation for nothing. There's also a nice left recursion elimination tutorial that the ANTLR guys put up. It's somewhat specific to ANTLR/ANTLRWorks, but it's slightly easier to follow than what's in the Dragon Book. This is one of those things that just doesn't make a whole lot of sense to anyone who hasn't done it at least a few times before.
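To make the rewrite a bit more concrete, here is a minimal sketch in plain Ruby rather than Treetop. The toy subtraction grammar and the `SubtractionParser` class are made up for illustration and are not taken from the question. A left-recursive rule such as `expr <- expr '-' number / number` would make a recursive descent parser call itself forever without consuming input, so the sketch uses the equivalent form `expr <- number ('-' number)*`, where a loop takes the place of the left-recursive call.

```ruby
# Hypothetical toy grammar, for illustration only:
#   left-recursive form:  expr <- expr '-' number / number
#   rewritten form:       expr <- number ('-' number)*
class SubtractionParser
  def initialize(input)
    # Split the input into number and '-' tokens; whitespace is dropped.
    @tokens = input.scan(/\d+|-/)
    @pos = 0
  end

  # expr <- number ('-' number)*
  # The loop replaces the left-recursive call and folds the operands
  # from left to right, so '-' stays left-associative.
  def parse_expr
    value = parse_number
    while @tokens[@pos] == '-'
      @pos += 1
      value -= parse_number
    end
    value
  end

  private

  # number <- [0-9]+
  def parse_number
    token = @tokens[@pos]
    unless /\A\d+\z/.match?(token.to_s)
      raise ArgumentError, "expected a number at token #{@pos}"
    end
    @pos += 1
    token.to_i
  end
end

puts SubtractionParser.new('10 - 3 - 2').parse_expr # prints 5, i.e. (10 - 3) - 2
```

The same idea should carry over to a PEG grammar: express the repetition with `*` or with right recursion, and rebuild left associativity afterwards in the syntax node methods.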

Also, a minor comment: if you're going to use Treetop, I recommend doing this instead:

rule ws
  [\t ]*
end

You're not likely to ever need to match a single whitespace character, and almost every grammar rule is going to need optional whitespace, so it makes sense to give that rule a very short name. Incidentally, this is one of the advantages of a separate lexing step: the lexer can discard whitespace before the parser ever sees it.
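As a rough illustration of what a separate lexing step buys you, here is a small sketch using `StringScanner` from Ruby's standard library. The `tokenize` helper and the token names (`:selector`, `:punct`, `:word`) are assumptions made for this example and have nothing to do with Treetop itself. The lexer throws whitespace away, so a parser working on the token list never needs a whitespace rule at all.

```ruby
require 'strscan'

# Hypothetical lexer sketch: turns CSS-like source into [type, text] pairs
# and discards all whitespace, including newlines.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace never reaches the parser
    elsif (text = scanner.scan(/[#.][a-zA-Z-]+/))
      tokens << [:selector, text] # an id or class, e.g. "#two" or ".my-friend"
    elsif (text = scanner.scan(/[{}:;,]/))
      tokens << [:punct, text]
    elsif (text = scanner.scan(/[^\s{}:;,]+/))
      tokens << [:word, text] # property names and value fragments
    else
      # Defensive fallback; the patterns above should cover every character.
      raise ArgumentError, "unexpected input at offset #{scanner.pos}"
    end
  end
  tokens
end

tokenize("#two {\n  color: red;\n}").each { |type, text| puts "#{type} #{text}" }
```

With a token stream like this, a nested block is just a `{` token that shows up where a property would otherwise start, which tends to be easier to reason about than whitespace sprinkled through every rule.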

Bob Aman
Ah, right! I assumed Treetop would be able to handle left recursion and that I was just missing something in the documentation. Thanks very much for taking the time to confirm this.
toothygoose
A: 

Looks like someone beat me to it:

http://lesscss.org/

Although I notice that they use regular expressions and an eval() to parse the input file rather than a parser.

Edit: Now they use Treetop! It's like someone did all the hard work for me.

toothygoose