views: 339 · answers: 2

I'm writing a parser to parse CSS.

I started by modifying the CSS reference grammar to use whatever grammar and lexer syntax are supported by the third-party parser-generator tool I'm using.

I think that I've finished coding the grammar: the parser-generator is now able to generate state transition tables for/from my grammar.

The result (the output from the parser-generator) is approximately 116 "rules", which correspond to 116 cases in a switch statement. Examples of these rules/switch cases are:

  1. Stylesheet begins by specifying a charset
  2. Stylesheet begins without specifying a charset
  3. Stylesheet is empty
  4. Stylesheet begins with whitespace
  5. ...etc...

The parser-generator has done all it can for me, and now I'm beginning to write (by hand) the various cases of the switch statement, which will build what I think people call an 'abstract syntax tree'.
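
To make concrete what I mean, here's a rough sketch of the kind of switch I'm starting to write by hand; the rule numbers, the AstNode type, and the onReduce() callback are placeholders of my own, not anything emitted by the parser-generator:

    #include <memory>
    #include <string>
    #include <vector>

    struct AstNode {
        std::string kind;                               // e.g. "stylesheet", "charset"
        std::vector<std::unique_ptr<AstNode>> children;
    };

    // Hypothetical reduce callback: the generated parser calls this each time it
    // reduces by rule `ruleNumber`; `rhs` holds the AST nodes already built for
    // the symbols on the rule's right-hand side.
    std::unique_ptr<AstNode> onReduce(int ruleNumber,
                                      std::vector<std::unique_ptr<AstNode>> rhs) {
        auto node = std::make_unique<AstNode>();
        switch (ruleNumber) {
        case 1:   // stylesheet : charset-rule statements
        case 2:   // stylesheet : statements (no charset)
            node->kind = "stylesheet";
            node->children = std::move(rhs);
            break;
        case 3:   // stylesheet : /* empty */
            node->kind = "stylesheet";
            break;
        // ... one case per generated rule, roughly 116 in all ...
        default:
            node->kind = "unknown";
            break;
        }
        return node;
    }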

My question is about how to test this. I think that what I want is a set of CSS files which exercise the various combinations and possibilities: e.g. one CSS file which specifies a charset, another file which doesn't specify a charset, etc.

  • Is there a general way to auto-generate this set of input data, for an arbitrary grammar or set of rules?

  • Alternatively, is there an existing set of CSS files whose purpose is to cover the combinations and possibilities allowed by the standard CSS grammar?
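
For concreteness, the kind of harness I'd feed such a set of files into might look something like this; parseStylesheet() is a stand-in for whatever entry point the generated parser ends up exposing:

    #include <filesystem>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Stand-in for the generated parser's real entry point.
    bool parseStylesheet(const std::string& cssText) {
        return !cssText.empty();   // stub; replace with a call into the generated parser
    }

    int main() {
        namespace fs = std::filesystem;
        int failures = 0;
        // Each .css file in tests/css would exercise one combination (charset /
        // no charset / empty / leading whitespace / ...); all of them should parse.
        for (const auto& entry : fs::directory_iterator("tests/css")) {
            if (entry.path().extension() != ".css") continue;
            std::ifstream in(entry.path());
            std::ostringstream buffer;
            buffer << in.rdbuf();
            if (!parseStylesheet(buffer.str())) {
                std::cerr << "FAIL: " << entry.path() << '\n';
                ++failures;
            }
        }
        return failures == 0 ? 0 : 1;
    }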

Feel free to comment too if I'm going about this all wrong.

At the moment I don't need:

  • Files to test handling of illegal input (i.e. of files which don't conform to the grammar)

  • Testing of how various browsers render based on their parsing of CSS

+4  A: 

Microsoft made a set of many thousands of CSS tests for IE8 compliance with the CSS spec. http://samples.msdn.microsoft.com/ietestcenter/css.htm

While they are focused on testing browser compliance, possibly you could adapt them.

There are also the older W3C test suites, which are not as complete, but might serve your purpose: http://www.w3.org/Style/CSS/Test/

Joeri Sebrechts
+1  A: 

A context-free grammar implicitly proposes an infinite set of (parse) trees. Each proposed tree has a set of leaves which make a concrete sentence in the language accepted by that grammar. By exploring the set of proposed trees (e.g., by expanding each nonterminal according to its possible alternatives), you can generate any arbitrary instance of the language. You can generate a set of tests by walking the tree proposals and making random choices. A more focused approach would be to use iterative deepening search to generate sentences ordered by size. With any interesting grammar, you're likely to get a huge number of instances, but hey, that's what automated testing is for.
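
Here's a rough sketch of the random-expansion idea; the grammar representation and the toy CSS-like rules are purely illustrative, and the depth cutoff is a crude way of forcing termination on recursive rules:

    #include <iostream>
    #include <map>
    #include <random>
    #include <string>
    #include <vector>

    using Alternative = std::vector<std::string>;                      // one right-hand side
    using Grammar = std::map<std::string, std::vector<Alternative>>;   // nonterminal -> alternatives

    std::mt19937 rng{42};

    // Expand `symbol`: terminals are emitted as-is, nonterminals are replaced by
    // one randomly chosen alternative.  When `depth` runs out, the nonterminal's
    // name is emitted verbatim -- crude, but it guarantees termination.
    void expand(const Grammar& g, const std::string& symbol, int depth, std::string& out) {
        auto it = g.find(symbol);
        if (it == g.end() || depth == 0) {
            out += symbol;
            out += ' ';
            return;
        }
        const auto& alts = it->second;
        std::uniform_int_distribution<std::size_t> pick(0, alts.size() - 1);
        for (const auto& sym : alts[pick(rng)])
            expand(g, sym, depth - 1, out);
    }

    int main() {
        // Toy CSS-like fragment, just to show the mechanism.
        Grammar g = {
            {"stylesheet", {{"charset", "rules"}, {"rules"}}},
            {"charset",    {{"@charset", "\"utf-8\"", ";"}}},
            {"rules",      {{}, {"rule", "rules"}}},
            {"rule",       {{"selector", "{", "declaration", "}"}}},
        };
        for (int i = 0; i < 5; ++i) {          // five random "sentences"
            std::string sentence;
            expand(g, "stylesheet", 8, sentence);
            std::cout << sentence << '\n';
        }
    }

Replace the random pick with a systematic walk over the alternatives (deepening the cutoff each pass) and you get the size-ordered enumeration mentioned above.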

What I wouldn't do is generate such sentences from your production grammar, because the sentences you generate will be, by definition, the ones it accepts :-{ What you should do is construct your sentence generator from the reference grammar, to exploit the fact that what it accepts and what you've implemented might be different.

Ira Baxter
With my grammar I have about 55 non-terminals. If I evaluate it as a top-down parser, then the method associated with the top-level non-terminal invokes methods associated with lower-level non-terminals, and so on down. Each non-terminal method invokes approximately 1 to 3 lower-level methods (via a `switch` statement within the method), and is invoked by 1 or sometimes 2 different higher-level methods. It would be an achievement even just to get full code coverage: to ensure that every possibility is tested at least once (not hoping for every combination of every possibility), ...
ChrisW
... which might take on the order of (55 x 3 =) 165 test cases. I think I agree with you that there's little to be gained in generating these test cases automatically from the grammar. I asked, however, because I was never formally taught about parsing in school, whereas other people were, and I wondered whether there are well-known algorithms for testing parsers.
ChrisW
Nobody ever taught me how to test a parser, in school or out. It's just not in the curriculum. I build lots of them, but mostly rely on having zillions of test cases from real code, because the reference language for most compilers doesn't match what the compiler guys really implemented (witness Microsoft and C#; they went to the trouble of getting a standard, and didn't implement it!). You appear to be lucky, in having a real reference grammar.
Ira Baxter
... regarding lots of test cases, you can simply stop generating them when you have "enough", even if you don't get complete coverage. If you use the iterative deepening scheme, you can stop at a million test cases and be pretty sure you have good coverage :-} It won't take that long to run them.
Ira Baxter