I'm contemplating implementing an XML translator using a compiler generator, based on the W3C's XML 1.1 spec, which includes a complete EBNF grammar.

More precisely, I plan to use Qi-YACC because I want to learn this tool. It will be my first foray into using any compiler-compiler.

The first kind of translation I'm planning to implement is very straightforward: XML to S-EXPRs. Afterwards, I plan to generalize my translator, but this is not the point of my question.

Do you anticipate any major pitfalls for this kind of project? I've read that translating XML using its EBNF is a bad idea, and I wonder why. Also, the Qi language doesn't already have an XML parser, so I'm definitely not reinventing the wheel here.

+2  A: 

I do not know the reason why context is needed to parse XML.

But QiYacc can make use of context through global variables. It would be cleaner if you could pass a state, S, into the parser, or something like that. This is not in Qi, but I plan to implement such a feature for Shen.
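To make the idea of passing a state concrete, here is a rough sketch in Python rather than Qi, so none of it is QiYacc syntax. One place XML needs context is the well-formedness rule that an end tag must name the same element as its start tag. The function names, the `open_tags` parameter, and the tiny XML subset (no attributes, entities, or comments) are all assumptions made for illustration.

```python
import re

# Simplified Name pattern; the real XML Name production allows far more characters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def parse_element(text, pos, open_tags=()):
    """Parse '<name>content</name>' starting at pos.

    open_tags is the state S: a tuple of the element names currently open,
    passed down explicitly instead of being kept in a global variable.
    Returns (tree, new_pos), where tree is [name, child, ...].
    """
    if not text.startswith("<", pos):
        raise SyntaxError(f"expected '<' at {pos}")
    m = NAME.match(text, pos + 1)
    if m is None or not text.startswith(">", m.end()):
        raise SyntaxError(f"bad start tag at {pos}")
    state = open_tags + (m.group(),)          # new state for the nested content
    children, pos = parse_content(text, m.end() + 1, state)
    expected = "</" + state[-1] + ">"         # the context-dependent check
    if not text.startswith(expected, pos):
        path = "/".join(state)
        raise SyntaxError(f"expected {expected} at {pos} (inside {path})")
    return [state[-1]] + children, pos + len(expected)

def parse_content(text, pos, state):
    """Parse text and child elements until an end tag or end of input."""
    children = []
    while pos < len(text) and not text.startswith("</", pos):
        if text[pos] == "<":
            child, pos = parse_element(text, pos, state)
        else:
            end = text.find("<", pos)
            end = len(text) if end == -1 else end
            child, pos = text[pos:end], end
        children.append(child)
    return children, pos
```

For example, `parse_element("<a><b>hi</b>x</a>", 0)` yields the nested list `["a", ["b", "hi"], "x"]`, which maps directly onto an S-expression, while `<a><b>hi</a></b>` is rejected because the end tag does not match the innermost open element.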

So it could be done.

/Stefan

+1  A: 

I know nothing of QiYACC; however, translating an EBNF of XML into a recursive descent (RD) parser is more or less straightforward. One just needs to keep in mind that there are places where small tweaks to the grammar can have a big performance impact on the parser. This is because the grammars are written with succinctness and clarity in mind, rather than to avoid chasing down rules that cannot match.
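One tweak of this kind, offered as my own illustration rather than something taken from the article: the spec's `content` production lists `element | Reference | CDSect | PI | Comment` as alternatives, and a naive RD parser would try each rule in turn and backtrack on failure. Since each alternative starts with a distinctive prefix, a few characters of lookahead can pick the single rule worth trying. The sketch below is in Python, and the function name `choose_production` is hypothetical.

```python
def choose_production(text, pos):
    """Pick the one alternative of `content` that can match at pos.

    Longer prefixes are tested first, because every markup alternative
    starts with '<' and only the following characters tell them apart.
    """
    if text.startswith("<!--", pos):
        return "Comment"
    if text.startswith("<![CDATA[", pos):
        return "CDSect"
    if text.startswith("<?", pos):
        return "PI"
    if text.startswith("</", pos):
        return "ETag"        # not part of content: it ends the enclosing element
    if text.startswith("<", pos):
        return "element"
    if text.startswith("&", pos):
        return "Reference"
    return "CharData"
```

With this in place, the parser commits to one rule per position instead of descending into up to five rules that are bound to fail.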

I did something like this once in C++ by writing the grammar of XML out as a set of types. You can see an article I wrote on it at Code Project. The same basic principles can be applied to any language.

I'd also suggest you look into PEGs (parsing expression grammars). They extend EBNF by allowing you to introduce zero-width assertions, and are a great way to augment an EBNF grammar for a parser.
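As a small illustration of a zero-width assertion: the XML grammar has to spell out that a comment body may not contain `--`, using set-difference notation. In PEG style this can be written as `Comment <- '<!--' (!'--' .)* '-->'`, where `!'--'` looks ahead without consuming input. Below is a minimal sketch of that rule with hand-rolled combinators in Python; the helper names (`lit`, `not_`, `seq`, `star`, `any_char`) are made up for this example and do not come from any particular PEG library.

```python
def lit(s):
    """Match the literal string s; return the new position or None."""
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p

def any_char(text, pos):
    """Match any single character (PEG '.')."""
    return pos + 1 if pos < len(text) else None

def not_(p):
    """Zero-width negative assertion (PEG '!'): succeed, consuming nothing,
    exactly when p fails at this position."""
    def q(text, pos):
        return pos if p(text, pos) is None else None
    return q

def seq(*parsers):
    """Match each parser in order; fail if any of them fails."""
    def q(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return q

def star(p):
    """Match p zero or more times, greedily (PEG '*')."""
    def q(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return q

# Comment <- '<!--' (!'--' .)* '-->'
comment = seq(lit("<!--"), star(seq(not_(lit("--")), any_char)), lit("-->"))
```

Here `comment("<!-- hi -->rest", 0)` returns the position just past the comment, and `comment("<!-- a -- b -->", 0)` returns `None`, since the body stops at the first `--` and that is not followed by `>`.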

cdiggins