views:

130

answers:

3

I have a definition for a SPAN file (http://www.cme-ch.com/span/spanl300.htm) that i'd like to use in constructing a parser to parse the string data into an in memory collection class (or even using lazy evalution with the yield keyword.)

All parsing techniques and libraries i've seen apply to constructing parse trees for implementing languages; i'd simply like to know of any good techniques to parse into a data structure, similar to how XML is parsed into an XMLDocument in the .net framework, but using the rules defined by SPAN.

+2  A: 

SPAN appears to be a bunch of record types, each record with a lot of detail.

It should be straightforward to define a classic grammar that covers all of the records (as nonterminals), in terms of any subrecords (as nonterminals) and terminal data types representing the various datatypes defined by SPAN. There might be a lot of nonterminals, but that just makes for a big grammar, but not a complicated one.

Most programming languages have a small set of terminal tokens that can generally appear anywhere. The truth is that grammars define expectations of what can appear next (called "first" and "follow" sets in the LR parser literature), including a very limited set of terminals. A SPAN grammar would not be different; each "parse state" of a parser implies a limited set of terminals that come next, and one organize a parser to take advantage of this. (I've built L(AL)R parsers, and one could easily use the "current" state to determine the subset of terminals that could happen next). So, a SPAN parser could determine just the small set of tokens that might occur next in each state, and use that to pick of the characaters comprising those next tokens (they must form disjoint sets!).

An easy way to implement this is with a recursive descent parser.

So I claim that all that parsing machinery would be just fine for parsing SPAN, with some bit of custom work possibly to pick up the tokens.

Parsing actions for conventional parsers build trees, but its just as easy to populate fields of a data structure.

Ira Baxter
+1  A: 

Investigate the Gardens Point Parser Generator, an app that generates a C# parser for any language given a YACC-like language definition.

Dour High Arch
+2  A: 

Recursive decent is a pretty simple approach for things like this.

You start off with a wrapper to the underlying stream that lets you read one character (or possibly card/record in your case).

You then write a series of functions that do things like 'read a number, parse it' and 'read a character, and check that it is X'.

These functions either succeed and advance the stream or fail with a parse exception.

Finally, it is handy to make a set of combinators that take the above functions and combine them, for example 'read A, then read B' or 'read A, and if that fails try B instead'.

Phil