I'm trying to create a small app that takes a base text template with specially tagged word arrays, parses the template contents and outputs a randomly generated text document.

Essentially, what I'm trying to do is take this:

<{Hello|Hi|Howdy}> world.

and turn it into this:

 Hello world.
OR
 Hi world.
OR
 Howdy world.

So far, so good. Googling got me enough to be able to successfully extract the inner text between the <{ and }> into an array, from which I then randomly select a word to replace the full <{Hello|Hi|Howdy}>.

The problem I'm having is parsing a nested set of words wrapped in the same tags.

For example, if I start with this:

<{Hello|Hi|Howdy}> world. <{How's <{life|it going}>?|How are you?}>

I'd like to turn it into this:

 Hello world. How's life?
OR
 Hello world. How's it going?
OR
 Hello world. How are you?

and so on...

Could someone suggest a way to do this fairly simply using C# and regex?

I've looked at http://www.vsj.co.uk/articles/display.asp?id=789 and http://www.m-8.dk/resources/RegEx-balancing-group.aspx, and to be honest, a lot of that goes way over my head, so something simple would be nice. ;-)

Thank you.

A: 

There are lex and yacc tools in the Visual Studio SDK.

These links might help:

http://msdn.microsoft.com/en-us/library/bb165963(VS.80).aspx

http://devhawk.net/2006/09/17/Managed+Lex+And+Yacc.aspx

However, depending on how complex your parsing is going to be (considering possible future changes and additions), you may just want to stick with regex.

Jonathan Parker
A: 

If you currently have a regex that can correctly parse the values inside your tag into an array (call it A'), then for each value in A', reapply that regex.

You should be able to do this recursively.
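
One way to make the repeated application work with a plain regex is to resolve the innermost groups first and loop until nothing matches any more. This is a minimal sketch of that variation, not the exact code anyone in this thread used; the names Spinner, Spin and Innermost are made up for illustration, and it assumes the literal text never contains "<{", "}>" or "|" on its own.

```csharp
using System;
using System.Text.RegularExpressions;

static class Spinner
{
    // Matches only a group with no "<{" or "}>" inside it, i.e. an innermost group.
    // Capture group 1 holds the "a|b|c" option list.
    static readonly Regex Innermost =
        new Regex(@"<\{((?:(?!<\{|\}>).)*)\}>", RegexOptions.Singleline);

    public static string Spin(string template, Random rng)
    {
        string current = template;
        // Each pass replaces every innermost group with one of its options.
        // Groups that wrapped them become innermost on the next pass.
        while (Innermost.IsMatch(current))
        {
            current = Innermost.Replace(current, m =>
            {
                string[] options = m.Groups[1].Value.Split('|');
                return options[rng.Next(options.Length)];
            });
        }
        return current;
    }

    static void Main()
    {
        string template =
            "<{Hello|Hi|Howdy}> world. <{How's <{life|it going}>?|How are you?}>";
        Console.WriteLine(Spin(template, new Random()));
    }
}
```

Note that inner groups are resolved even when the option containing them is later discarded, so "How are you?" comes out about half the time and the two "How's ..." variants about a quarter each.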

Alan
Unfortunately, the grammar described above is not regular, so you cannot use regexes. You need a production of the form S --> aBa, so you need a parser for context-free languages.
scurial
I tried this and got it to work. There are probably better ways to do it. The trick was to create a class that takes the input string and pattern, finds a match and returns the inner text array, parses that using a string replace, and passes the result into the class again recursively until there are no more regex matches. Thank you.
Will
A: 

This problem is not well suited to regular expressions. The grammar needed to recognize the expression you described is not a regular grammar.

The expressions described above can, however, be described by a context-free grammar.

You should be able to parse this efficiently with an LL(1) parser. I would say that the problem is better suited to tokenizing the input using lex and constructing an abstract syntax tree using yacc.
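
For a grammar this small, a hand-written recursive-descent parser is also an option and needs no generator tools. The following is only a rough sketch under my own assumptions: the grammar in the comments is my reading of the question's template format, TemplateParser and its members are hypothetical names, and it picks an option while parsing rather than building a syntax tree first.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class TemplateParser
{
    private readonly string _src;
    private readonly Random _rng;
    private int _pos;

    private TemplateParser(string src, Random rng) { _src = src; _rng = rng; }

    public static string Spin(string template, Random rng)
    {
        return new TemplateParser(template, rng).ParseText(false);
    }

    // True when the input at the current position starts with the given token.
    private bool At(string token)
    {
        return string.CompareOrdinal(_src, _pos, token, 0, token.Length) == 0;
    }

    // text := (literal | group)*
    // Inside a group, '|' and "}>" end the current text; outside they are literals.
    private string ParseText(bool insideGroup)
    {
        var sb = new StringBuilder();
        while (_pos < _src.Length)
        {
            if (At("<{")) sb.Append(ParseGroup());
            else if (insideGroup && (_src[_pos] == '|' || At("}>"))) break;
            else sb.Append(_src[_pos++]);
        }
        return sb.ToString();
    }

    // group := "<{" text ("|" text)* "}>"
    private string ParseGroup()
    {
        _pos += 2; // skip "<{"
        var options = new List<string> { ParseText(true) };
        while (_pos < _src.Length && _src[_pos] == '|')
        {
            _pos++; // skip '|'
            options.Add(ParseText(true));
        }
        if (!At("}>")) throw new FormatException("Missing '}>' before position " + _pos);
        _pos += 2; // skip "}>"
        return options[_rng.Next(options.Count)];
    }

    static void Main()
    {
        string template =
            "<{Hello|Hi|Howdy}> world. <{How's <{life|it going}>?|How are you?}>";
        Console.WriteLine(Spin(template, new Random()));
    }
}
```

Each method corresponds to one grammar rule, and nesting falls out of ParseText calling ParseGroup, which calls ParseText again. Unlike a loop that only stops when nothing matches, this version reports an unclosed "<{" as an error.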

Here's a tutorial on Grammars and parsing with C#

scurial
A: 

Seems like you're trying to describe and use a Context-Free Grammar rather than a regular expression.

Context-free grammars are strictly more powerful than regular expressions:

  • Any language that can be generated using regular expressions can be generated by a context-free grammar.
  • There are languages that can be generated by a context-free grammar that cannot be generated by any regular expression.

For C#, I recommend ANTLR, a framework for language recognition that allows you to construct recognizers, interpreters, compilers, and translators from grammatical descriptions.

CMS