views:

42

answers:

1

Hi,

I am new at language processing and I want to create a parser with Irony for a following syntax:

name1:value1 name2:value2 name3:value ...

where name1 is the name of an xml element and value is the value of the element which can also include spaces.

I have tried to modify included samples like this:

    public TestGrammar()
    {
        var name = CreateTerm("name");
        var value = new IdentifierTerminal("value");

        var queries = new NonTerminal("queries");
        var query = new NonTerminal("query");
        queries.Rule = MakePlusRule(queries, null, query);
        query.Rule = name + ":" + value;
        Root = queries;
    }

    private IdentifierTerminal CreateTerm(string name)
    {
        IdentifierTerminal term = new IdentifierTerminal(name, "!@#$%^*_'.?-", "!@#$%^*_'.?0123456789");
        term.CharCategories.AddRange(new[]
                                         {
                                             UnicodeCategory.UppercaseLetter, //Ul
                                             UnicodeCategory.LowercaseLetter, //Ll
                                             UnicodeCategory.TitlecaseLetter, //Lt
                                             UnicodeCategory.ModifierLetter, //Lm
                                             UnicodeCategory.OtherLetter, //Lo
                                             UnicodeCategory.LetterNumber, //Nl
                                             UnicodeCategory.DecimalDigitNumber, //Nd
                                             UnicodeCategory.ConnectorPunctuation, //Pc
                                             UnicodeCategory.SpacingCombiningMark, //Mc
                                             UnicodeCategory.NonSpacingMark, //Mn
                                             UnicodeCategory.Format //Cf
                                         });
        //StartCharCategories are the same
        term.StartCharCategories.AddRange(term.CharCategories);
        return term;
    }

but this doesn't work if the values include spaces. Can this be done (using Irony) without modifying the syntax (like adding quotes around values)?

Many thanks!

A: 

If newlines were included between key-value pairs, it would be easily achievable. I have no knowledge of "Irony", but my initial feeling is that almost no parser/lexer generator is going to deal with this given only a naive grammar description. This requires essentially unbounded lookahead.

Conceptually (because I know nothing about this product), here's how I would do it:

Tokenise based on spaces and colons (i.e. every continguous sequence of characters that isn't a space or a colon is an "identifier" token of some sort).

You then need to make it such that every "sentence" is described from colon-to-colon:

sentence = identifier_list
         | : identifier_list identifier : sentence

That's not enough to make it work, but you get the idea at least, I hope. You would need to be very careful to distinguish an identifier_list from a single identifier such that they could be parsed unambiguously. Similarly, if your tool allows you to define precedence and associativity, you might be able to get away with making ":" bind very tightly to the left, such that your grammar is simply:

sentence = identifier : identifier_list

And the behaviour of that needs to be (identifier :) identifier_list.

Gian