I've been hacking around with the May09 Oslo bits, experimenting with tokenizing some source code. I can't seem to figure out how to correctly handle multiline C-style comments, though. For example: /*comment*/

Some cases that elude me:

/***/

or

/**//**/

I can make one or the other work, but not both. The grammar was:

    module Test {
        language Comments {

            token Comment =
                MultiLineComment;

            token MultiLineComment =
                "/*" MultiLineCommentChar* "*/";

            token MultiLineCommentChar =
                ^ "*" |
                "*" PostAsteriskChar;

            token PostAsteriskChar =
                ^ "*" |
                "*" ^("*" | "/");

            /*
            token PostAsteriskChar =
                ^ "*" |
                "*" PostAsteriskChar;
            */

            syntax Main = Comment*;
        }
    }

The commented-out token is what I think I want, but recursive tokens are not permitted. The fact that MGrammar itself has "broken" multiline comments (it can't handle /***/) leads me to believe this isn't possible.

Does anyone know otherwise?

A: 

The way I have done it is as follows (not all my own code, but I can't find a reference to the original author).

    interleave Skippable = Whitespace | Comment;
    interleave Comment = CommentToken;
    @{Classification["Comment"]}
    token CommentToken = CommentDelimited
        | CommentLine;
    token CommentDelimited = "/*" CommentDelimitedContent* "*/";
    token CommentDelimitedContent
        = ^('*')
        | '*'  ^('/');
    token CommentLine = "//" CommentLineContent*;
    token CommentLineContent
        = ^(
            '\u000A' // New Line
            |  '\u000D' // Carriage Return
            |  '\u0085' // Next Line
            |  '\u2028' // Line Separator
            |  '\u2029' // Paragraph Separator
        );

This allows for both single-line // comments and multiline /* */ comments.

Cheers, Sam

Sam
This fails the first of the two cases that I provided.
Mark
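
For anyone who wants to see why /***/ slips through, here is a minimal sketch in Python rather than MGrammar, so it says nothing definite about what the May09 tokenizer does. `ANSWER_STYLE` is my own regex translation of the `CommentDelimited` token above (assuming `^('*')` maps to `[^*]` and `'*' ^('/')` maps to `\*[^/]`). `STAR_RUN_STYLE` is the usual non-recursive pattern for C-style comments, which treats a run of stars as a unit instead of pairing each star with the character after it. The names and the `split_comments` helper are hypothetical, made up for this illustration.

```python
import re

# Assumed regex translation of the answer's token:
#   "/*" ( ^('*') | '*' ^('/') )* "*/"
ANSWER_STYLE = re.compile(r"/\*(?:[^*]|\*[^/])*\*/")

# Common non-recursive form: after "/*", skip non-stars, then a run of
# stars; if the run is not followed by "/", consume one character that is
# neither "*" nor "/" and repeat.
STAR_RUN_STYLE = re.compile(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/")


def split_comments(pattern, text):
    """Return the comments in text if it consists only of back-to-back
    comments matching pattern, otherwise None."""
    pos, found = 0, []
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            return None
        found.append(match.group())
        pos = match.end()
    return found


if __name__ == "__main__":
    for sample in ["/*comment*/", "/***/", "/**//**/"]:
        print(sample,
              split_comments(ANSWER_STYLE, sample),
              split_comments(STAR_RUN_STYLE, sample))
```

In the translated answer pattern, the `'*' ^('/')` alternative swallows the second star of /***/ together with the first, so no star is left to close the comment. The star-run form has no such pairing, and because it uses only repetition and grouping it needs no recursion. Whether the same shape can be written as MGrammar tokens is something I have not tested against the Oslo bits.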