Hello there,
I am using CocoR to generate a scanner/parser for a Java-like language.
I'm having trouble writing an EBNF expression to match a code block.

I'm assuming a code block is delimited by two well-known tokens, <& and &>. For example:

public method(int a, int b) <&  
various code  
&>  

If I define the nonterminal symbol

codeblock = "<&" {ANY} "&>"  

and the code between the two delimiters contains a '<' character, the generated compiler cannot handle it and reports a syntax error.

Any hint?

Edit:

COMPILER JavaLike
CHARACTERS

nonZeroDigit  = "123456789".
digit         = '0' + nonZeroDigit .
letter        = 'A' .. 'Z' + 'a' .. 'z' + '_' + '$'.

TOKENS
ident = letter { letter | digit }.

PRODUCTIONS
JavaLike = {ClassDeclaration}.
ClassDeclaration = "class" ident ["extends" ident] "{" {VarDeclaration} {MethodDeclaration} "}".
MethodDeclaration = "public" Type ident "(" ParamList ")" CodeBlock.
CodeBlock = "<&" {ANY} "&>".

I have omitted some productions for the sake of simplicity.
This is my actual grammar. The main bug is that parsing fails if the code in the block contains a '>' or '&' character.

A: 
Platinum Azure
how would you define ANY_WITHIN_BLOCK?
nick2k3
How are you defining ANY?
Platinum Azure
ANY is the wildcard ("jolly") token in CocoR: it matches any token.
nick2k3
Sorry, I admit I'm not familiar with CocoR. I'm just writing a pseudocode CFG language. I was assuming you had defined {ANY} but if that's just a catch-all token, then that obviously complicates things. :-) (EDIT: Dang it. lol)
Platinum Azure
Here are more details. I can define a character set like this: letter = 'A' .. 'Z' + "%>". so a letter may be any capital letter or one of the symbols within the double quotes. Then I can define a token, which must be defined in terms of character sets. So a token called, say, word would be defined as: word = letter {letter}.
nick2k3
Ah, and {WHATEVER} means zero or more WHATEVERs?
Platinum Azure
Anyway, I guess my point is that I'd go for a more rigorous specification of EXACTLY what should go within a block, without relying on a catch-all token. Unless you can do a set difference (basically, ANY minus two tokens), you'll only end up accepting more than you want in your code blocks.
Platinum Azure
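To make the "ANY minus the delimiters" idea concrete, here is a minimal sketch in plain Java. It is not CocoR's API and not generated code; the class and method names are hypothetical. It assumes the block body can be treated as raw text, so everything after `<&` is captured up to the first `&>` and stray '<', '>' or '&' characters never reach a token-level grammar.

```java
import java.util.Optional;

public class CodeBlockExtractor {
    // Delimiters taken from the question; everything else here is illustrative.
    static final String OPEN = "<&";
    static final String CLOSE = "&>";

    /**
     * Returns the raw text between the first "<&" found at or after
     * index `from` and the next "&>" that follows it, or empty if
     * either delimiter is missing.
     */
    public static Optional<String> extractBlock(String source, int from) {
        int start = source.indexOf(OPEN, from);
        if (start < 0) {
            return Optional.empty();
        }
        int bodyStart = start + OPEN.length();
        int end = source.indexOf(CLOSE, bodyStart);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(source.substring(bodyStart, end));
    }

    public static void main(String[] args) {
        String src = "public method(int a, int b) <& if (a < b && b > 0) a = b; &>";
        System.out.println(extractBlock(src, 0).orElse("<no block>"));
    }
}
```

The trade-off is the one raised above: a raw capture accepts anything at all inside the block, so it only fits if the block contents are parsed (or passed through) in a later step.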
Sorry for the late answer. You defined `statements: statements statement | /* empty */`, but how do you define 'statement'? The definition of method-block would be OK, but this way you are effectively parsing the code block too. By the way, I'll post my attributed grammar; maybe that will make things clearer.
nick2k3
I left "statement" out because it comes down to how you'd define it (note that my specification is intentionally partial). "statement" is where I would break things down into method-call, assignment, side-effect-expression, and so on. What I'm trying to show you is this: don't rely on "any token" in the middle of a block; be more specific. You want statements, so what constitutes a statement? If "assignment" is one possibility (which it probably should be), what does that look like? It would be `assignment: identifier "=" expression` or similar.
Platinum Azure
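As a rough illustration of "be more specific", the sketch below is a tiny hand-written recursive-descent parser in plain Java for `statements` made only of assignments. It is not CocoR output. The class name, the trailing ";" after each assignment, and the very small expression grammar (operands joined by + or -) are all assumptions added for the example; only `assignment: identifier "=" expression` comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatementParser {
    private static final String IDENT = "[A-Za-z_$][A-Za-z_$0-9]*";
    private static final String NUMBER = "[0-9]+";
    // A token is an identifier, an integer literal, or any single non-space character.
    private static final Pattern TOKEN = Pattern.compile(IDENT + "|" + NUMBER + "|\\S");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    public StatementParser(String blockBody) {
        Matcher m = TOKEN.matcher(blockBody);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** statements = { assignment }. Returns the number of statements parsed. */
    public int parseStatements() {
        int count = 0;
        while (pos < tokens.size()) {
            parseAssignment();
            count++;
        }
        return count;
    }

    // assignment = identifier "=" expression ";"   (the ";" is an assumption)
    private void parseAssignment() {
        expectMatching(IDENT, "identifier");
        expect("=");
        parseExpression();
        expect(";");
    }

    // expression = operand { ("+" | "-") operand }   (assumed, deliberately tiny)
    private void parseExpression() {
        expectMatching(IDENT + "|" + NUMBER, "operand");
        while (peek().equals("+") || peek().equals("-")) {
            pos++;
            expectMatching(IDENT + "|" + NUMBER, "operand");
        }
    }

    private void expectMatching(String regex, String what) {
        if (!peek().matches(regex)) {
            throw new IllegalArgumentException(what + " expected, found '" + peek() + "'");
        }
        pos++;
    }

    private void expect(String symbol) {
        if (!peek().equals(symbol)) {
            throw new IllegalArgumentException("'" + symbol + "' expected, found '" + peek() + "'");
        }
        pos++;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<end>";
    }
}
```

For example, `new StatementParser("a = b + 1; c = a;").parseStatements()` walks two assignments, while input such as `a = ;` is rejected with an exception instead of being silently swallowed by a catch-all token.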