I'm trying to write a simple Bash-like grammar in ANTLR v3, but I haven't been able to parse (and check) the input inside subshell commands.

Further explanation:

I want to parse the following input:

$(command parameters*)

`command parameters`

"some text $(command parameters*)"

I also want to check their contents the same way I would check simple input such as command parameters.

That is, parsing it should generate a tree like:

(SUBSHELL (CMD command (PARAM parameters*)))
(tokens are in upper case)


I can simply ignore the '$(' and '`' delimiters, but that doesn't cover cases where subshells are used inside double-quoted strings, such as:

$ echo "String test $(ls -l) end"

So... any tips on how I can achieve this?

A:

I'm not very familiar with the details of Antlr v3, but I can tell you that you can't handle Bash-style command substitution inside double-quoted strings in a traditional lexer, because arbitrarily deep nesting (e.g. "a $(echo "b $(ls)") c") cannot be expressed using a regular grammar. Most traditional compiler-compilers restrict lexers to regular grammars so that efficient DFAs can be constructed for them. (Lexers, which irreducibly have to scan every single character of the source, have historically been one of the slowest parts of a compiler.)
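To make the nesting point concrete, here is a small Python illustration. It is not Antlr code, and the helper name substitution_body is hypothetical. Python's re module has no way to count nesting levels. A lazy pattern pairs '$(' with the first ')' it reaches, and a greedy one pairs it with the last ')', so both go wrong once substitutions are nested or sit side by side. A finite automaton has a fixed number of states and cannot remember how many '$(' are still open. A single unbounded depth counter is enough for this simplified case.

```python
import re

def substitution_body(text, start):
    """Return the body of the '$(' at text[start], pairing parens with a depth counter.

    Sketch only: parentheses inside quotes or after escapes are NOT special-cased.
    """
    if not text.startswith("$(", start):
        raise ValueError("no '$(' at the given offset")
    depth = 0
    for i in range(start + 1, len(text)):  # start + 1 is the '(' of '$('
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[start + 2:i]
    raise SyntaxError("unbalanced '$('")

s = 'echo "x $(a $(b)) y $(c)"'
lazy = re.search(r"\$\((.*?)\)", s).group(1)   # stops at the first ')'
greedy = re.search(r"\$\((.*)\)", s).group(1)  # runs to the last ')'
counted = substitution_body(s, s.index("$("))

print(lazy)     # a $(b
print(greedy)   # a $(b)) y $(c
print(counted)  # a $(b)
```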

You must do one of two things. Either treat the double quote (") as a token and (ideally) use a different lexer, or lexer mode, for the inside of strings, so that most shell metacharacters, e.g. '{', are scanned as plain text rather than as tokens. Or do away with the lexer/parser division and use a scannerless approach, so that the "lexer" rule for double-quoted strings can call into the "parser" rule for command substitutions.
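The lexer-mode option can be sketched by hand in Python. This is not ANTLR syntax, though ANTLR 4 offers the same idea natively through mode/pushMode/popMode. The lexer keeps a stack of modes:

- A '"' seen in command mode pushes a string mode.
- '$(' pushes a command mode in either mode.
- The matching '"' or ')' pops back to the enclosing mode.

The function name tokenize and the token names (WORD, DQUOTE, TEXT, SUBST_OPEN, RPAREN) are hypothetical, and backtick substitution is not handled.

```python
def tokenize(src):
    """Hypothetical mode-stack lexer: 'cmd' mode for commands, 'str' mode inside "..."."""
    tokens = []
    modes = ["cmd"]  # stack of lexer modes; the top entry is the active mode
    i, n = 0, len(src)
    while i < n:
        if src.startswith("$(", i):  # recognised in both modes
            tokens.append(("SUBST_OPEN", "$("))
            modes.append("cmd")      # a substitution body uses command syntax
            i += 2
        elif modes[-1] == "str":
            if src[i] == '"':
                tokens.append(("DQUOTE", '"'))
                modes.pop()          # end of string: back to the enclosing mode
                i += 1
            else:
                # Everything up to the next '"' or '$(' is plain text, so
                # metacharacters such as '{' or ')' are not tokens here.
                j = i
                while j < n and src[j] != '"' and not src.startswith("$(", j):
                    j += 1
                tokens.append(("TEXT", src[i:j]))
                i = j
        elif src[i].isspace():
            i += 1                   # whitespace separates words in command mode
        elif src[i] == '"':
            tokens.append(("DQUOTE", '"'))
            modes.append("str")
            i += 1
        elif src[i] == ")" and len(modes) > 1:
            tokens.append(("RPAREN", ")"))
            modes.pop()              # end of $( ... ): back to the enclosing mode
            i += 1
        else:
            j = i + 1
            while (j < n and not src[j].isspace() and src[j] not in '")'
                   and not src.startswith("$(", j)):
                j += 1
            tokens.append(("WORD", src[i:j]))
            i = j
    if len(modes) > 1:
        raise SyntaxError("unterminated string or '$('")
    return tokens
```

Because the string mode treats ')' as text, an input like echo "a $(echo ")") b" lexes correctly. The simple depth counter shown earlier would get that input wrong.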

I would favour the scannerless approach. I would investigate how well Antlr v3 supports writing grammars that work directly over a character stream, rather than using a token stream.
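As an illustration of the scannerless idea, here is a hand-written recursive-descent parser in Python that works directly on characters. It is not an Antlr grammar. The key point is that the string rule (dquoted) calls the substitution rule (subst), which in turn calls the command rule. The class and method names are hypothetical. The tuple tree shape (one PARAM node per argument, STRING nodes for quoted text) is an assumption loosely modelled on the tree in the question.

```python
class Parser:
    """Hypothetical scannerless parser for a tiny Bash-like subset."""

    def __init__(self, src):
        self.src, self.pos = src, 0

    def peek(self, s):
        return self.src.startswith(s, self.pos)

    def expect(self, s):
        if not self.peek(s):
            raise SyntaxError(f"expected {s!r} at offset {self.pos}")
        self.pos += len(s)

    def skip_spaces(self):
        while self.pos < len(self.src) and self.src[self.pos] == " ":
            self.pos += 1

    # command := word (arg)*   -- ends at end of input or at ')'
    def command(self):
        self.skip_spaces()
        name, params = self.word(), []
        self.skip_spaces()
        while self.pos < len(self.src) and not self.peek(")"):
            params.append(("PARAM", self.arg()))
            self.skip_spaces()
        return ("CMD", name, *params)

    # arg := dquoted | subst | word
    def arg(self):
        if self.peek('"'):
            return self.dquoted()
        if self.peek("$("):
            return self.subst()
        return self.word()

    def word(self):
        start = self.pos
        while (self.pos < len(self.src) and self.src[self.pos] not in ' ")'
               and not self.peek("$(")):
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a word at offset {start}")
        return self.src[start:self.pos]

    # subst := '$(' command ')'   -- the "parser" rule
    def subst(self):
        self.expect("$(")
        cmd = self.command()
        self.expect(")")
        return ("SUBSHELL", cmd)

    # dquoted := '"' (text | subst)* '"'   -- the "lexer" rule that calls subst
    def dquoted(self):
        self.expect('"')
        parts, start = [], self.pos
        while not self.peek('"'):
            if self.pos >= len(self.src):
                raise SyntaxError("unterminated string")
            if self.peek("$("):
                if self.pos > start:
                    parts.append(self.src[start:self.pos])
                parts.append(self.subst())
                start = self.pos
            else:
                self.pos += 1
        if self.pos > start:
            parts.append(self.src[start:self.pos])
        self.expect('"')
        return ("STRING", *parts)


def parse(src):
    """Top level: a bare $( ... ) or a plain command (assumed entry rule)."""
    p = Parser(src)
    tree = p.subst() if p.peek("$(") else p.command()
    if p.pos != len(src):
        raise SyntaxError(f"unexpected {src[p.pos]!r} at offset {p.pos}")
    return tree
```

For example, parse("$(ls -l)") yields ("SUBSHELL", ("CMD", "ls", ("PARAM", "-l"))). Nested quotes inside substitutions need no special handling, because each rule only consumes what it understands.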

Barry Kelly
Thanks for the suggestion! I'm quite new to formal parsing. I think I'll trigger another pass of the parser when this construct is found in a string, as that seems easier to implement.
Caio Romão
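The two-pass approach described in the comment above could look roughly like the following Python sketch. The names TOKEN, parse_command and expand are hypothetical.

- **First pass:** a flat lexer treats each double-quoted string as one opaque token.
- **Second pass:** when a parameter turns out to be a quoted string, each $( ... ) inside it is located with a depth counter and its body is fed back into the same parser.

Only substitutions inside quoted strings are re-parsed here.

```python
import re

# Pass 1 lexer: a bare word, or a whole double-quoted string as ONE token.
# Limitation: a '"' inside $( ... ) ends the string early, because this
# lexer cannot see the nesting (the regular-lexer problem from the answer).
TOKEN = re.compile(r'"[^"]*"|[^\s"]+')

def parse_command(src):
    """Parse 'cmd arg ...' into ('CMD', name, ('PARAM', arg), ...)."""
    tokens = TOKEN.findall(src)
    if not tokens:
        raise SyntaxError("empty command")
    name, *args = tokens
    return ("CMD", name, *(("PARAM", expand(a)) for a in args))

def expand(token):
    """Pass 2: re-run parse_command on each $( ... ) found in a quoted string."""
    if not token.startswith('"'):
        return token  # unquoted words are left as-is in this sketch
    body, parts, i, start = token[1:-1], [], 0, 0
    while i < len(body):
        if body.startswith("$(", i):
            depth, j = 1, i + 2
            while j < len(body) and depth:  # find the matching ')'
                depth += {"(": 1, ")": -1}.get(body[j], 0)
                j += 1
            if depth:
                raise SyntaxError("unbalanced '$(' in string")
            if i > start:
                parts.append(body[start:i])
            parts.append(("SUBSHELL", parse_command(body[i + 2:j - 1])))
            i = start = j
        else:
            i += 1
    if start < len(body):
        parts.append(body[start:])
    return ("STRING", *parts)
```

This handles echo "String test $(ls -l) end". However, echo "a $(echo "b") c" raises SyntaxError, because the first pass splits the string at the inner quote before the second pass ever runs.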