I am trying to write a recursive-descent parser in Ruby for a grammar defined by the following rules:
- Input consists of white-space-separated Cards, each starting with a Stop-word, where white-space is the regex /[ \n\t]+/
- A Card may consist of Keywords and/or Values, also separated by white-space, in a card-specific order/pattern
- All Stop-words and Keywords are case-insensitive, i.e.: /^[a-z]+[a-z0-9]*$/i
- A Value can be a double-quoted string, which need not be separated from other words by white-space, e.g.: word"quoted string"word
- A Value can also be a word /^[a-z]+[a-z0-9]*$/, an integer, or a float (e.g. -1.15 or 1.0e+2)
- A single-line comment is introduced by # and need not be separated from other words, e.g.: word#single-line comment\n
- A multi-line comment is delimited by /* and */ and need not be separated from other words, e.g.: word/*multi-line comment*/word
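To pin the lexical rules above down, here they are written out as anchored Ruby regexes. This is only a rough sketch: the names TOKEN_PATTERNS and first_token are made up for illustration, the number patterns are my assumption about what counts as an integer or a float, the string pattern assumes there are no escaped quotes inside a string, and Keywords and word Values share one case-insensitive pattern (telling them apart would be left to the parser).

```ruby
# Hypothetical table of token patterns; the names are illustrative only.
# Each pattern is anchored with \A so it only matches at the start of the text.
# Order matters: float is tried before integer, comments and strings before words.
TOKEN_PATTERNS = {
  whitespace:    /\A[ \n\t]+/,
  line_comment:  /\A#[^\n]*/,                       # up to, not including, the newline
  block_comment: %r{\A/\*.*?\*/}m,                  # non-greedy, may span lines
  string:        /\A"[^"]*"/,                       # assumes no escaped quotes
  float:         /\A[-+]?\d+\.\d*(?:e[-+]?\d+)?/i,  # assumed form: digits, dot, optional exponent
  integer:       /\A[-+]?\d+/,
  word:          /\A[a-z][a-z0-9]*/i
}.freeze

# Return [type, matched_text] for the first pattern that matches at the
# start of +text+, or nil if none does.
def first_token(text)
  TOKEN_PATTERNS.each do |type, pattern|
    m = pattern.match(text)
    return [type, m[0]] if m
  end
  nil
end

p first_token('"quoted string"word')
p first_token('1.0e+2 rest')
```

Because every pattern is anchored, a quoted string or a comment is recognised as one unit even when it is glued to the neighbouring word.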
# Input example. Stop-words are chosen just to highlight them: set, object
set title"Input example"set objects 2#not-separated by white-space. test: "/*
set test "#/*"
object 1 shape box/* shape is a Keyword,
box is a Value. test: "#*/object 2 shape sphere
set data # message and complete are Values
0 0 0 0 1 18 18 18 1 35 35 35 72 35 35 # all numbers are Values of the Card "set"
Since most words are separated by white-space, for a while I considered splitting the whole input and parsing it word by word. To deal with comments and quotes, I was going to do:
words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )
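However, this modifies the content of strings (and of comments, if I want to keep them): runs of white-space inside them are collapsed by the split, and any ", #, /* or */ inside them gets padded with spaces. How would you deal with these sticky comments and quotes?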
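To show the problem concretely, here is the same gsub/split wrapped in a method (naive_split is just a name I picked for this illustration) and applied to a string that contains two consecutive spaces inside the quotes:

```ruby
# The split-based approach from above, wrapped in a method for illustration.
def naive_split(input_text)
  input_text.gsub(/([\"\#\n]|\/\*|\*\/)/, ' \1 ').split(/[ \t]+/)
end

# Note the two spaces between "Input" and "example".
words = naive_split('set title"Input  example"set objects 2#no  space')
p words

# Re-joining the words found between the two quote tokens cannot restore
# the original string content, because the split discarded the white-space.
first_quote = words.index('"')
last_quote  = words.rindex('"')
p words[(first_quote + 1)...last_quote].join(' ')
```

The quote and comment markers do come out as separate words, but the text between them has lost its original spacing.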