Well, since I now have an answer to this, I guess I should share it.
Fortran 77, like probably all other languages that care about columns, is a line-oriented language. That means your parser has to keep track of the EOL and actually use it in its parsing.
Another important fact is that in my case, I didn't care about parsing the line numbers that Fortran can put in those early control columns. All I need is to know when it is telling me to scan rest of the line differently.
Given those two things, I could entirely handle this issue with a Spirit skip parser. I wrote mine to
- skip the entire line if the first (comment) column contains an alphabetic charater.
- skip the entire line if there is nothing on it.
- ignore the preceeding EOL and everything up to the fifth column if the fifth column contains a '.' (continuation line). This tacks it to the preceeding line.
- skip all non-eol whitespace (even spaces don't matter in Fortran. Yes, it's a wierd language.)
Here's the code:
skip =
// Full line comment
(spirit::eol >> spirit::ascii::alpha >> *(spirit::ascii::char_ - spirit::eol))
[boost::bind (&fortran::parse_info::skipping_line, &pi)]
|
// remaining line comment
(spirit::ascii::char_ ('!') >> *(spirit::ascii::char_ - spirit::eol)
[boost::bind (&fortran::parse_info::skipping_line_comment, &pi)])
|
// Continuation
(spirit::eol >> spirit::ascii::blank >>
spirit::qi::repeat(4)[spirit::ascii::char_ - spirit::eol] >> ".")
[boost::bind (&fortran::parse_info::skipping_continue, &pi)]
|
// empty line
(spirit::eol >>
-(spirit::ascii::blank >> spirit::qi::repeat(0, 4)[spirit::ascii::char_ - spirit::eol] >>
*(spirit::ascii::blank) ) >>
&(spirit::eol | spirit::eoi))
[boost::bind (&fortran::parse_info::skipping_empty, &pi)]
|
// whitespace (this needs to be the last alternative).
(spirit::ascii::space - spirit::eol)
[boost::bind (&fortran::parse_info::skipping_space, &pi)]
;
I would advise against blindly using this yourself for line-oriented Fortran, as I ignore line numbers, and different compilers have different rules for valid comment and continuation characters.