I recently added source-file parsing to an existing tool that generates output files from complex command-line arguments.
The arguments became so complex that we started allowing them to be supplied in a file, which was parsed as if it were one very large command line, but the syntax was still awkward. So I added the ability to parse a source file using a more reasonable syntax.
I used flex 2.5.4 for Windows to generate the tokenizer for this custom source-file format, and it worked. But I hated the code: global variables, a weird naming convention, and the C++ it generated was awful. The existing code-generation backend was glued to the output of flex; I don't use yacc or bison.
I'm about to dive back into that code, and I'd like to use a better, more modern tool. Does anyone know of a tool that:
- Runs from the Windows command prompt (Visual Studio integration is fine, but I build with makefiles)
- Generates a properly encapsulated C++ tokenizer (no global variables)
- Uses regular expressions to describe the tokenizing rules (lex-compatible syntax a plus)
- Does not force me to use the C runtime (or fake it) for file reading (i.e., can parse from memory)
- Warns me when my rules force the tokenizer to backtrack (or fixes it automatically)
- Gives me full control over variable and method names (so I can conform to my existing naming convention)
- Allows me to link multiple tokenizers into a single .exe without name collisions
- Can generate a Unicode (16-bit UCS-2) tokenizer if I want it to
- Is NOT an integrated tokenizer-plus-parser generator (I want a lex replacement, not a lex+yacc replacement)
I could probably live with a tool that only generated the tokenizing tables, if that were the only thing available.
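To make the "encapsulated, parses from memory" requirement concrete, here is a rough sketch of the kind of interface I'd want the generated code to have. This is hand-written illustration only, not output from any real tool; the class name, token names, and method names (`MyTokenizer`, `NextToken`, `TokenText`) are all placeholders I'd expect to control myself:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of the desired generated-scanner shape: all state lives in the
// object (no globals), and input comes from a caller-supplied memory
// buffer rather than a CRT FILE*.
class MyTokenizer {
public:
    enum Token { End, Ident, Number, Unknown };

    // Scan an in-memory buffer; no file I/O anywhere in the tokenizer.
    MyTokenizer(const char* buf, std::size_t len)
        : cur_(buf), end_(buf + len) {}

    Token NextToken() {
        // Skip whitespace between tokens.
        while (cur_ != end_ && std::isspace(static_cast<unsigned char>(*cur_)))
            ++cur_;
        if (cur_ == end_) return End;

        text_.clear();
        unsigned char c = static_cast<unsigned char>(*cur_);
        if (std::isalpha(c) || *cur_ == '_') {
            while (cur_ != end_ &&
                   (std::isalnum(static_cast<unsigned char>(*cur_)) || *cur_ == '_'))
                text_ += *cur_++;
            return Ident;
        }
        if (std::isdigit(c)) {
            while (cur_ != end_ && std::isdigit(static_cast<unsigned char>(*cur_)))
                text_ += *cur_++;
            return Number;
        }
        text_ += *cur_++;
        return Unknown;
    }

    // Text of the most recently returned token.
    const std::string& TokenText() const { return text_; }

private:
    const char* cur_;
    const char* end_;
    std::string text_;
};
```

Two independent instances (or two differently named generated classes) could then coexist in one .exe with no shared state, which is exactly what the flex output I have now can't do.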