I would like to parse a self-designed file format with an FSM-like parser in C++ (this is a teach-myself-C++-the-hard-way-by-doing-something-big-and-difficult kind of project :)). I have a tokenized string with newlines signifying the end of a line. See here for an input example. All the comments and junk are filtered out, so I have a std::string like this:
global \n { \n SOURCE_DIRS src \n HEADER_DIRS include \n SOURCES bitwise.c framing.c \n HEADERS ogg/os_types.h ogg/ogg.h \n } \n ...
Syntax explanation:
- { } are scopes, and capitalized words signify that a list of options/files is to follow.
- \n are only important in a list of options/files, signifying the end of the list.
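To make the newline rule concrete, here is a minimal sketch of a tokenizing step that keeps every end-of-line as an explicit "\n" token, so a later stage can decide where it matters. The function name `tokenize` is hypothetical, and the approach (reading line by line with `std::getline`, then splitting each line on whitespace) is only one possible way to do it:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original design): splits the input
// into whitespace-separated tokens and appends an explicit "\n" token after
// every line, so the parser can see where a line ended.
std::vector<std::string> tokenize(const std::string& input)
{
    std::vector<std::string> tokens;
    std::istringstream lines(input);
    std::string line;
    while (std::getline(lines, line)) {   // consumes one line, drops the '\n'
        std::istringstream words(line);
        std::string word;
        while (words >> word) {           // operator>> skips spaces and tabs
            tokens.push_back(word);
        }
        tokens.push_back("\n");           // re-insert the line end as a token
    }
    return tokens;
}
```

With tokens in this form, the parser can skip "\n" outside a list and treat it as the list terminator inside one.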
So I thought that an FSM would be simple and extensible enough for my needs and knowledge. As far as I can tell (and as far as I want my file design to go), I don't need concurrent states or anything fancy like that. Some design/implementation questions:
- Should I use an `enum` or an abstract class + derivatives for my states? The first is probably better for a small syntax, but could get ugly later, and the second is the exact opposite. I'm leaning towards the first, for its simplicity. See the `enum` example and the class example. EDIT: what about this suggestion for `goto`? I thought they were evil in C++.
- When reading a list, I need to NOT ignore `\n`. My preferred way of reading the `string`, via a `stringstream`, will ignore `\n` by default. So I need a simple way of telling (the same!) `stringstream` to not ignore newlines when a certain state is enabled.
- Will the simple `enum` states suffice for multi-level parsing (scopes within scopes `{...{...}...}`) or would that need hacky implementations?
- Here are the draft states I have in mind:
  - `upper`: reads global, exe, lib + target names...
  - `normal`: inside a scope, can read SOURCES..., create user variables...
  - `list`: adds items to a list until a newline is encountered.
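For illustration, the three draft states could be sketched with an `enum` plus a stack of states, which is one way to get nested scopes without extra states. This is only a sketch under several assumptions: the input is already a vector of tokens in which each line end appears as an explicit "\n" token, a keyword is any token made only of capitals and underscores, and the names `State`, `ParseResult`, `isKeyword`, and `parse` are all hypothetical.

```cpp
#include <cctype>
#include <map>
#include <stack>
#include <string>
#include <vector>

enum class State { Upper, Normal, List };

struct ParseResult {
    std::map<std::string, std::vector<std::string>> lists; // keyword -> items
    bool balanced = true;                                   // braces matched?
};

// Assumption: a keyword consists only of capital letters and underscores.
bool isKeyword(const std::string& tok)
{
    if (tok.empty()) return false;
    for (unsigned char c : tok) {
        if (!(std::isupper(c) || c == '_')) return false;
    }
    return true;
}

ParseResult parse(const std::vector<std::string>& tokens)
{
    ParseResult result;
    std::stack<State> states;       // top() is the current state
    states.push(State::Upper);
    std::string currentList;

    for (const std::string& tok : tokens) {
        switch (states.top()) {
        case State::Upper:
        case State::Normal:
            if (tok == "{") {
                states.push(State::Normal);        // enter a (nested) scope
            } else if (tok == "}") {
                if (states.size() > 1) states.pop();
                else result.balanced = false;      // '}' without a '{'
            } else if (states.top() == State::Normal && isKeyword(tok)) {
                currentList = tok;
                result.lists[currentList];         // create the (empty) list
                states.push(State::List);
            }
            // Newlines and target names such as "global" are skipped here.
            break;
        case State::List:
            if (tok == "\n") states.pop();         // newline ends the list
            else result.lists[currentList].push_back(tok);
            break;
        }
    }
    if (states.size() != 1) result.balanced = false; // unclosed scope or list
    return result;
}
```

In this sketch the stack depth is the nesting depth, so `{...{...}...}` needs no additional states. Lists with the same keyword in different scopes are merged into one entry, which a real parser would probably want to keep separate per scope.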
Each scope will have a kind of conditional (e.g. win32:global { gcc:CFLAGS = ... }), and these conditionals will need to be handled in exactly the same fashion everywhere (even in the `list` state, per item).
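Since the conditional has to be treated identically in every state, one option is a single helper that every state calls on a token before acting on it. The sketch below assumes the condition is everything before the first ':' in a token, as in the win32:global example; the name `splitCondition` is hypothetical, and chained conditions or list items that legitimately contain ':' are not handled.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits "cond:name" into {"cond", "name"}.
// A token without ':' yields an empty condition and the token unchanged.
std::pair<std::string, std::string> splitCondition(const std::string& token)
{
    const std::string::size_type colon = token.find(':');
    if (colon == std::string::npos) {
        return std::make_pair(std::string(), token);
    }
    return std::make_pair(token.substr(0, colon), token.substr(colon + 1));
}
```

Each state could then evaluate the first element of the pair (if it is not empty) and only process the second element when the condition holds.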
Thanks for any input.